Search Results

Search found 504 results on 21 pages for 'failover'.

Page 11/21 | < Previous Page | 7 8 9 10 11 12 13 14 15 16 17 18 | Next Page >

Disaster, or Migration?

- by Rob Farley

This post is in two parts – technical and personal. And I should point out that it’s prompted in part by this month’s T-SQL Tuesday, hosted by Allen Kinsel. First, the technical: I’ve had a few conversations with people recently about migration – moving a SQL Server database from one box to another (sometimes, but not primarily, involving an upgrade). One question that tends to come up is that of downtime. Obviously there will be some period of time between the old server being available and the new one. The way that most people seem to think of migration is this: Build a new server. Stop people from using the old server. Take a backup of the old server Restore it on the new server. Reconfigure the client applications (or alternatively, configure the new server to use the same address as the old) Make the new server online. There are other things involved, such as testing, of course. But this is essentially the process that people tell me they’re planning to follow. The bit that I want to look at today (as you’ve probably guessed from my title) is the “backup and restore” section. If a SQL database is using the Simple Recovery Model, then the only restore option is the last database backup. This backup could be full or differential. The transaction log never gets backed up in the Simple Recovery Model. Instead, it truncates regularly to stay small. One that’s using the Full Recovery Model (or Bulk-Logged) won’t truncate its log – the log must be backed up regularly. This provides the benefit of having a lot more option available for restores. It’s a requirement for most systems of High Availability, because if you’re making sure that a spare box is up-and-running, ready to take over, then you have to be interested in the logs that are happening on the current box, rather than truncating them all the time. A High Availability system such as Mirroring, Replication or Log Shipping will initialise the spare machine by restoring a full database backup (and maybe a differential backup if available), and then any subsequent log backups. Once the secondary copy is close, transactions can be applied to keep the two in sync. The main aspect of any High Availability system is to have a redundant system that is ready to take over. So the similarity for migration should be obvious. If you need to move a database from one box to another, then introducing a High Availability mechanism can help. By turning on the Full Recovery Model and then taking a backup (so that the now-interesting logs have some context), logs start being kept, and are therefore available for getting the new box ready (even if it’s an upgraded version). When the migration is ready to occur, a failover can be done, letting the new server take over the responsibility of the old, just as if a disaster had happened. Except that this is a planned failover, not a disaster at all. There’s a fine line between a disaster and a migration. Failovers can be useful in patching, upgrading, maintenance, and more. Hopefully, even an unexpected disaster can be seen as just another failover, and there can be an opportunity there – perhaps to get some work done on the principal server to increase robustness. And if I’ve just set up a High Availability system for even the simplest of databases, it’s not necessarily a bad thing. :) So now the personal: It’s been an interesting time recently... June has been somewhat odd. A court case with which I was involved got resolved (through mediation). I can’t go into details, but my lawyers tell me that I’m allowed to say how I feel about it. The answer is ‘lousy’. I don’t regret pursuing it as long as I did – but in the end I had to make a decision regarding the commerciality of letting it continue, and I’m going to look forward to the days when the kind of money I spent on my lawyers is small change. Mind you, if I had a similar situation with an employer, I’d do the same again, but that doesn’t really stop me feeling frustrated about it. The following day I had to fly to country Victoria to see my grandmother, who wasn’t expected to last the weekend. She’s still around a week later as I write this, but her 92-year-old body has basically given up on her. She’s been a Christian all her life, and is looking forward to eternity. We’ll all miss her though, and it’s hard to see my family grieving. Then on Tuesday, I was driving back to the airport with my family to come home, when something really bizarre happened. We were travelling down the freeway, just pulled out to go past a truck (farm-truck sized, not a semi-trailer), when a car-sized mass of metal fell off it. It was something like an industrial air-conditioner, but from where I was sitting, it was just a mass of spinning metal, like something out of a movie (one friend described it as “holidays by Michael Bay”). Somehow, and I’m really don’t know how, the part of it nearest us bounced high enough to clear the car, and there wasn’t even a scratch. We pulled over the check, and I was just thanking God that we’d changed lanes when we had, and that we remained unharmed. I had all kinds of thoughts about what could’ve happened if we’d had something that size land on the windscreen... All this has drilled home that while I feel that I haven’t provided as well for the family as I could’ve done (like by pursuing an expensive legal case), I shouldn’t even consider that I have proper control over things. I get to live life, and make decisions based on what I feel is right at the time. But I’m not going to get everything right, and there will be things that feel like disasters, some which could’ve been in my control and some which are very much beyond my control. The case feels like something I could’ve pursued differently, a disaster that could’ve been avoided in some way. Gran dying is lousy of course. An accident on the freeway would have been awful. I need to recognise that the worst disasters are ones that I can’t affect, and that I need to look at things in context – perhaps seeing everything that happens as a migration instead. Life is never the same from one day to the next. Every event has a before and an after – sometimes it’s clearly positive, sometimes it’s not. I remember good events in my life (such as my wedding), and bad (such as the loss of my father when I was ten, or the back injury I had eight years ago). I’m not suggesting that I know how to view everything from the “God works all things for good” perspective, but I am trying to look at last week as a migration of sorts. Those things are behind me now, and the future is in God’s hands. Hopefully I’ve learned things, and will be able to live accordingly. I’ve come through this time now, and even though I’ll miss Gran, I’ll see her again one day, and the future is bright.

Read the article
Implementing Cluster Continuous Replication, Part 2

Cluster continuous replication (CCR) helps to provide a more resilient email system with faster recovery. It was introduced in Microsoft Exchange Server 2007 and uses log shipping and failover. configuring Cluster Continuous Replication on a Windows Server 2008 requires different techniques to Windows Server 2003. Brien Posey explains all.

Read the article
Hadoop, NOSQL, and the Relational Model

- by Phil Factor

(Guest Editorial for the IT Pro/SysAdmin Newsletter)Whereas Relational Databases fit the world of commerce like a glove, it is useless to pretend that they are a perfect fit for all human endeavours. Although, with SQL Server, we’ve made great strides with indexing text, in processing spatial data and processing markup, there is still a problem in dealing efficiently with large volumes of ephemeral semi-structured data. Key-value stores such as Cassandra, Project Voldemort, and Riak are of great value for ephemeral data, and seem of equal value as a data-feed that provides aggregations to an RDBMS. However, the Document databases such as MongoDB and CouchDB are ideal for semi-structured data for which no fixed schema exists; analytics and logging are obvious examples. NoSQL products, such as MongoDB, tackle the semi-structured data problem with panache. MongoDB is designed with a simple document-oriented data model that scales horizontally across multiple servers. It doesn’t impose a schema, and relies on the application to enforce the data structure. This is another take on the old ‘EAV’ problem (where you don’t know in advance all the attributes of a particular entity) It uses a clever replica set design that allows automatic failover, and uses journaling for data durability. It allows indexing and ad-hoc querying. However, for SQL Server users, the obvious choice for handling semi-structured data is Apache Hadoop. There will soon be an ODBC Driver for Apache Hive .and an Add-in for Excel. Additionally, there are now two Hadoop-based connectors for SQL Server; the Apache Hadoop connector for SQL Server 2008 R2, and the SQL Server Parallel Data Warehouse (PDW) connector. We can connect to Hadoop process the semi-structured data and then store it in SQL Server. For one steeped in the culture of Relational SQL Databases, I might be expected to throw up my hands in the air in a gesture of contempt for a technology that was, judging by the overblown journalism on the subject, about to make my own profession as archaic as the Saggar makers bottom knocker (a potter’s assistant who helped the saggar maker to make the bottom of the saggar by placing clay in a metal hoop and bashing it). However, on the contrary, I find that I'm delighted with the advances made by the NoSQL databases in the past few years. Having the flow of ideas from the NoSQL providers will knock any trace of complacency out of the providers of Relational Databases and inspire them into back-fitting some features, such as horizontal scaling, with sharding and automatic failover into SQL-based RDBMSs. It will do the breed a power of good to benefit from all this lateral thinking.

Read the article
SQL Server 2008 Cluster Installation - First network name always fails

- by boflynn

I'm testing failover clustering in Windows Server 2008 to host a SQL Server 2008 installation using this installation guide. My base cluster is installed and working properly, as well as clustering the DTC service. However, when it comes time to install SQL Server, my first attempt at installation always fails with the same message and seems to "taint" the network name. For example, with my previous cluster attempt, I was installing SQL Server as VSQL. After approximately 15 attempts of installation and trying to resolve the errors, e.g. changing domain accounts for SQL, setting SPNs, etc., I typoed the network name as VQSL and the installation worked. Similarly on my current cluster, I tried installing with the SQL service named PROD-C1-DB and got the same errors as last time until I tried changing the name to anything else, e.g. PROD-C1-DB1, SQL, TEST, etc., at which point the install works. It will even install to VSQL now. While testing, my install routine was: Run setup.exe from patched media, selecting appropriate options After the install fails, I'd chose "Remove node from a SQL Server failover cluster" and remove the single, failed, node Attempt to diagnose problem, inspect event logs, etc. Delete the computer account that was created for the SQL Service from Active Directory Delete the MSSQL10.MSSQLSERVER folder from the shared data drive The error message I receive from the SQL Server installer is: The following error has occurred: The cluster resource 'SQL Server' could not be brought online. Error: The group or resource is not in the correct state to perform the requested operation. (Exception from HRESULT: 0x8007139F) Along with hundreds of the following errors in the Application event log: [sqsrvres] checkODBCConnectError: sqlstate = 28000; native error = 4818; message = [Microsoft][SQL Server Native Client 10.0][SQL Server]Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'. System configuration notes: Windows Server 2008 Enterprise Edition x64 SQL Server 2008 Enterprise Edition x64 using slipstreamed SP1+CU1 media Dell PowerEdge servers Fibre attached storage

Read the article
Managing scalability and availability with two servers running Apache Httpd, Apache Mina and MySQL

- by celalo

Hello, I am not a developer and I don't much experience with scalable server architectures. But I am in need of a highly available and scalable system for one of my projects. There is going to be two servers I am going to use for the time being. Both with 4 core CPUs and 8 GB RAM with RAID structures running CentOS 5.4. I will also have feature called "Failover IP" which enables to direct an IP address to another server within short time. The applications which will be run on the servers: There is going to be a Java application based on Apache Mina server for handling TCP requests from some hundreds of network devices where the devices are going to send request as much as one request per minute. Handling those requests, includes parsing the requests and inserting a few rows to the Database. Parsing requests before inserting data to the DB does take neglectable time. There is going to be MySQL server, as I stated above. Also there is going to be a PHP web application running on Apache Httpd Server which uses the same DB with the Java application. What I wish to have is to make use of those two servers at the most. I was imagining to have the servers identical, sharing the work load. MySQL could be a cluster maybe? And if some application fails or the whole machine goes down, the other will continue serving the requests seamlessly. Reminding that a "Failover IP" feature will be available for me to take advantage of. Also, It should be kept in mind that number of servers could increase in time, to meet the demand. What can you suggest? Which kind of tools I can make use of? Which kind of monitoring software (paid/unpaid) I have? Thanks in advance.

Read the article
Windows 2003 Domain Controller Very Upset about NIC Teaming

- by Kyle Brandt

I set up BACS (Broadcom Teaming) to team two NIC on a Windows 2003 Active Directory Domain Controller. Networking still works okay, I can ping the gateway etc, but both DNS and Active Directory fail to start with various 40xx errors. The team that I created is Smart load Balancing with Failover, with one backup and only one in smart load balancing (So really it is just failover). I have the team the same IP address that the single active NIC had before. Anyone seen this before, or have any ideas what the problem might be? Event Type: Error Event Source: DNS Event Category: None Event ID: 4015 Date: 3/7/2010 Time: 10:33:03 AM User: N/A Computer: ADC Description: The DNS server has encountered a critical error from the Active Directory. Check that the Active Directory is functioning properly. The extended error debug information (which may be empty) is "". The event data contains the error. Event Type: Error Event Source: DNS Event Category: None Event ID: 4004 Date: 3/7/2010 Time: 10:33:03 AM User: N/A Computer: ADC Description: The DNS server was unable to complete directory service enumeration of zone .. This DNS server is configured to use information obtained from Active Directory for this zone and is unable to load the zone without it. Check that the Active Directory is functioning properly and repeat enumeration of the zone. The extended error debug information (which may be empty) is "". The event data contains the error. Event Type: Error Event Source: NTDS Replication Event Category: DS RPC Client Event ID: 2087 Date: 3/7/2010 Time: 10:40:28 AM User: NT AUTHORITY\ANONYMOUS LOGON Computer: ADC Description: Active Directory could not resolve the following DNS host name of the source domain controller to an IP address. This error prevents additions, deletions and changes in Active Directory from replicating between one or more domain controllers in the forest. Security groups, group policy, users and computers and their passwords will be inconsistent between domain controllers until this error is resolved, potentially affecting logon authentication and access to network resources.

Read the article
bonding module parameters are not shown in /sys/module/bonding/parameters/

- by c4f4t0r

I have a server with Suse 11 sp1 kernel 2.6.32.54-0.3-default, with modinfo bonding i see all parameters, but under /sys/module/bonding/parameters/ not modinfo bonding | grep ^parm parm: max_bonds:Max number of bonded devices (int) parm: num_grat_arp:Number of gratuitous ARP packets to send on failover event (int) parm: num_unsol_na:Number of unsolicited IPv6 Neighbor Advertisements packets to send on failover event (int) parm: miimon:Link check interval in milliseconds (int) parm: updelay:Delay before considering link up, in milliseconds (int) parm: downdelay:Delay before considering link down, in milliseconds (int) parm: use_carrier:Use netif_carrier_ok (vs MII ioctls) in miimon; 0 for off, 1 for on (default) (int) parm: mode:Mode of operation : 0 for balance-rr, 1 for active-backup, 2 for balance-xor, 3 for broadcast, 4 for 802.3ad, 5 for balance-tlb, 6 for balance-alb (charp) parm: primary:Primary network device to use (charp) parm: lacp_rate:LACPDU tx rate to request from 802.3ad partner (slow/fast) (charp) parm: ad_select:803.ad aggregation selection logic: stable (0, default), bandwidth (1), count (2) (charp) parm: xmit_hash_policy:XOR hashing method: 0 for layer 2 (default), 1 for layer 3+4 (charp) parm: arp_interval:arp interval in milliseconds (int) parm: arp_ip_target:arp targets in n.n.n.n form (array of charp) parm: arp_validate:validate src/dst of ARP probes: none (default), active, backup or all (charp) parm: fail_over_mac:For active-backup, do not set all slaves to the same MAC. none (default), active or follow (charp) in /sys/module/bonding/parameters ls -l /sys/module/bonding/parameters/ total 0 -rw-r--r-- 1 root root 4096 2013-10-17 11:22 num_grat_arp -rw-r--r-- 1 root root 4096 2013-10-17 11:22 num_unsol_na I found some of this parameters under /sys/class/net/bond0/bonding/, but when i try to change one i got the following error echo layer2+3 > /sys/class/net/bond0/bonding/xmit_hash_policy -bash: echo: write error: Operation not permitted

Read the article
How to configuration keepalived on Amazon EC2?

- by oeegee

I rad some article. Keepalived over GRE tunnel for failover on VPS environment http://blog.killtheradio.net/how-tos/keepalived-haproxy-and-failover-on-the-cloud-or-any-vps-without-multicast/ but, I don't know how to configuration? and How to call this architecture? only I Know that How to config Master/Backup configuration at keepalived. What I want to know that How does work keepalived? I want to design this.... XMPP Server(EC2) | ------------------------------------------------- keepalived Master(EC2) - keepalived Backup(EC2) HAProxy #1 HAProxy#2 ------------------------------------------------- | Casandra#1 Casandra#2 Casandra#3 Casandra#4 Thanks! but What I want to know how to work on keepalived with unicast patche modul. ELB is expansive. and this is first totaly design. [Flow] ELB -- XMPP Server -- ELB -- Casandra ELB | XMPP#1 XMPP#2 XMPP#3 XMPP#4 | ELB | Casandra#1 Casandra#2 Casandra#3 Casandra#4 and change first design. [Flow] ELB -- XMPP Server -- HAProxy Master(Casandra Farm) -- Casandra ELB | XMPP#1 XMPP#2 XMPP#3 XMPP#4 | ------------------------------------------------- keepalived Master(EC2) - keepalived Backup(EC2) HAProxy#1 HAProxy#2 ------------------------------------------------- | Casandra#1 Casandra#2 Casandra#3 Casandra#4 this is second. [Flow] ELB -- HAProxy(XMPP Farm) -- XMPP Server -- HAProxy(Casandra Farm) -- Casanda It's OK? ELB | HAProxy#1 HAProxy#2 HAProxy#3 HAProxy#4 XMPP#1 XMPP#2 XMPP#3 XMPP#4 | Casandra#1 Casandra#2 Casandra#3 Casandra#4

Read the article
Make a snapshot of a live mySQL database with myISAM & innoDB tables without locking

- by Artem

We have a live database in production where we are running out of space on the server. So I would like to transfer to a new server without any downtime (or as little downtime as possible). In general, I would also like to have a hot failover copy of the database available. I would like to use replication to get all of the data copied to the new machine, and then at some point flip a switch and have that new machine become the master (normal failover scenario). My problem is that I am not sure how to initialize replication without locking the db to make the initial snapshot I will use? Is there any way to do this? I know I could do it using single-transaction if I was using innoDB, but very unfortunately we have some myISAM tables in there (in fact the largest 150GB table is myISAM and I want to switch it to InnoDB but I can't do it until I have more space & a hot copy to switch to). Any ideas? Is there some way to make such a snapshot? Or is there alternatively a way to get replication to "catch up" without an snapshot for initialization?

Read the article
Apache 2 Fails to Start After Upgrade with No Errors

- by Mark Davidson

Hi all Hoping someone can help me with a server issue. Recently we upgraded to the latest apache on 2 boxes within are organisation. One being the master box the other being for failover. The upgrade went fine on the master box but on the failover box apache fails to start with no errors, being output or logged. Both boxes have the exact same configuration so found this a bit strange. I've reinstalled apache and have been through checking the configs and did not find any obvious errors. Eventally I ran a syntax check on each config file being included and found that one of the files apparently has syntax errors. Invalid command 'Order', perhaps misspelled or defined by a module not included in the server configuration Invalid command 'php_value', perhaps misspelled or defined by a module not included in the server configuration Invalid command 'GeoIPEnable', perhaps misspelled or defined by a module not included in the server configuration I've trippled checked all the modules are enabled but it still fails. I've googled the subject of these errors loads but have been unable to fine a solution. I was wondering if anyone had encountered such a problem before and could point me towards a solution. Thanks for your help in advance. P.s: Apache related versions on server. ii apache2 2.2.3-4+etch10 Next generation, scalable, extendable web se ii apache2-mpm-prefork 2.2.3-4+etch10 Traditional model for Apache HTTPD 2.1 ii apache2-utils 2.2.3-4+etch10 utility programs for webservers ii apache2.2-common 2.2.3-4+etch10 Next generation, scalable, extendable web se ii libapache2-mod-geoip 1.1.8-2 GeoIP support for apache2 ii libapache2-mod-php5 5.2.0+dfsg-8+etch15 server-side, HTML-embedded scripting languag

Read the article
SQL Server 2008 Cluster Installation - First network name always fails

- by boflynn

I'm testing failover clustering in Windows Server 2008 to host a SQL Server 2008 installation using this installation guide. My base cluster is installed and working properly, as well as clustering the DTC service. However, when it comes time to install SQL Server, my first attempt at installation always fails with the same message and seems to "taint" the network name. For example, with my previous cluster attempt, I was installing SQL Server as VSQL. After approximately 15 attempts of installation and trying to resolve the errors, e.g. changing domain accounts for SQL, setting SPNs, etc., I typoed the network name as VQSL and the installation worked. Similarly on my current cluster, I tried installing with the SQL service named PROD-C1-DB and got the same errors as last time until I tried changing the name to anything else, e.g. PROD-C1-DB1, SQL, TEST, etc., at which point the install works. It will even install to VSQL now. While testing, my install routine was: Run setup.exe from patched media, selecting appropriate options After the install fails, I'd chose "Remove node from a SQL Server failover cluster" and remove the single, failed, node Attempt to diagnose problem, inspect event logs, etc. Delete the computer account that was created for the SQL Service from Active Directory Delete the MSSQL10.MSSQLSERVER folder from the shared data drive The error message I receive from the SQL Server installer is: The following error has occurred: The cluster resource 'SQL Server' could not be brought online. Error: The group or resource is not in the correct state to perform the requested operation. (Exception from HRESULT: 0x8007139F) Along with hundreds of the following errors in the Application event log: [sqsrvres] checkODBCConnectError: sqlstate = 28000; native error = 4818; message = [Microsoft][SQL Server Native Client 10.0][SQL Server]Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'. System configuration notes: Windows Server 2008 Enterprise Edition x64 SQL Server 2008 Enterprise Edition x64 using slipstreamed SP1+CU1 media Dell PowerEdge servers Fibre attached storage

Read the article
Virtual Fileserver

- by Sergei

Hi, We are planning to move our production servers to the datacenter and virtualize remaining servers in the process.Datacenter will have HP blades with vSphere on top.Currentliy we are using Celerra NS20 as fileserver.Since datacenter is using HP kit and EVA 4400 as SAN, we cannot have Celerra there, as EMC supoprt for Celerra does not work for non EMC array. I have searched for possible options and one of them was to have HP NAS blade X3800sb instead of Celerra.However this seems like overkill for me.We are only using Celerra for about 100 users and 50 servers and I think having X3800sb could be waste of resources. The other option would be to have a virtual fileserver as a part of vmware environment in datacenter.We only need CIFS to be provided.The only option I can think of is Windows Storage server.We had a bad expirience with Windows servers used as fileservers ( memory leaks one thing) in the past and this was one of the reasons we moved to Celerra. What are the other options?We need something as reliable as Celerra with as many options as possible.For example , Celerra has per folder quotas, deduplication, dynamic volume allocation, automatic failover, VTLU, replication. Also we would need to replicate NAS data to the failover site.We could use block level replication , SAN-to-SAN, but this would mean wasted bandwidth, as we need only subset of folders to be replicated.We used CA XSoft for windows servers in the past and Celerra has option for Celerra replication. Thank you very much in advance, Please ask me if I missed any details!

Read the article
Prevent master to fall back to master after failure

- by Chrille

I'm using keepalived to setup a virtual ip that points to a master server. When a failover happens it should point the virtual ip to the backup, and the IP should stay there until I manually enable (fix) the master. The reason this is important is that I'm running mysql replication on the servers and writes should only be on the master. When I failover I promote the slave to master. The master server: global_defs { ! this is who emails will go to on alerts notification_email { [email protected] ! add a few more email addresses here if you would like } notification_email_from [email protected] ! I use the local machine to relay mail smtp_server 127.0.0.1 smtp_connect_timeout 30 ! each load balancer should have a different ID ! this will be used in SMTP alerts, so you should make ! each router easily identifiable lvs_id APP1 } vrrp_instance APP1 { interface eth0 state EQUAL virtual_router_id 61 priority 999 nopreempt virtual_ipaddress { 217.x.x.129 } smtp_alert } Backup server: global_defs { ! this is who emails will go to on alerts notification_email { [email protected] ! add a few more email addresses here if you would like } notification_email_from [email protected] ! I use the local machine to relay mail smtp_server 127.0.0.1 smtp_connect_timeout 30 ! each load balancer should have a different ID ! this will be used in SMTP alerts, so you should make ! each router easily identifiable lvs_id APP2 } vrrp_instance APP2 { interface eth0 state EQUAL virtual_router_id 61 priority 100 virtual_ipaddress { 217.xx.xx.129 } notify_master "/etc/keepalived/notify.sh del app2" notify_backup "/etc/keepalived/notify.sh add app2" notify_fault "/etc/keepalived/notify.sh add app2” smtp_alert }

Read the article
Clustering Basics and Challenges

- by Karoly Vegh

For upcoming posts it seemed to be a good idea to dedicate some time for cluster basic concepts and theory. This post misses a lot of details that would explode the articlesize, should you have questions, do not hesitate to ask them in the comments. The goal here is to get some concepts straight. I can't promise to give you an overall complete definitions of cluster, cluster agent, quorum, voting, fencing, split brain condition, so the following is more of an explanation. Here we go. -------- Cluster, HA, failover, switchover, scalability -------- An attempted definition of a Cluster: A cluster is a set (2+) server nodes dedicated to keep application services alive, communicating through the cluster software/framework with eachother, test and probe health status of servernodes/services and with quorum based decisions and with switchover/failover techniques keep the application services running on them available. That is, should a node that runs a service unexpectedly lose functionality/connection, the other ones would take over the and run the services, so that availability is guaranteed. To provide availability while strictly sticking to a consistent clusterconfiguration is the main goal of a cluster. At this point we have to add that this defines a HA-cluster, a High-Availability cluster, where the clusternodes are planned to run the services in an active-standby, or failover fashion. An example could be a single instance database. Some applications can be run in a distributed or scalable fashion. In the latter case instances of the application run actively on separate clusternodes serving servicerequests simultaneously. An example for this version could be a webserver that forwards connection requests to many backend servers in a round-robin way. Or a database running in active-active RAC setup. -------- Cluster arhitecture, interconnect, topologies -------- Now, what is a cluster made of? Servers, right. These servers (the clusternodes) need to communicate. This of course happens over the network, usually over dedicated network interfaces interconnecting all the clusternodes. These connection are called interconnects.How many clusternodes are in a cluster? There are different cluster topologies. The most simple one is a clustered pair topology, involving only two clusternodes: There are several more topologies, clicking the image above will take you to the relevant documentation. Also, to answer the question Solaris Cluster allows you to run up to 16 servers in a cluster. Where shall these clusternodes be placed? A very important question. The right answer is: It depends on what you plan to achieve with the cluster. Do you plan to avoid only a server outage? Then you can place them right next to eachother in the datacenter. Do you need to avoid DataCenter outage? In that case of course you should place them at least in different fire zones. Or in two geographically distant DataCenters to avoid disasters like floods, large-scale fires or power outages. We call this a stretched- or campus cluster, the clusternodes being several kilometers away from eachother. To cover really large distances, you probably need to move to a GeoCluster, which is a different kind of animal. What is a geocluster? A Geographic Cluster in Solaris Cluster terms is actually a metacluster between two, separate (locally-HA) clusters. -------- Cluster resource types, agents, resources, resource groups -------- So how does the cluster manage my applications? The cluster needs to start, stop and probe your applications. If you application runs, the cluster needs to check regularly if the application state is healthy, does it respond over the network, does it have all the processes running, etc. This is called probing. If the cluster deems the application is in a faulty state, then it can try to restart it locally or decide to switch (stop on node A, start on node B) the service. Starting, stopping and probing are the three actions that a cluster agent does. There are many different kinds of agents included in Solaris Cluster, but you can build your own too. Examples are an agent that manages (mounts, moves) ZFS filesystems, or the Oracle DB HA agent that cares about the database, or an agent that moves a floating IP address between nodes. There are lots of other agents included for Apache, Tomcat, MySQL, Oracle DB, Oracle Weblogic, Zones, LDoms, NFS, DNS, etc.We also need to clarify the difference between a cluster resource and the cluster resource group.A cluster resource is something that is managed by a cluster agent. Cluster resource types are included in Solaris cluster (see above, e.g. HAStoragePlus, HA-Oracle, LogicalHost). You can group cluster resources into cluster resourcegroups, and switch these groups together from one node to another. To stick to the example above, to move an Oracle DB service from one node to another, you have to switch the group between nodes, and the agents of the cluster resources in the group will do the following: On node A Shut down the DB Unconfigure the LogicalHost IP the DB Listener listens on unmount the filesystem Then, on node B: mount the FS configure the IP startup the DB -------- Voting, Quorum, Split Brain Condition, Fencing, Amnesia -------- How do the clusternodes agree upon their action? How do they decide which node runs what services? Another important question. Running a cluster is a strictly democratic thing.Every node has votes, and you need the majority of votes to have the deciding power. Now, this is usually no problem, clusternodes think very much all alike. Still, every action needs to be governed upon in a productive system, and has to be agreed upon. Agreeing is easy as long as the clusternodes all behave and talk to eachother over the interconnect. But if the interconnect is gone/down, this all gets tricky and confusing. Clusternodes think like this: "My job is to run these services. The other node does not answer my interconnect communication, it must be down. I'd better take control and run the services!". The problem is, as I have already mentioned, clusternodes very much think alike. If the interconnect is gone, they all assume the other node is down, and they all want to mount the data backend, enable the IP and run the database. Double IPs, double mounts, double DB instances - now that is trouble. Also, in a 2-node cluster they both have only 50% of the votes, that is, they themselves alone are not allowed to run a cluster. This is where you need a quorum device. According to Wikipedia, the "requirement for a quorum is protection against totally unrepresentative action in the name of the body by an unduly small number of persons.". They need additional votes to run the cluster. For this requirement a 2-node cluster needs a quorum device or a quorum server. If the interconnect is gone, (this is what we call a split brain condition) both nodes start to race and try to reserve the quorum device to themselves. They do this, because the quorum device bears an additional vote, that could ensure majority (50% +1). The one that manages to lock the quorum device (e.g. if it's an FC LUN, it SCSI reserves it) wins the right to build/run a cluster, the other one - realizing he was late - panics/reboots to ensure the cluster config stays consistent. Losing the interconnect isn't only endangering the availability of services, but it also endangers the cluster configuration consistence. Just imagine node A being down and during that the cluster configuration changes. Now node B goes down, and node A comes up. It isn't uptodate about the cluster configuration's changes so it will refuse to start a cluster, since that would lead to cluster amnesia, that is the cluster had some changes, but now runs with an older cluster configuration repository state, that is it's like it forgot about the changes. Also, to ensure application data consistence, the clusternode that wins the race makes sure that a server that isn't part of or can't currently join the cluster can access the devices. This procedure is called fencing. This usually happens to storage LUNs via SCSI reservation. Now, another important question: Where do I place the quorum disk? Imagine having two sites, two separate datacenters, one in the north of the city and the other one in the south part of it. You run a stretched cluster in the clustered pair topology. Where do you place the quorum disk/server? If you put it into the north DC, and that gets hit by a meteor, you lose one clusternode, which isn't a problem, but you also lose your quorum, and the south clusternode can't keep the cluster running lacking the votes. This problem can't be solved with two sites and a campus cluster. You will need a third site to either place the quorum server to, or a third clusternode. Otherwise, lacking majority, if you lose the site that had your quorum, you lose the cluster. Okay, we covered the very basics. We haven't talked about virtualization support, CCR, ClusterFilesystems, DID devices, affinities, storage-replication, management tools, upgrade procedures - should those be interesting for you, let me know in the comments, along with any other questions. Given enough demand I'd be glad to write a followup post too. Now I really want to move on to the second part in the series: ClusterInstallation. Oh, as for additional source of information, I recommend the documentation: http://docs.oracle.com/cd/E23623_01/index.html, and the OTN Oracle Solaris Cluster site: http://www.oracle.com/technetwork/server-storage/solaris-cluster/index.html

Read the article
How to Avoid Your Next 12-Month Science Project

- by constant

While most customers immediately understand how the magic of Oracle's Hybrid Columnar Compression, intelligent storage servers and flash memory make Exadata uniquely powerful against home-grown database systems, some people think that Exalogic is nothing more than a bunch of x86 servers, a storage appliance and an InfiniBand (IB) network, built into a single rack. After all, isn't this exactly what the High Performance Computing (HPC) world has been doing for decades? On the surface, this may be true. And some people tried exactly that: They tried to put together their own version of Exalogic, but then they discover there's a lot more to building a system than buying hardware and assembling it together. IT is not Ikea. Why is that so? Could it be there's more going on behind the scenes than merely putting together a bunch of servers, a storage array and an InfiniBand network into a rack? Let's explore some of the special sauce that makes Exalogic unique and un-copyable, so you can save yourself from your next 6- to 12-month science project that distracts you from doing real work that adds value to your company. Engineering Systems is Hard Work! The backbone of Exalogic is its InfiniBand network: 4 times better bandwidth than even 10 Gigabit Ethernet, and only about a tenth of its latency. What a potential for increased scalability and throughput across the middleware and database layers! But InfiniBand is a beast that needs to be tamed: It is true that Exalogic uses a standard, open-source Open Fabrics Enterprise Distribution (OFED) InfiniBand driver stack. Unfortunately, this software has been developed by the HPC community with fastest speed in mind (which is good) but, despite the name, not many other enterprise-class requirements are included (which is less good). Here are some of the improvements that Oracle's InfiniBand development team had to add to the OFED stack to make it enterprise-ready, simply because typical HPC users didn't have the need to implement them: More than 100 bug fixes in the pieces that were not related to the Message Passing Interface Protocol (MPI), which is the protocol that HPC users use most of the time, but which is less useful in the enterprise. Performance optimizations and tuning across the whole IB stack: From Switches, Host Channel Adapters (HCAs) and drivers to low-level protocols, middleware and applications. Yes, even the standard HPC IB stack could be improved in terms of performance. Ethernet over IB (EoIB): Exalogic uses InfiniBand internally to reach high performance, but it needs to play nicely with datacenters around it. That's why Oracle added Ethernet over InfiniBand technology to it that allows for creating many virtual 10GBE adapters inside Exalogic's nodes that are aggregated and connected to Exalogic's IB gateway switches. While this is an open standard, it's up to the vendor to implement it. In this case, Oracle integrated the EoIB stack with Oracle's own IB to 10GBE gateway switches, and made it fully virtualized from the beginning. This means that Exalogic customers can completely rewire their server infrastructure inside the rack without having to physically pull or plug a single cable - a must-have for every cloud deployment. Anybody who wants to match this level of integration would need to add an InfiniBand switch development team to their project. Or just buy Oracle's gateway switches, which are conveniently shipped with a whole server infrastructure attached! IPv6 support for InfiniBand's Sockets Direct Protocol (SDP), Reliable Datagram Sockets (RDS), TCP/IP over IB (IPoIB) and EoIB protocols. Because no IPv6 = not very enterprise-class. HA capability for SDP. High Availability is not a big requirement for HPC, but for enterprise-class application servers it is. Every node in Exalogic's InfiniBand network is connected twice for redundancy. If any cable or port or HCA fails, there's always a replacement link ready to take over. This requires extra magic at the protocol level to work. So in addition to Weblogic's failover capabilities, Oracle implemented IB automatic path migration at the SDP level to avoid unnecessary failover operations at the middleware level. Security, for example spoof-protection. Another feature that is less important for traditional users of InfiniBand, but very important for enterprise customers. InfiniBand Partitioning and Quality-of-Service (QoS): One of the first questions we get from customers about Exalogic is: “How can we implement multi-tenancy?” The answer is to partition your IB network, which effectively creates many networks that work independently and that are protected at the lowest networking layer possible. In addition to that, QoS allows administrators to prioritize traffic flow in multi-tenancy environments so they can keep their service levels where it matters most. Resilient IB Fabric Management: InfiniBand is a self-managing network, so a lot of the magic lies in coming up with the right topology and in teaching the subnet manager how to properly discover and manage the network. Oracle's Infiniband switches come with pre-integrated, highly available fabric management with seamless integration into Oracle Enterprise Manager Ops Center. In short: Oracle elevated the OFED InfiniBand stack into an enterprise-class networking infrastructure. Many years and multiple teams of manpower went into the above improvements - this is something you can only get from Oracle, because no other InfiniBand vendor can give you these features across the whole stack! Exabus: Because it's not About the Size of Your Network, it's How You Use it! So let's assume that you somehow were able to get your hands on an enterprise-class IB driver stack. Or maybe you don't care and are just happy with the standard OFED one? Anyway, the next step is to actually leverage that InfiniBand performance. Here are the choices: Use traditional TCP/IP on top of the InfiniBand stack, Develop your own integration between your middleware and the lower-level (but faster) InfiniBand protocols. While more bandwidth is always a good thing, it's actually the low latency that enables superior performance for your applications when running on any networking infrastructure: The lower the latency, the faster the response travels through the network and the more transactions you can close per second. The reason why InfiniBand is such a low latency technology is that it gets rid of most if not all of your traditional networking protocol stack: Data is literally beamed from one region of RAM in one server into another region of RAM in another server with no kernel/drivers/UDP/TCP or other networking stack overhead involved! Which makes option 1 a no-go: Adding TCP/IP on top of InfiniBand is like adding training wheels to your racing bike. It may be ok in the beginning and for development, but it's not quite the performance IB was meant to deliver. Which only leaves option 2: Integrating your middleware with fast, low-level InfiniBand protocols. And this is what Exalogic's "Exabus" technology is all about. Here are a few Exabus features that help applications leverage the performance of InfiniBand in Exalogic: RDMA and SDP integration at the JDBC driver level (SDP), for Oracle Weblogic (SDP), Oracle Coherence (RDMA), Oracle Tuxedo (RDMA) and the new Oracle Traffic Director (RDMA) on Exalogic. Using these protocols, middleware can communicate a lot faster with each other and the Oracle database than by using standard networking protocols, Seamless Integration of Ethernet over InfiniBand from Exalogic's Gateway switches into the OS, Oracle Weblogic optimizations for handling massive amounts of parallel transactions. Because if you have an 8-lane Autobahn, you also need to improve your ramps so you can feed it with many cars in parallel. Integration of Weblogic with Oracle Exadata for faster performance, optimized session management and failover. As you see, “Exabus” is Oracle's word for describing all the InfiniBand enhancements Oracle put into Exalogic: OFED stack enhancements, protocols for faster IB access, and InfiniBand support and optimizations at the virtualization and middleware level. All working together to deliver the full potential of InfiniBand performance. Who else has 100% control over their middleware so they can develop their own low-level protocol integration with InfiniBand? Even if you take an open source approach, you're looking at years of development work to create, test and support a whole new networking technology in your middleware! The Extras: Less Hassle, More Productivity, Faster Time to Market And then there are the other advantages of Engineered Systems that are true for Exalogic the same as they are for every other Engineered System: One simple purchasing process: No headaches due to endless RFPs and no “Will X work with Y?” uncertainties. Everything has been engineered together: All kinds of bugs and problems have been already fixed at the design level that would have only manifested themselves after you have built the system from scratch. Everything is built, tested and integrated at the factory level . Less integration pain for you, faster time to market. Every Exalogic machine world-wide is identical to Oracle's own machines in the lab: Instant replication of any problems you may encounter, faster time to resolution. Simplified patching, management and operations. One throat to choke: Imagine finger-pointing hell for systems that have been put together using several different vendors. Oracle's Engineered Systems have a single phone number that customers can call to get their problems solved. For more business-centric values, read The Business Value of Engineered Systems. Conclusion: Buy Exalogic, or get ready for a 6-12 Month Science Project And here's the reason why it's not easy to "build your own Exalogic": There's a lot of work required to make such a system fly. In fact, anybody who is starting to "just put together a bunch of servers and an InfiniBand network" is really looking at a 6-12 month science project. And the outcome is likely to not be very enterprise-class. And it won't have Exalogic's performance either. Because building an Engineered System is literally rocket science: It takes a lot of time, effort, resources and many iterations of design/test/analyze/fix to build such a system. That's why InfiniBand has been reserved for HPC scientists for such a long time. And only Oracle can bring the power of InfiniBand in an enterprise-class, ready-to use, pre-integrated version to customers, without the develop/integrate/support pain. For more details, check the new Exalogic overview white paper which was updated only recently. P.S.: Thanks to my colleagues Ola, Paul, Don and Andy for helping me put together this article! var flattr_uid = '26528'; var flattr_tle = 'How to Avoid Your Next 12-Month Science Project'; var flattr_dsc = 'While most customers immediately understand how the magic of Oracle's Hybrid Columnar Compression, intelligent storage servers and flash memory make Exadata uniquely powerful against home-grown database systems, some people think that Exalogic is nothing more than a bunch of x86 servers, a storage appliance and an InfiniBand (IB) network, built into a single rack.After all, isn't this exactly what the High Performance Computing (HPC) world has been doing for decades?On the surface, this may be true. And some people tried exactly that: They tried to put together their own version of Exalogic, but then they discover there's a lot more to building a system than buying hardware and assembling it together. IT is not Ikea.Why is that so? Could it be there's more going on behind the scenes than merely putting together a bunch of servers, a storage array and an InfiniBand network into a rack? Let's explore some of the special sauce that makes Exalogic unique and un-copyable, so you can save yourself from your next 6- to 12-month science project that distracts you from doing real work that adds value to your company.'; var flattr_tag = 'Engineered Systems,Engineered Systems,Infiniband,Integration,latency,Oracle,performance'; var flattr_cat = 'text'; var flattr_url = 'http://constantin.glez.de/blog/2012/04/how-avoid-your-next-12-month-science-project'; var flattr_lng = 'en_GB'

Read the article
SQL 2012 Licensing Thoughts

- by Geoff N. Hiten

The only thing more controversial than new Federal Tax plans is new Licensing plans from Microsoft. In both cases, everyone calculates several numbers. First, will I pay more or less under this plan? Second, will my competition pay more or less than now? Third, will <insert interesting person/company here> pay more or less? Not that items 2 and 3 are meaningful, that is just how people think. Much like tax plans, the devil is in the details, so lets see how this looks. Microsoft shows it here: http://www.microsoft.com/sqlserver/en/us/future-editions/sql2012-licensing.aspx First up is a switch from per-socket to per-core licensing. Anyone who didn’t see something like this coming should rapidly search for a new line of work because you are not paying attention. The explosion of multi-core processors has made SQL Server a bargain. Microsoft is in business to make money and the old per-socket model was not going to do that going forward. Per-core licensing also simplifies virtualization licensing. Physical Core = Virtual Core, at least for licensing. Oversubscribe your processors, that’s your lookout. You still pay for what is exposed to the VM. The cool part is you can seamlessly move physical and virtual workloads around and the licenses follow. The catch is you have to have Software Assurance to make the licenses mobile. Nice touch there. Let’s have a moment of silence for the late, unlamented, largely ignored Workgroup Edition. To quote the Microsoft FAQ: “Standard becomes our sole edition for basic database needs”. Considering I haven’t encountered a singe instance of SQL Server Workgroup Edition in the wild, I don’t think this will be all that controversial. As for pricing, it looks like a wash with current per-socket pricing based on four core sockets. Interestingly, that is the minimum core count Microsoft proposes to swap to transition per-socket to per-core if you are on Software Assurance. Reading the fine print shows that if you are using more, you will get more core licenses: From the licensing FAQ. 15. How do I migrate from processor licenses to core licenses? What is the migration path? Licenses purchased with Software Assurance (SA) will upgrade to SQL Server 2012 at no additional cost. EA/EAP customers can continue buying processor licenses until your next renewal after June 30, 2012. At that time, processor licenses will be exchanged for core-based licenses sufficient to cover the cores in use by processor-licensed databases (minimum of 4 cores per processor for Standard and Enterprise, and minimum of 8 EE cores per processor for Datacenter). Looks like the folks who invested in the AMD 12-core chips will make out like bandits. Now, on to something new: SQL Server Business Intelligence Edition. Yep, finally a BI-specific SKU licensed for server+CAL configurations only. Note that Enterprise Edition still supports the complete feature set; the BI Edition is intended for smaller shops who want to use the full BI feature set but without needing Enterprise Edition scale (or costs). No, you don’t get ColumnStore, Compression, or Partitioning in the BI Edition. Those are Enterprise scale features, ThankYouVeryMuch. Then again, your starting licensing costs are about one sixth of an Enterprise Edition system (based on an 8 core server). The only part of the message I am missing is if the current Failover Licensing Policy will change. Do we need to fully or partially license failover servers? That is a detail I definitely want to know.

Read the article
Hyper-V Live Migration across Sites!

- by Ryan Roussel

One of the great sessions I sat in on at Tech Ed this week was stretching a Windows 2008 R2 Hyper-V Failover Cluster across sites. With this ability, you could actually implement a Hyper-V cluster where you could migrate or even Live Migrate VMs across sites. With this area’s propensity for Hurricanes, this will be a very popular topic for me over the next few months. While this technology is possible today, it’s also very complicated and can be very expensive to implement. First your WAN connection has to support the ability to trunk your VLAN across both sites in order to Live Migrate. This means you can’t use a Layer 3 routed connection like MPLS. It has to be a Metro Ethernet connection or "Dark Fiber”. Dark Fiber is unused Fiber already in the ground that can be leased from various providers. Both of these connections would allow you to trunk layer 2 across your WAN. Cisco does have the ability to trunk layer 2 across a routed connection by muxing the traffic but this is only available in their Nexus product line which has a very steep price tag. If you are stuck with MPLS or the like and Nexus switching is not a realistic possibility, you will have to implement a multi-subnet cluster in which case Live Migration won’t be possible. However you can still failover VMs to the remote site with some planning and manual intervention. The consideration here is that the VMs will be on a different subnet once migrated, so you will have to change the IP addressing of your VMs. This also has ramifications with DNS and Name resolution to control your down time. DHCP with Reservations for your VMs is the preferred method to achieve the IP changes as this will automate that part of the process. Secondly, you will have to have a mechanism to replicate your storage across both sites. Many SAN vendors natively support hardware based synchronous and asynchronous replication. Some even support cluster shared volumes which were introduced in 2008 R2. If your SANs do not support this natively, there are alternative file based replication products either software based like Double Take or hardware appliance like EMC. Be sure to check with your vendor on the support of Disk majority if you’re replicating your quorum disk between SANs. The last consideration is the ability to maintain quorum for your cluster. If your replication provider does not support Disk Majority through replication, you will have to explore Node Majority with File Share Witness. This will affect your design as a 3 node cluster with 1 node at the remote site and FSW at the production site would not have the ability to maintain quorum if the production site was lost. MS best practice for this would be to implement an even node cluster with 2 nodes at each site and the FSW at a third site. And there you have it. While some considerations and research goes into implementing this solution, even a multi-subnet solution would be invaluable to organizations in the implementations of “warm” DR sites.

Read the article
UAT Testing for SOA 10G Clusters

- by [email protected]

A lot of customers ask how to verify their SOA clusters and make them production ready. Here is a list that I recommend using for 10G SOA Clusters. v\:* {behavior:url(#default#VML);} o\:* {behavior:url(#default#VML);} w\:* {behavior:url(#default#VML);} .shape {behavior:url(#default#VML);} Normal 0 false false false EN-CA X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin-top:0cm; mso-para-margin-right:0cm; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0cm; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; mso-bidi-font-size:12.0pt; font-family:"Calibri","sans-serif"; mso-fareast-language:EN-US;} Test cases for each component - Oracle Application Server 10G General Application Server test cases This section is going to cover very General test cases to make sure that the Application Server cluster has been set up correctly and if you can start and stop all the components in the server via opmnct and AS Console. Test Case 1 Check if you can see AS instances in the console Implementation 1. Log on to the AS Console --> check to see if you can see all the nodes in your AS cluster. You should be able to see all the Oracle AS instances that are part of the cluster. This means that the OPMN clustering worked and the AS instances successfully joined the AS cluster. Result You should be able to see if all the instances in the AS cluster are listed in the EM console. If the instances are not listed here are the files to check to see if OPMN joined the cluster properly: $ORACLE_HOME\opmn\logs{*}opmn.log*$ORACLE_HOME\opmn\logs{*}opmn.dbg* If OPMN did not join the cluster properly, please check the opmn.xml file to make sure the discovery multicast address and port are correct (see this link for opmn documentation). Restart the whole instance using opmnctl stopall followed by opmnctl startall. Log on to AS console to see if instance is listed as part of the cluster. Test Case 2 Check to see if you can start/stop each component Implementation Check each OC4J component on each AS instanceStart each and every component through the AS console to see if they will start and stop.Do that for each and every instance. Result Each component should start and stop through the AS console. You can also verify if the component started by checking opmnctl status by logging onto each box associated with the cluster Test Case 3 Add/modify a datasource entry through AS console on a remote AS instance (not on the instance where EM is physically running) Implementation Pick an OC4J instanceCreate a new data-source through the AS consoleModify an existing data-source or connection pool (optional) Result Open $ORACLE_HOME\j2ee\<oc4j_name>\config\data-sources.xml to see if the new (and or the modified) connection details and data-source exist. If they do then the AS console has successfully updated a remote file and MBeans are communicating correctly. Test Case 4 Start and stop AS instances using opmnctl @cluster command Implementation 1. Go to $ORACLE_HOME\opmn\bin and use the opmnctl @cluster to start and stop the AS instances Result Use opmnctl @cluster status to check for start and stop statuses. HTTP server test cases This section will deal with use cases to test HTTP server failover scenarios. In these examples the HTTP server will be talking to the BPEL console (or any other web application that the client wants), so the URL will be _http://hostname:port\BPELConsole Test Case 1 Shut down one of the HTTP servers while accessing the BPEL console and see the requested routed to the second HTTP server in the cluster Implementation Access the BPELConsoleCheck $ORACLE_HOME\Apache\Apache\logs\access_log --> check for the timestamp and the URL that was accessed by the user. Timestamp and URL would look like this 1xx.2x.2xx.xxx [24/Mar/2009:16:04:38 -0500] "GET /BPELConsole=System HTTP/1.1" 200 15 After you have figured out which HTTP server this is running on, shut down this HTTP server by using opmnctl stopproc --> this is a graceful shutdown.Access the BPELConsole again (please note that you should have a LoadBalancer in front of the HTTP server and configured the Apache Virtual Host, see EDG for steps)Check $ORACLE_HOME\Apache\Apache\logs\access_log --> check for the timestamp and the URL that was accessed by the user. Timestamp and URL would look like above Result Even though you are shutting down the HTTP server the request is routed to the surviving HTTP server, which is then able to route the request to the BPEL Console and you are able to access the console. By checking the access log file you can confirm that the request is being picked up by the surviving node. Test Case 2 Repeat the same test as above but instead of calling opmnctl stopproc, pull the network cord of one of the HTTP servers, so that the LBR routes the request to the surviving HTTP node --> this is simulating a network failure. Test Case 3 In test case 1 we have simulated a graceful shutdown, in this case we will simulate an Apache crash Implementation Use opmnctl status -l to get the PID of the HTTP server that you would like forcefully bring downOn Linux use kill -9 <PID> to kill the HTTP serverAccess the BPEL console Result As you shut down the HTTP server, OPMN will restart the HTTP server. The restart may be so quick that the LBR may still route the request to the same server. One way to check if the HTTP server restared is to check the new PID and the timestamp in the access log for the BPEL console. BPEL test cases This section is going to cover scenarios dealing with BPEL clustering using jGroups, BPEL deployment and testing related to BPEL failover. Test Case 1 Verify that jGroups has initialized correctly. There is no real testing in this use case just a visual verification by looking at log files that jGroups has initialized correctly. Check the opmn log for the BPEL container for all nodes at $ORACLE_HOME/opmn/logs/<group name><container name><group name>~1.log. This logfile will contain jGroups related information during startup and steady-state operation. Soon after startup you should find log entries for UDP or TCP.Example jGroups Log Entries for UDPApr 3, 2008 6:30:37 PM org.collaxa.thirdparty.jgroups.protocols.UDP createSockets · INFO: sockets will use interface 144.25.142.172· · Apr 3, 2008 6:30:37 PM org.collaxa.thirdparty.jgroups.protocols.UDP createSockets· · INFO: socket information:· · local_addr=144.25.142.172:1127, mcast_addr=228.8.15.75:45788, bind_addr=/144.25.142.172, ttl=32· sock: bound to 144.25.142.172:1127, receive buffer size=64000, send buffer size=32000· mcast_recv_sock: bound to 144.25.142.172:45788, send buffer size=32000, receive buffer size=64000· mcast_send_sock: bound to 144.25.142.172:1128, send buffer size=32000, receive buffer size=64000· Apr 3, 2008 6:30:37 PM org.collaxa.thirdparty.jgroups.protocols.TP$DiagnosticsHandler bindToInterfaces· · -------------------------------------------------------· · GMS: address is 144.25.142.172:1127· ------------------------------------------------------- Example jGroups Log Entries for TCPApr 3, 2008 6:23:39 PM org.collaxa.thirdparty.jgroups.blocks.ConnectionTable start · INFO: server socket created on 144.25.142.172:7900· · Apr 3, 2008 6:23:39 PM org.collaxa.thirdparty.jgroups.protocols.TP$DiagnosticsHandler bindToInterfaces· · -------------------------------------------------------· GMS: address is 144.25.142.172:7900------------------------------------------------------- In the log below the "socket created on" indicates that the TCP socket is established on the own node at that IP address and port the "created socket to" shows that the second node has connected to the first node, matching the logfile above with the IP address and port.Apr 3, 2008 6:25:40 PM org.collaxa.thirdparty.jgroups.blocks.ConnectionTable start · INFO: server socket created on 144.25.142.173:7901· · Apr 3, 2008 6:25:40 PM org.collaxa.thirdparty.jgroups.protocols.TP$DiagnosticsHandler bindToInterfaces· · ------------------------------------------------------· GMS: address is 144.25.142.173:7901· -------------------------------------------------------· Apr 3, 2008 6:25:41 PM org.collaxa.thirdparty.jgroups.blocks.ConnectionTable getConnectionINFO: created socket to 144.25.142.172:7900 Result By reviewing the log files, you can confirm if BPEL clustering at the jGroups level is working and that the jGroup channel is communicating. Test Case 2 Test connectivity between BPEL Nodes Implementation Test connections between different cluster nodes using ping, telnet, and traceroute. The presence of firewalls and number of hops between cluster nodes can affect performance as they have a tendency to take down connections after some time or simply block them.Also reference Metalink Note 413783.1: "How to Test Whether Multicast is Enabled on the Network." Result Using the above tools you can confirm if Multicast is working and whether BPEL nodes are commnunicating. Test Case3 Test deployment of BPEL suitcase to one BPEL node. Implementation Deploy a HelloWorrld BPEL suitcase (or any other client specific BPEL suitcase) to only one BPEL instance using ant, or JDeveloper or via the BPEL consoleLog on to the second BPEL console to check if the BPEL suitcase has been deployed Result If jGroups has been configured and communicating correctly, BPEL clustering will allow you to deploy a suitcase to a single node, and jGroups will notify the second instance of the deployment. The second BPEL instance will go to the DB and pick up the new deployment after receiving notification. The result is that the new deployment will be "deployed" to each node, by only deploying to a single BPEL instance in the BPEL cluster. Test Case 4 Test to see if the BPEL server failsover and if all asynch processes are picked up by the secondary BPEL instance Implementation Deploy a 2 Asynch process: A ParentAsynch Process which calls a ChildAsynchProcess with a variable telling it how many times to loop or how many seconds to sleepA ChildAsynchProcess that loops or sleeps or has an onAlarmMake sure that the processes are deployed to both serversShut down one BPEL serverOn the active BPEL server call ParentAsynch a few times (use the load generation page)When you have enough ParentAsynch instances shut down this BPEL instance and start the other one. Please wait till this BPEL instance shuts down fully before starting up the second one.Log on to the BPEL console and see that the instance were picked up by the second BPEL node and completed Result The BPEL instance will failover to the secondary node and complete the flow ESB test cases This section covers the use cases involved with testing an ESB cluster. For this section please Normal 0 false false false EN-CA X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin-top:0cm; mso-para-margin-right:0cm; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0cm; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; mso-bidi-font-size:12.0pt; font-family:"Calibri","sans-serif"; mso-fareast-language:EN-US;} follow Metalink Note 470267.1 which covers the basic tests to verify your ESB cluster.

Read the article
What is the differences between a Remote File Share Witness Quorum and a Disk Witness Quorum?

- by arex1337

Say you set up a Windows Server Failover Cluster consisting of two nodes, on Windows Server 2008 R2 Enterprise. Now when you want to configure quorum you can use either a remote shared folder, or a shared disk (unless your nodes are separated geographically). I'd love to know more about when to use a disk witness and when to use a file share witness. What are the differences? Does it matter at all?

Read the article
DBRD dual primary Heartbeat resource management

- by Jon Skarpeteig

I have the following setup: Two servers with DRBD running dual primary with OCFS2 Heartbeat with two virtual ips, one for each server Round robin DNS to load balance NFS across the two vIPs Shutting down Server1 for a period of time, cause Server2 to take over the vIP for failover. However, when Server1 returns - it takes over the designated vIP as soon as heartbeat gets connection again - even though the DRBD is running sync (and thus not up to date) How can I configure heartbeat to perform failback as soon as Server1 again is in sync with Server2 ? (And not before)

Read the article
mysql with DRBD on rackspace

- by Richard Castle

I am trying to set up a failover secondary MySQL server that is a mirror of my primary MySQL server using DRBD. The problem is that I am on a rackspace cloud server and I need a second partition on both the primary and secondary servers that I will replicate with DRBD. Rackspace does not allow me to create a second partition. I am left with the default single partition. How can I mirror using DRBD?

Read the article
Link aggregation with freebsd8 and a cicso 3550, what am i doing wrong?

- by Flamewires

Hey, I am trying to setup Link Aggrigation with LACP (well, anything that provides increased bandwidth and failover using my setup will work). I'm running FreeBSD 8.0 on 3 machines. M1 is running 2 10/100 ethernetcards setup for link aggrigation using lagg. for reference: ifconfig em0 up ifconfig tx0 up ifconfig create lagg0 ifconfig lagg0 laggproto lacp laggport tx0 laggport em0 192.168.1.16 netmask 255.255.255.0 I plugged them into ports 1 and 2 of a Cicso 3550. then ran: configure terminal interface range Fa0/1 - 2 switchport mode access switchport access vlan 1 channel-group 1 mode active (everythings in vlan 1) Now Im able to connect the other computers to other ports on the switch and failover works great, i can unplug cables in the middle of a transfer and the traffic gets rerouted. However, im not noticing any speed increase. My test setup: load balancing: i tried dst and src on the switch, neither seemed to give me a speed increase. I am SCPing 2 500 meg files from the lagg computer to other computers (one each) which are also running 10/100 full duplex cards. I get transfer speeds of about 11.2-11.4 Mbps to a single host, and about half that (5.9-6.2) Mbps when transferring to both at the same time. From what I understood with destination load balancing the router was suppose to balance traffic headed for 1 computer over 1 port and traffic headed for another over a diff(in this case) the other port. With destination-MAC address forwarding, when packets are forwarded to an EtherChannel, the packets are distributed across the ports in the channel based on the destination host MAC address of the incoming packet. Therefore, packets to the same destination are forwarded over the same port, and packets to a different destination are sent on a different port in the channel. For the 3550 series switch, when source-MAC address forwarding is used, load distribution based on the source and destination IP address is also enabled for routed IP traffic. All routed IP traffic chooses a port based on the source and destination IP address. Packets between two IP hosts always use the same port in the channel, and traffic between any other pair of hosts can use a different port in the channel. (Link) What am i doing wrong/what would i need to do to see a speed increase beyond what i could do with just a single card?

Read the article
lacp, cicso 3550, 3560, help with configuration

- by Flamewires

Hey all this is a repost from a question I asked on the cisco forums but never got a useful reply. Hey I'm trying to convert the FreeBSD servers at work to dual-gig lagg links from regular gigabit links. Our production servers are on a 3560. I have a small test environment on a 3550. I have achieved fail-over, but am having troubles achieving the speed increase. All servers are running gig intel (em) cards. The configs for the servers are: BSDServer: #!/bin/sh #bring up both interfaces ifconfig em0 up media 1000baseTX mediaopt full-duplex ifconfig em1 up media 1000baseTX mediaopt full-duplex #create the lagg interface ifconfig lagg0 create #set lagg0's protocol to lacp, add both cards to the interface, #and assign it em1's ip/netmask ifconfig lagg0 laggproto lacp laggport em0 laggport em1 ***.***.***.*** netmask 255.255.255.0 The switches are configured as follows: #clear out old junk no int Po1 default int range GigabitEthernet 0/15 - 16 # config ports interface range GigabitEthernet 0/15 - 16 description lagg-test switchport duplex full speed 1000 switchport access vlan 192 spanning-tree portfast channel-group 1 mode active channel-protocol lacp **** switchport trunk encapsulation dot1q **** no shutdown exit interface Port-channel 1 description lagginterface switchport access vlan 192 exit port-channel load-balance src-mac end obviously change 1000's to 100's and GigabitEthernet to FastEthernet for the 3550's config, as that switch has 100Mbit speed ports. With this config on the 3550, I get failover and 92Mbits/sec speed on both links, simultaneously, connecting to 2 hosts.(tested with iperf) Success. However this is only with the "switchport trunk encapsulation dot1q" line. First, I do not understand why I need this, I thought it was only for connecting switches. Is there some other setting which this turns on that is actually responsible for the speed increase? Second, This config does not work on the 3560. I get failover, but not the speed increase. Speeds drop from gig/sec to 500Mbit/sec when I make 2 simultaneous connections to the server with or without the encapsulation line. I should mention that both switches are using source-mac load balancing. In my test I am using Iperf. I have the server(lagg box) setup as the server(iperf -s), and the client computers are client(iperf -c server-ip-address), so the source mac(and IP) are different for both connections. Any ideas/corrections/questions would be helpful, as the gig switches are what I actually need the lagg links on. Ask if you need more information.

Read the article
Which network management system (NMS) to choose?

- by QrystaL

I need to integrate NMS in large enterprise system for data collection purposes. Primary requirements: collection by SNMP great scalability (up to 1,000 devices with 1,000 interfaces each) failover data storage in Oracle DBMS integration API (configuration, data access) Any ideas would be appreciated...

Read the article
trigger a DNS change in Active directory

- by Paddy Carroll

Can I get solar winds to change a DNS alias in Active directory based upon a specific set of events or conditions? I have a collection of applications that use hostnames in combination with database names in order to resolve database connections, problem is that they haven't considered how a failover would work in practice so I want the product to provoke a change in DNS to point the apps at the right place if we get into a failure situation. Can it be done?

Read the article

Search Results

Search found 504 results on 21 pages for 'failover'.

Page 11/21 | < Previous Page | 7 8 9 10 11 12 13 14 15 16 17 18 | Next Page >

- by Rob Farley

- by Phil Factor

- by boflynn

- by celalo

- by Kyle Brandt

- by c4f4t0r

- by oeegee

- by Artem

- by Mark Davidson

- by boflynn

- by Sergei

- by Chrille

- by Karoly Vegh

- by constant

- by Geoff N. Hiten

- by Ryan Roussel

- by [email protected]

- by arex1337

- by Jon Skarpeteig

- by Richard Castle

- by Flamewires

- by Flamewires

- by QrystaL

- by Paddy Carroll

< Previous Page | 7 8 9 10 11 12 13 14 15 16 17 18 | Next Page >