Search Results

Search found 504 results on 21 pages for 'failover'.

Page 9/21 | < Previous Page | 5 6 7 8 9 10 11 12 13 14 15 16  | Next Page >

  • SQL Server High Availability - Mirroring with MSCS?

    - by David
    I'm looking at options for high-availability for my SQL Server-powered application. The requirements are: HA protection from storage failure. Data accessibility when one of the DB servers is undergoing software updates (e.g. planned outage for Windows Update / SQL Server service-packs). Must not involve much in the way of hardware procurement. The application is an ASP.NET web application. The web application's users have their own database instances. I've seen two main options: SQL Server failover clustering, and SQL Server mirroring. I understand that SQL Server Failover Clustering requires the purchasing of a shared disk array and doesn't offer any protection if the shared storage goes down (so the documentation recommends to set up a Mirroring between two clusters). Database Mirroring seems the cheaper option (as it only requires two database servers and a simple witness box) - but I've heard it doesn't work well when you have a large number of databases. The application I'm developing involves giving each client their own database for their application - there could be hundreds of databases. Setting up the mirroring is no problem thanks to the automation systems we have in place. My final point concerns how failover works with respect to client connections - SQL Server Failover Clustering uses MSCS which means that the cluster is invisible to clients - a connection attempt might fail during the failover, but a simple reconnect will have it working again. However mirroring, as far as I know, requires that the client be aware of the mirrored partners: if the client cannot connect to the primary server then it tries the secondary server. I'm wondering how this work with respect to Connection Pooling in ASP.NET applications - does the client connection failovering mean that there's a potential 2-second (assuming 2000ms TCP timeout policy) pause when the connection pool tries the primary server on every connection attempt? I read somewhere that Mirroring can be used on top of MSCS which means that the client does not need to be aware of mirroring (so there wouldn't be any potential delays during connection, and also that no changes would need to be made to the client, not even the connection string) - however I'm finding it hard to get documentation or white papers on this approach. But if true, then it means the best method is then Mirroring (for HA) with MSCS (for client ignorance and connection performance). ...but how does this scale to a server instance that might contain hundreds of mirrored databases?

    Read the article

  • Corosync :: Restarting some resources after Lan connectivity issue

    - by moebius_eye
    I am currently looking into corosync to build a two-node cluster. So, I've got it working fine, and it does what I want to do, which is: Lost connectivity between the two nodes gives the first node '10node' both Failover Wan IPs. (aka resources WanCluster100 and WanCluster101 ) '11node' does nothing. He "thinks" he still has his Failover Wan IP. (aka WanCluster101) But it doesn't do this: '11node' should restart the WanCluster101 resource when the connectivity with the other node is back. This is to prevent a condition where node10 simply dies (and thus does not get 11node's Failover Wan IP), resulting in a situation where none of the nodes have 10node's failover IP because 10node is down 11node has "given back" his failover Wan IP. Here's the current configuration I'm working on. node 10sch \ attributes standby="off" node 11sch \ attributes standby="off" primitive LanCluster100 ocf:heartbeat:IPaddr2 \ params ip="172.25.0.100" cidr_netmask="32" nic="eth3" \ op monitor interval="10s" \ meta is-managed="true" target-role="Started" primitive LanCluster101 ocf:heartbeat:IPaddr2 \ params ip="172.25.0.101" cidr_netmask="32" nic="eth3" \ op monitor interval="10s" \ meta is-managed="true" target-role="Started" primitive Ping100 ocf:pacemaker:ping \ params host_list="192.0.2.1" multiplier="500" dampen="15s" \ op monitor interval="5s" \ meta target-role="Started" primitive Ping101 ocf:pacemaker:ping \ params host_list="192.0.2.1" multiplier="500" dampen="15s" \ op monitor interval="5s" \ meta target-role="Started" primitive WanCluster100 ocf:heartbeat:IPaddr2 \ params ip="192.0.2.100" cidr_netmask="32" nic="eth2" \ op monitor interval="10s" \ meta target-role="Started" primitive WanCluster101 ocf:heartbeat:IPaddr2 \ params ip="192.0.2.101" cidr_netmask="32" nic="eth2" \ op monitor interval="10s" \ meta target-role="Started" primitive Website0 ocf:heartbeat:apache \ params configfile="/etc/apache2/apache2.conf" options="-DSSL" \ operations $id="Website-one" \ op start interval="0" timeout="40" \ op stop interval="0" timeout="60" \ op monitor interval="10" timeout="120" start-delay="0" statusurl="http://127.0.0.1/server-status/" \ meta target-role="Started" primitive Website1 ocf:heartbeat:apache \ params configfile="/etc/apache2/apache2.conf.1" options="-DSSL" \ operations $id="Website-two" \ op start interval="0" timeout="40" \ op stop interval="0" timeout="60" \ op monitor interval="10" timeout="120" start-delay="0" statusurl="http://127.0.0.1/server-status/" \ meta target-role="Started" group All100 WanCluster100 LanCluster100 group All101 WanCluster101 LanCluster101 location AlwaysPing100WithNode10 Ping100 \ rule $id="AlWaysPing100WithNode10-rule" inf: #uname eq 10sch location AlwaysPing101WithNode11 Ping101 \ rule $id="AlWaysPing101WithNode11-rule" inf: #uname eq 11sch location NeverLan100WithNode11 LanCluster100 \ rule $id="RAND1083308" -inf: #uname eq 11sch location NeverPing100WithNode11 Ping100 \ rule $id="NeverPing100WithNode11-rule" -inf: #uname eq 11sch location NeverPing101WithNode10 Ping101 \ rule $id="NeverPing101WithNode10-rule" -inf: #uname eq 10sch location Website0NeedsConnectivity Website0 \ rule $id="Website0NeedsConnectivity-rule" -inf: not_defined pingd or pingd lte 0 location Website1NeedsConnectivity Website1 \ rule $id="Website1NeedsConnectivity-rule" -inf: not_defined pingd or pingd lte 0 colocation Never -inf: LanCluster101 LanCluster100 colocation Never2 -inf: WanCluster100 LanCluster101 colocation NeverBothWebsitesTogether -inf: Website0 Website1 property $id="cib-bootstrap-options" \ dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \ cluster-infrastructure="openais" \ expected-quorum-votes="2" \ no-quorum-policy="ignore" \ stonith-enabled="false" \ last-lrm-refresh="1408954702" \ maintenance-mode="false" rsc_defaults $id="rsc-options" \ resource-stickiness="100" \ migration-threshold="3" I also have a less important question concerning this line: colocation NeverBothLans -inf: LanCluster101 LanCluster100 How do I tell it that this collocation only applies to '11node'.

    Read the article

  • Hyper-V R2 Live Migration

    Reliability is one of the great payoffs to virtualization, and failover clustering has got a whole lot better with Windows Server 2008 and Hyper-V. Now, you get failover without any downtime for the virtual machine. Jaap tells you how to implement it.

    Read the article

  • Implementing Cluster Continuous Replication, Part 3

    Cluster continuous replication (CCR) uses log shipping and failover to provide a more resilient email system with faster recovery. Once it is installed, a clustered server requires different management routines. These are done either with a GUI tool, The Failover Cluster Management Console, or the Exchange Management Shell. You can use Powershell as well for some tasks. Confused? Not for long, since Brien Posey is once more here to help.

    Read the article

  • Are We Losing a Standard (Edition) Data Recovery Technology?

    - by AllenMWhite
    One of the coolest technologies Microsoft released with SQL Server 2005 was Database Mirroring, which provided the ability to have a failover copy of a database on another SQL Server instance, and have the ability to automatically failover to that copy should a problem occur with the primary database. What was even cooler was that this new technology was available on Standard Edition! Mom and Pop shops could afford to implement a high availability solution without paying an extra tens of thousands...(read more)

    Read the article

  • mysql master-master setup as a way to simply master-slave promotion

    - by Chris Go
    I'm trying to see if the following plan is viable. Goal here is to be able to do HA (uptime) and not necessarily for load -- writes are fine on one MySQL 5.5 server (with innodb) but not really possible when the database is down. Currently, I have a master-slave replication setup which works fine except it doesn't have automatic promotion (obviously). what I am planning on doing is setup master-master replication to possibly do this "automatic promotion" using Amazon Route 53 DNS Failover (Health checks). What I am trying to avoid is to NOT have to do the auto-increment trick because the "business folks" got used to the auto-incrementing PK as consecutive numbers (yeah, I know this is bad but data is from 2004). So, setup the master-master replication WITHOUT the auto-increment collision prevention bit. The primary master is db1.domain.com and secondary master is db2.domain.com In Amazon Route 53, setup DNS Failover record for db.domain.com - primary failover is db1.domain.com - with a TCP healthcheck on IP address port 3306 - secondary failover is db2.domain.com - with a TCP healthcheck on IP address port 3306 Most of the time (99%), unless tcp://db1.domain.com:3306 is dead, db1.domain.com will be served up on DNS hits to db.domain.com. In fact, hopefully this is 100%. The possible downsides of this is the loss of a primary key (collision) and I think I am OK with losing one order. We are a low data volume B2B business and can just call our client up if this occurs (like an order disappearing). Does this sound like a good plan? Then I will also run another slave replication on db1.domain.com as "master" to a slave-db1.domain.com -- not sure why, maybe for heavy SELECTs?

    Read the article

  • 2 Server FC SAN Configuration

    - by BSte
    I have 2 identical servers: -48GB Ram -8GigE NIC's -2FC NIC's -2x72GB RAID1 Hard Drives -Server 2008R2 Host I also Have a Fibre Channel SAN: -16x146GB RAID10 Hard Drives -2xDual-port FC Controllers (Controller A and B both have ports 1 and 2) -Server 1 has Fiber to Ports A1 and B1 -Server 2 has Fiber to Ports A2 and B2 -I kept the default config with 1 Virtual Disk and 1 Volume -The default mappings show ports A1,A2,B1,B2 on LUN 0 with read-write My goal is: -2xVM's with IIS and Guest Level Failover -2xVM's with SQL 2008 Enterprise using a Single DB and Guest Level Failover -1xVM that is an application server, preferable with Host Failover. From what I read, this will also need AD for clustering to work. -I need at least 1 VM always running for IIS and the SQLDB. This includes hardware failover and application (ie: reboot a VM for Critical updates) I was told I could install the VM's and run them from the SAN, and this is what I've tried: Installed MPIO and HyperV on Server1 and Server 2 Added the SAN as Disk E: on both servers, made it GPT and formatted NTFS Configured HyperV on both server to store use E:\VD and E:\VHD On server1, I was able to install 3 VM's on the SAN and all worked well. On server2, I would start installing the other 2 VM's, but always at some point the VM's would get a corrupt .VHD message (either server). Everything I found about the message typically related to antivirus, so I removed all antivirus on both Host servers (now only running 2008R2). I reformatted drive E: (SAN), recreated the VHD and VD directories, installed 3 VM's on Server 1, and then had the same issue when installing VM's on Server2. Obviously something is wrong, but I'm not certain what exactly. My questions: 1) Are my goals possible with this hardware setup? -I've read 2008R2 supports FC SAN's, but a lot of articles seem to only give examples with iSCSCI setups 2) What would be the suggested route on setting up the SAN (disks,volumes,LUN's)? I've worked with HyperV on a single machine before and never had issues. Actual experience working on SAN's and clustering is new to me. Any suggestions or recommendations to get me in the right direction would be much appreciated.

    Read the article

  • What's the difference between keepalived and corosync, others?

    - by Matt
    I'm building a failover firewall for a server cluster and started looking at the various options. I'm more familiar with carp on freebsd, but need to use linux for this project. Searching google has produced several different projects, but no clear information about features they provide . CARP gave virtual interfaces that failover, I am not really clear on whether that's what corosync does, or is that what pacemaker does? On the other hand I did get manage to get keepalived working. However, I noted that corosync provides native support for infiniband. This would be useful for me. Perhaps someone could shed some light on the differences between: corosync keepalive pacemaker heartbeat Which product would be the best fit for router failover?

    Read the article

  • SQLAuthority News – 2 Whitepapers Announced – AlwaysOn Architecture Guide: Building a High Availability and Disaster Recovery Solution

    - by pinaldave
    Understanding AlwaysOn Architecture is extremely important when building a solution with failover clusters and availability groups. Microsoft has just released two very important white papers related to this subject. Both the white papers are written by top experts in industry and have been reviewed by excellent panel of experts. Every time I talk with various organizations who are adopting the SQL Server 2012 they are always excited with the concept of the new feature AlwaysOn. One of the requests I often here is the related to detailed documentations which can help enterprises to build a robust high availability and disaster recovery solution. I believe following two white paper now satisfies the request. AlwaysOn Architecture Guide: Building a High Availability and Disaster Recovery Solution by Using AlwaysOn Availability Groups SQL Server 2012 AlwaysOn Availability Groups provides a unified high availability and disaster recovery (HADR) solution. This paper details the key topology requirements of this specific design pattern on important concepts like quorum configuration considerations, steps required to build the environment, and a workflow that shows how to handle a disaster recovery. AlwaysOn Architecture Guide: Building a High Availability and Disaster Recovery Solution by Using Failover Cluster Instances and Availability Groups SQL Server 2012 AlwaysOn Failover Cluster Instances (FCI) and AlwaysOn Availability Groups provide a comprehensive high availability and disaster recovery solution. This paper details the key topology requirements of this specific design pattern on important concepts like asymmetric storage considerations, quorum model selection, quorum votes, steps required to build the environment, and a workflow. If you are not going to implement AlwaysOn feature, this two Whitepapers are still a great reference material to review as it will give you complete idea regarding what it takes to implement AlwaysOn architecture and what kind of efforts needed. One should at least bookmark above two white papers for future reference. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Documentation, SQL Download, SQL Query, SQL Server, SQL Tips and Tricks, SQL White Papers, T SQL, Technology Tagged: AlwaysOn

    Read the article

  • How-To: Run CMSDK against a RAC cluster

    - by frank.closheim
    Using CMSDK in a production environment often requires a robust, reliable and failover enabled repository. When using Oracle Real Application Cluster (RAC) with your CMSDK repository you need to have a specific configuration in place to support such a setup. This post will explain the configuration steps required when running CMSDK 9.0.4.6 with Oracle WebLogic Server (WLS).In the previous CMSDK 9.0.4.2 version a RAC enabled connect string looked like this: (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = rac1)(PORT = 1521))(ADDRESS = (PROTOCOL = TCP)(HOST = rac2)(PORT = 1521))(LOAD_BALANCE = NO)(FAILOVER = ON)(CONNECT_DATA =(SERVICE_NAME = rac)(failover_mode = (type=select)(method=basic)))CMSDK 9.0.4.6 makes use of data sources to connect to the underlying database. These data sources are configured inside your Application Server, such as Oracle WebLogic Server.In Oracle WebLogic Server 10.3.4, a single data source implementation has been introduced to support an RAC cluster. It responds to Fast Application Notification (FAN) events to provide Fast Connection Failover (FCF), Runtime Connection Load-Balancing (RCLB), and RAC instance graceful shutdown. XA affinity is supported at the global transaction Id level. The new feature is called WebLogic Active GridLink for RAC; which is implemented as the GridLink data source within WebLogic Server.This GridLink data source also works with Oracle Single Client Access Name (SCAN). SCAN is a feature used in RAC environments that provides a single name for clients to access any Oracle Database running in a cluster. You can think of SCAN as a cluster alias for databases in the cluster. The benefit is that the client’s connect information does not need to change if you add or remove nodes or databases in the cluster.The CMSDK 9.0.4.6 documentation describes how to create a regular JDBC data source named jdbc/OracleDS. Please refer to the following document which describes in detail how to create a GridLink data source in WLS.

    Read the article

  • Looking for advice on Hyper-v storage replication

    - by Notre1
    I am designing a 2-host Hyper-V R2 cluster with 6-10 guests stored on a SMB iSCSI SAN device (probably Promise VessRAID). I will be getting at least two of the SAN devices and need to eliminate the storage a single point of failure. Ideally, that would involve real-time failover for the storage, like the Windows failover clustering does for the hosts. This design will be used at around six of our sites, and I would like to allow for us to eventually setup a cluster at colocation site and replicate each site's VMs there for DR. (Ideally a live multi-site cluster, but a manual import of the VMs would be fine for this sort of DR.) The tools that come with enterprise SANs, like EMC and NetApp, seem to be the most commonly used items for a Hyper-V cluster, but I can't afford their prices with my budget. Outside of them, the two tools that seem to be most common for Hyper-V storage replication are SteelEye (now SIOS) DataKeeper Cluster Edition and Double-Take Availability. Originally, I was planning on using Clustered Shared Volume(s) (CSV), but it seems like replication support for these is either not available or brand new in both these products. It looks like CSVs are supported in Double-Take 5.22, see this discussion, but I don't think I want to run something that new in production. Right now, it seems like the best option for me is not to implement CSVs, implement some sort of storage replication, and upgrade to CSVs at a later date once replicating them is more mature. I would love to have live migration, and CSVs are not required for live migration if you are using one LUN per VM, so I guess this is what I'll do. I would prefer to stick to the using the Microsoft Windows Server and Hyper-V tools and features as much as possible. From that standpoint, SteelEye looks more appealing than Double-Take because they make the DataKeeper volume(s) available to the Failover Clustering Manager and then failover clustering is all configured and managed through the native Microsoft tools. Double-Take says that "clustered Hyper-V hosts are not supported," and Double-Take Availability itself seems to be what is used for the actual clustering and failover. Does anyone know if any of these replication tools work with more than two hosts in the cluster? All the information I can find on the web only uses two hosts in their examples. Are there any better tools than SteelEye and Double-Take for doing what I am trying to do, which is eliminate the storage as as single point of failure? Neverfail, AppAssure, and DataCore all seem to offer similar functionality, but they don't seems to be as popular as SteelEye and Double-Take. I have seen a number of people suggest using Starwind iSCSI SAN software for the shared storage, which includes replication (and CSV replication at that). There are a couple of reasons I have not seriously considered this route: 1) The company I work for is exclusively a Dell shop and Dell does not have any servers with that I can pack with more than six 3.5" SATA drives. 2) In the future, it could be advantegous for us to not be locked into a particular brand or type of storage and third-party replication softwares all allow replication to heterogeneous storage devices. I am pretty new to iSCSI and clustering, so please let me know if it looks like I am planning something that goes against best practices or overlooking/missing something.

    Read the article

  • Looking for advice on Hyper-v storage replication

    - by Notre1
    I am designing a 2-host Hyper-V R2 cluster with 6-10 guests stored on a SMB iSCSI SAN device (probably Promise VessRAID). I will be getting at least two of the SAN devices and need to eliminate the storage a single point of failure. Ideally, that would involve real-time failover for the storage, like the Windows failover clustering does for the hosts. This design will be used at around six of our sites, and I would like to allow for us to eventually setup a cluster at colocation site and replicate each site's VMs there for DR. (Ideally a live multi-site cluster, but a manual import of the VMs would be fine for this sort of DR.) The tools that come with enterprise SANs, like EMC and NetApp, seem to be the most commonly used items for a Hyper-V cluster, but I can't afford their prices with my budget. Outside of them, the two tools that seem to be most common for Hyper-V storage replication are SteelEye (now SIOS) DataKeeper Cluster Edition and Double-Take Availability. Originally, I was planning on using Clustered Shared Volume(s) (CSV), but it seems like replication support for these is either not available or brand new in both these products. It looks like CSVs are supported in Double-Take 5.22, see this discussion, but I don't think I want to run something that new in production. Right now, it seems like the best option for me is not to implement CSVs, implement some sort of storage replication, and upgrade to CSVs at a later date once replicating them is more mature. I would love to have live migration, and CSVs are not required for live migration if you are using one LUN per VM, so I guess this is what I'll do. I would prefer to stick to the using the Microsoft Windows Server and Hyper-V tools and features as much as possible. From that standpoint, SteelEye looks more appealing than Double-Take because they make the DataKeeper volume(s) available to the Failover Clustering Manager and then failover clustering is all configured and managed through the native Microsoft tools. Double-Take says that "clustered Hyper-V hosts are not supported," and Double-Take Availability itself seems to be what is used for the actual clustering and failover. Does anyone know if any of these replication tools work with more than two hosts in the cluster? All the information I can find on the web only uses two hosts in their examples. Are there any better tools than SteelEye and Double-Take for doing what I am trying to do, which is eliminate the storage as as single point of failure? Neverfail, AppAssure, and DataCore all seem to offer similar functionality, but they don't seems to be as popular as SteelEye and Double-Take. I have seen a number of people suggest using Starwind iSCSI SAN software for the shared storage, which includes replication (and CSV replication at that). There are a couple of reasons I have not seriously considered this route: 1) The company I work for is exclusively a Dell shop and Dell does not have any servers with that I can pack with more than six 3.5" SATA drives. 2) In the future, it could be advantegous for us to not be locked into a particular brand or type of storage and third-party replication softwares all allow replication to heterogeneous storage devices. I am pretty new to iSCSI and clustering, so please let me know if it looks like I am planning something that goes against best practices or overlooking/missing something.

    Read the article

  • To SYNC or not to SYNC – Part 3

    - by AshishRay
    I can't believe it has been almost a year since my last blog post. I know, that's an absolute no-no in the blogosphere. And I know that "I have been busy" is not a good excuse. So - without trying to come up with an excuse - let me state this - my apologies for taking such a long time to write the next Part. Without further ado, here goes. This is Part 3 of a multi-part blog article where we are discussing various aspects of setting up Data Guard synchronous redo transport (SYNC). In Part 1 of this article, I debunked the myth that Data Guard SYNC is similar to a two-phase commit operation. In Part 2, I discussed the various ways that network latency may or may not impact a Data Guard SYNC configuration. In this article, I will talk in details regarding why Data Guard SYNC is a good thing. I will also talk about distance implications for setting up such a configuration. So, Why Good? Why is Data Guard SYNC a good thing? Because, at the end of the day, this gives you the assurance of zero data loss - it doesn’t matter what outage may befall your primary system. Befall! Boy, that sounds theatrical. But seriously - think about this - it minimizes your data risks. That’s a big deal. Whether you have an outage due to bad disks, faulty hardware components, hardware / software bugs, physical data corruptions, power failures, lightning that takes out significant part of your data center, fire that melts your assets, water leakage from the cooling system, human errors such as accidental deletion of online redo log files - it doesn’t matter - you can have that “Om - peace” look on your face and then you can failover to the standby system, without losing a single bit of data in your Oracle database. You will be a hero, as shown in this not so imaginary conversation: IT Manager: Well, what’s the status? You: John is doing the trace analysis on the storage array. IT Manager: So? How long is that gonna take? You: Well, he is stuck, waiting for a response from <insert your not-so-favorite storage vendor here>. IT Manager: So, no root cause yet? You: I told you, he is stuck. We have escalated with their Support, but you know how long these things take. IT Manager: Darn it - the site is down! You: Not really … IT Manager: What do you mean? You: John is stuck, but Sreeni has already done a failover to the Data Guard standby. IT Manager: Whoa, whoa - wait! Failover means we lost some data, why did you do this without letting the Business group know? You: We didn’t lose any data. Remember, we had set up Data Guard with SYNC? So now, any problems on the production – we just failover. No data loss, and we are up and running in minutes. The Business guys don’t need to know. IT Manager: Wow! Are we great or what!! You: I guess … Ok, so you get it - SYNC is good. But as my dear friend Larry Carpenter says, “TANSTAAFL”, or "There ain't no such thing as a free lunch". Yes, of course - investing in Data Guard SYNC means that you have to invest in a low-latency network, you have to monitor your applications and database especially in peak load conditions, and you cannot under-provision your standby systems. But all these are good and necessary things, if you are supporting mission-critical apps that are supposed to be running 24x7. The peace of mind that this investment will give you is priceless, especially if you are serious about HA. How Far Can We Go? Someone may say at this point - well, I can’t use Data Guard SYNC over my coast-to-coast deployment. Most likely - true. So how far can you go? Well, we have customers who have deployed Data Guard SYNC over 300+ miles! Does this mean that you can also deploy over similar distances? Duh - no! I am going to say something here that most IT managers don’t like to hear - “It depends!” It depends on your application design, application response time / throughput requirements, network topology, etc. However, because of the optimal way we do SYNC, customers have been able to stretch Data Guard SYNC deployments over longer distances compared to traditional, storage-centric ways of doing this. The MAA Database 10.2 best practices paper Data Guard Redo Transport & Network Configuration, and Oracle Database 11.2 High Availability Best Practices Manual talk about some of these SYNC-related metrics. For example, a test deployment of Data Guard SYNC over 330 miles with 10ms latency showed an impact less than 5% for a busy OLTP application. Even if you can’t deploy Data Guard SYNC over your WAN distance, or if you already have an ASYNC standby located 1000-s of miles away, here’s another nifty way to boost your HA. Have a local standby, configured SYNC. How local is “local”? Again - it depends. One customer runs a local SYNC standby across the campus. Another customer runs it across 15 miles in another data center. Both of these customers are running Data Guard SYNC as their HA standard. If a localized outage affects their primary system, no problem! They have all the data available on the standby, to which they can failover. Very fast. In seconds. Wait - did I say “seconds”? Yes, Virginia, there is a Santa Claus. But you have to wait till the next blog article to find out more. I assure you tho’ that this time you won’t have to wait for another year for this.

    Read the article

  • Windows Cluster issue

    - by George2
    Hello everyone, For Windows Server 2008 failover cluster, I find it is dependent on Windows domain (in other words, Windows domain is pre-requisite of installing Windows Server 2008 fail over cluster). I am not sure why it is dependent on Windows domain? What features of Windows domain will Windows Server 2008 failover cluster utilize? thanks in advance, George

    Read the article

  • Can't get DHCPd to assign IPs to unknown clients

    - by Jakobud
    I'm using Webmin to admin our DHCPd server. But I'm having a hard time getting it to assign IP addresses to unknown clients. The only way I can get it to assign an IP is to make sure a host is added to DHCPd as a host so that it gets a static-lease IP assigned to it. I thought "Allow Unknown Clients" was the key, but it still isn't assigning IPs to unknown clients. I have a pool setup so that the unknown clients should get an IP between 10.20.0.200 - 10.20.0.249. Here is the config file. What am I missing here? allow unknown-clients; # Primary DHCP server config authoritative; ddns-update-style none; failover peer "dhcp-failover" { primary; address 10.20.0.30; port 647; peer address 10.20.0.25; peer port 647; max-response-delay 60; max-unacked-updates 10; load balance max seconds 3; mclt 3600; split 128; } subnet 10.20.0.0 netmask 255.255.255.0 { allow unknown-clients; option subnet-mask 255.255.255.0; option broadcast-address 10.20.0.255; option routers 10.20.0.100; option domain-name "ourdomain.com"; option domain-name-servers 192.168.10.20; default-lease-time 86400; max-lease-time 86400; option ntp-servers 192.168.10.20; option time-offset -25200; pool { allow unknown-clients; failover peer "dhcp-failover"; max-lease-time 86400; range 10.20.0.200 10.20.0.249; deny dynamic bootp clients; } host Server-myserver { option host-name "whatever.ourdomain.com"; hardware ethernet 00:89:D4:35:4F:13; fixed-address 10.20.0.23; } }

    Read the article

  • ASA and cisco vs NSA sonic firewall

    - by Lbaker101
    Currently I’m trying to structure our network to fully support and be redundant with BGP/Multi homing. Our current company size is 40 employees but the major part of that is our Development department. We are a software company and continued connection to the internet is a requirement as 90% of work stops when the net goes down. The only thing hosted on site (that needs to remain up) is our exchange server. Right now i'm faced with 2 different directions and was wondering if I could get your opinions on this. We will have 2 ISPs that are both 20meg up/down and dedicated fiber (so 40megs combined). This is handed off as an Ethernet cable into our server room. ISP#1 first digital ISP#2 CenturyLink we currently have 2x ASA5505s but the 2nd one is not in use. It was there to be a failover and it just needs the security+ license to be matched with the primary device. But this depends on the network structure. I have been looking into the hardware that would be required to be fully redundant and I found that we will either of the following. 2x Cisco 2921+ series routers with failover licenses. They will go in front of the ASAs and either connects in a failover state or 1 ISP into each of the 2921 series routers and then 1 line into each of the ASAs (thus all 4 hardware components will be used actively). So 2x Cisco 2921+ series routers 2x Cisco ASA5505 firewalls The other route 2x SonicWalls NSA2400MX series. 1 primary and the secondary will be in a failover state. This will remove the ASAs from the network and be about 2k cheaper than the cisco route. This also brings down the points of failure because it’s just the 2x sonicwalls It will also allow us to scale all the way up to 200-400 users (depending on their configuration). This also makes so the Sonic walls. So the real question is with the added functionality ect of the sonicwall is there a point in paying so much more to stay the cisco route? Thanks!

    Read the article

  • looking for a model number recommendation for a network setup of 49 switches [closed]

    - by Bahrain Admin
    im looking to setup a site with 49 edge switches connected by fiber to a central switch. 3 VLANs will be setup to handle data, telephony, and streaming media. each edge switch should have provision for 2 SFP modules for failover, and the core switch needs to have the provision to handle this failover. i'm getting lost on the Cisco site with their specs and recommendations. if anyone could suggest a suitable model number for the core switch and the edge switch, it would be really appreciated.

    Read the article

  • Approaches for memcached sessions

    - by Industrial
    Hi everybody, I was thinking about using memcached to store sessions instead of mySQL, which seemed like a good idea, at first. When it comes to the failover part of utilizing memcached servers, It's a bit worrying that my sessions will stop working if the memcached would go offline. It will certainly affect my users. There's a few techniques that we already utilize to reduce failover, including having a pool of servers available to compensate in the event of downtime, utilizing sharding/consistent hashing across the server pool and so on. We would also do some sort of graceful degradation that tells the users that something have gone wrong and they are welcome to login again, in the event of them being kicked out due to memcached server failover. So how does people generally deal with these issues when storing sessions on memcached servers?

    Read the article

  • Slow NFS and GFS2 performance

    - by Tiago
    Recently I've designed and configured a 4 node cluster for a webapp that does lots of file handling. The cluster have been broken down into 2 main roles, webserver and storage. Each role is replicated to a second server using drbd in active/passive mode. The webserver does a NFS mount of the data directory of the storage server and the latter also has a webserver running to serve files to browser clients. In the storage servers I've created a GFS2 FS to hold the data which is wired to drbd. I've chose GFS2 mainly because the announced performance and also because the volume size which has to be pretty high. Since we entered production I've been facing two problems that I think are deeply connected. First of all, the NFS mount on the webservers keeps hanging for a minute or so and then resumes normal operations. By analyzing the logs I've found out that NFS stops answering for a while and outputs the following log lines: Oct 15 18:15:42 <server hostname> kernel: nfs: server active.storage.vlan not responding, still trying Oct 15 18:15:44 <server hostname> kernel: nfs: server active.storage.vlan not responding, still trying Oct 15 18:15:46 <server hostname> kernel: nfs: server active.storage.vlan not responding, still trying Oct 15 18:15:47 <server hostname> kernel: nfs: server active.storage.vlan not responding, still trying Oct 15 18:15:47 <server hostname> kernel: nfs: server active.storage.vlan not responding, still trying Oct 15 18:15:47 <server hostname> kernel: nfs: server active.storage.vlan not responding, still trying Oct 15 18:15:48 <server hostname> kernel: nfs: server active.storage.vlan not responding, still trying Oct 15 18:15:48 <server hostname> kernel: nfs: server active.storage.vlan not responding, still trying Oct 15 18:15:51 <server hostname> kernel: nfs: server active.storage.vlan not responding, still trying Oct 15 18:15:52 <server hostname> kernel: nfs: server active.storage.vlan not responding, still trying Oct 15 18:15:52 <server hostname> kernel: nfs: server active.storage.vlan not responding, still trying Oct 15 18:15:55 <server hostname> kernel: nfs: server active.storage.vlan not responding, still trying Oct 15 18:15:55 <server hostname> kernel: nfs: server active.storage.vlan not responding, still trying Oct 15 18:15:58 <server hostname> kernel: nfs: server active.storage.vlan OK Oct 15 18:15:59 <server hostname> kernel: nfs: server active.storage.vlan OK Oct 15 18:15:59 <server hostname> kernel: nfs: server active.storage.vlan OK Oct 15 18:15:59 <server hostname> kernel: nfs: server active.storage.vlan OK Oct 15 18:15:59 <server hostname> kernel: nfs: server active.storage.vlan OK Oct 15 18:15:59 <server hostname> kernel: nfs: server active.storage.vlan OK Oct 15 18:15:59 <server hostname> kernel: nfs: server active.storage.vlan OK Oct 15 18:15:59 <server hostname> kernel: nfs: server active.storage.vlan OK Oct 15 18:15:59 <server hostname> kernel: nfs: server active.storage.vlan OK Oct 15 18:15:59 <server hostname> kernel: nfs: server active.storage.vlan OK Oct 15 18:15:59 <server hostname> kernel: nfs: server active.storage.vlan OK Oct 15 18:15:59 <server hostname> kernel: nfs: server active.storage.vlan OK Oct 15 18:15:59 <server hostname> kernel: nfs: server active.storage.vlan OK In this case, the hang lasted for 16 seconds but sometimes it takes 1 or 2 minutes to resume normal operations. My first guess was this was happening due to heavy load of the NFS mount and that by increasing RPCNFSDCOUNT to a higher value, this would become stable. I've increased it several times and apparently, after a while, the logs started appearing less times. The value is now on 32. After further investigating the issue, I've came across a different hang, despite the NFS messages still appear in the logs. Sometimes, the GFS2 FS simply hangs which causes both the NFS and the storage webserver to serve files. Both stay hang for a while and then they resume normal operations. This hangs leaves no trace on client side (also leaves no NFS ... not responding messages) and, on the storage side, the log system appears to be empty, even though the rsyslogd is running. The nodes connect themselves through a 10Gbps non-dedicated connection but I don't think this is an issue because the GFS2 hang is confirmed but connecting directly to the active storage server. I've been trying to solve this for a while now and I've tried different NFS configuration options, before I've found out the GFS2 FS is also hanging. The NFS mount is exported as such: /srv/data/ <ip_address>(rw,async,no_root_squash,no_all_squash,fsid=25) And the NFS client mounts with: mount -o "async,hard,intr,wsize=8192,rsize=8192" active.storage.vlan:/srv/data /srv/data After some tests, these were the configurations that yielded more performance to the cluster. I am desperate to find a solution for this as the cluster is already in production mode and I need to fix this so that this hangs won't happen in the future and I don't really know for sure what and how I should be benchmarking. What I can tell is that this is happening due to heavy loads as I have tested the cluster earlier and this problems weren't happening at all. Please tell me if you need me to provide configuration details of the cluster, and which do you want me to post. As last resort I can migrate the files to a different FS but I need some solid pointers on whether this will solve this problems as the volume size is extremely large at this point. The servers are being hosted by a third-party enterprise and I don't have physical access to them. Best regards. EDIT 1: The servers are physical servers and their specs are: Webservers: Intel Bi Xeon E5606 2x4 2.13GHz 24GB DDR3 Intel SSD 320 2 x 120GB Raid 1 Storage: Intel i5 3550 3.3GHz 16GB DDR3 12 x 2TB SATA Initially there was a VRack setup between the servers but we've upgraded one of the storage servers to have more RAM and it wasn't inside the VRack. They connect through a shared 10Gbps connection between them. Please note that it is the same connection that is used for public access. They use a single IP (using IP Failover) to connect between them and to allow for a graceful failover. NFS is therefore over a public connection and not under any private network (it was before the upgrade, were the problem still existed). The firewall was configured and tested thoroughly but I disabled it for a while to see if the problem still occurred, and it did. From my knowledge the hosting provider isn't blocking or limiting the connection between either the servers and the public domain (at least under a given bandwidth consumption threshold that hasn't been reached yet). Hope this helps figuring out the problem. EDIT 2: Relevant software versions: CentOS 2.6.32-279.9.1.el6.x86_64 nfs-utils-1.2.3-26.el6.x86_64 nfs-utils-lib-1.1.5-4.el6.x86_64 gfs2-utils-3.0.12.1-32.el6_3.1.x86_64 kmod-drbd84-8.4.2-1.el6_3.elrepo.x86_64 drbd84-utils-8.4.2-1.el6.elrepo.x86_64 DRBD configuration on storage servers: #/etc/drbd.d/storage.res resource storage { protocol C; on <server1 fqdn> { device /dev/drbd0; disk /dev/vg_storage/LV_replicated; address <server1 ip>:7788; meta-disk internal; } on <server2 fqdn> { device /dev/drbd0; disk /dev/vg_storage/LV_replicated; address <server2 ip>:7788; meta-disk internal; } } NFS Configuration in storage servers: #/etc/sysconfig/nfs RPCNFSDCOUNT=32 STATD_PORT=10002 STATD_OUTGOING_PORT=10003 MOUNTD_PORT=10004 RQUOTAD_PORT=10005 LOCKD_UDPPORT=30001 LOCKD_TCPPORT=30001 (can there be any conflict in using the same port for both LOCKD_UDPPORT and LOCKD_TCPPORT?) GFS2 configuration: # gfs2_tool gettune <mountpoint> incore_log_blocks = 1024 log_flush_secs = 60 quota_warn_period = 10 quota_quantum = 60 max_readahead = 262144 complain_secs = 10 statfs_slow = 0 quota_simul_sync = 64 statfs_quantum = 30 quota_scale = 1.0000 (1, 1) new_files_jdata = 0 Storage network environment: eth0 Link encap:Ethernet HWaddr <mac address> inet addr:<ip address> Bcast:<bcast address> Mask:<ip mask> inet6 addr: <ip address> Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:957025127 errors:0 dropped:0 overruns:0 frame:0 TX packets:1473338731 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:2630984979622 (2.3 TiB) TX bytes:1648430431523 (1.4 TiB) eth0:0 Link encap:Ethernet HWaddr <mac address> inet addr:<ip failover address> Bcast:<bcast address> Mask:<ip mask> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 The IP addresses are statically assigned with the given network configurations: DEVICE="eth0" BOOTPROTO="static" HWADDR=<mac address> ONBOOT="yes" TYPE="Ethernet" IPADDR=<ip address> NETMASK=<net mask> and DEVICE="eth0:0" BOOTPROTO="static" HWADDR=<mac address> IPADDR=<ip failover> NETMASK=<net mask> ONBOOT="yes" BROADCAST=<bcast address> Hosts file to allow for a graceful NFS failover in conjunction with NFS option fsid=25 set on both storage servers: #/etc/hosts <storage ip failover address> active.storage.vlan <webserver ip failover address> active.service.vlan As you can see, packet errors are down to 0. I've also ran ping for a long time without any packet loss. MTU size is the normal 1500. As there is no VLan by now, this is the MTU used to communicate between servers. The webservers' network environment is similar. One thing I forgot to mention is that the storage servers handle ~200GB of new files each day through the NFS connection, which is a key point for me to think this is some kind of heavy load problem with either NFS or GFS2. If you need further configuration details please tell me. EDIT 3: Earlier today we had a major filesystem crash on the storage server. I couldn't get the details of the crash right away because the server stop responding. After the reboot, I noticed the filesystem was extremely slow, and I was not being able to serve a single file through either NFS or httpd, perhaps due to cache warming or so. Nevertheless, I've been monitoring the server closely and the following error came up in dmesg. The source of the problem is clearly GFS, which is waiting for a lock and ends up starving after a while. INFO: task nfsd:3029 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. nfsd D 0000000000000000 0 3029 2 0x00000080 ffff8803814f79e0 0000000000000046 0000000000000000 ffffffff8109213f ffff880434c5e148 ffff880624508d88 ffff8803814f7960 ffffffffa037253f ffff8803815c1098 ffff8803814f7fd8 000000000000fb88 ffff8803815c1098 Call Trace: [<ffffffff8109213f>] ? wake_up_bit+0x2f/0x40 [<ffffffffa037253f>] ? gfs2_holder_wake+0x1f/0x30 [gfs2] [<ffffffff814ff42e>] __mutex_lock_slowpath+0x13e/0x180 [<ffffffff814ff2cb>] mutex_lock+0x2b/0x50 [<ffffffffa0379f21>] gfs2_log_reserve+0x51/0x190 [gfs2] [<ffffffffa0390da2>] gfs2_trans_begin+0x112/0x1d0 [gfs2] [<ffffffffa0369b05>] ? gfs2_dir_check+0x35/0xe0 [gfs2] [<ffffffffa0377943>] gfs2_createi+0x1a3/0xaa0 [gfs2] [<ffffffff8121aab1>] ? avc_has_perm+0x71/0x90 [<ffffffffa0383d1e>] gfs2_create+0x7e/0x1a0 [gfs2] [<ffffffffa037783f>] ? gfs2_createi+0x9f/0xaa0 [gfs2] [<ffffffff81188cf4>] vfs_create+0xb4/0xe0 [<ffffffffa04217d6>] nfsd_create_v3+0x366/0x4c0 [nfsd] [<ffffffffa0429703>] nfsd3_proc_create+0x123/0x1b0 [nfsd] [<ffffffffa041a43e>] nfsd_dispatch+0xfe/0x240 [nfsd] [<ffffffffa025a5d4>] svc_process_common+0x344/0x640 [sunrpc] [<ffffffff810602a0>] ? default_wake_function+0x0/0x20 [<ffffffffa025ac10>] svc_process+0x110/0x160 [sunrpc] [<ffffffffa041ab62>] nfsd+0xc2/0x160 [nfsd] [<ffffffffa041aaa0>] ? nfsd+0x0/0x160 [nfsd] [<ffffffff81091de6>] kthread+0x96/0xa0 [<ffffffff8100c14a>] child_rip+0xa/0x20 [<ffffffff81091d50>] ? kthread+0x0/0xa0 [<ffffffff8100c140>] ? child_rip+0x0/0x20

    Read the article

  • Virutal Machine loses network connectivity on Hyper V Cluster

    - by Chris W
    We're running a number of VMs on a 6 node failover cluster of blades using Hyper V. We have an intermittent issue (every few days at different times - not a fixed frequency) of VMs losing network connectivity. Console access to the VM suggests all is fine and the underlying blade has normal connectivity. To resolve the problem we either have to re-start the VM or, more usually, we do a live migration to another blade which fires up connectivity and we then migrate it back to the original blade. I've had 3 instances of this happen with a specific VM running on a particular blade however it has happened once with a different VM running on a different blade. All VMs and blades have the same basic setup and are running Windows 2008 R2. Any ideas where I should be looking to diagnose the possible causes of this problem as the event logs provide no help? Edit: I've checked that each blade is running the latest NIC drivers and all seem to be fine. Something that is confusing me - a failover or restart of the VM resolves the issue. Whilst I need to work out the underlying issue that is causing the NICs to hang I'm also concerned that the VM didn't failover to another node which would have solved the outage for me. Is there a way to configure the cluster so that it can tell that the VM guest has lost connectivity and fail it over? As things stand the cluster is assuming that the VM is running happily as I presume Hyper V says everything is great even though there is a problem.

    Read the article

  • HA for Resque & Redis

    - by Chris Go
    Trying to avoid SPOFs for Resque and Redis. Ultimately the client is going to be PHP via (https://github.com/chrisboulton/php-resque). After going through and finding some workable HA for nginx+php-fpm and MySQL (mysql master-master setup as a way to simply master-slave promotion), next up is Resque+Redis. Standard install of Resque uses localhost Redis (at DigitalOcean). I am heavily depending on Amazon Route 53 DNS failover to try to solve this. resque1.domain.com points to localhost redis (redis1.domain.com) = same server resque2.domain.com points to localhost redis (redis2.domain.com) = same server Do resque.domain.com with FAILOVER resque1 as primary and resque2 as secondary. What this means is that most of the time (99%), resque1 should be getting hit with resque2 as just a hot backup. This lets me just have to get 2 servers and makes sure that any hits to resque.domain.com goes somewhere The other way to do this is to break out resque and redis into 4 servers and do it as follows resque1.domain.com - redis.domain.com resque2.domain.com - redis.domain.com redis1.domain.com redis2.domain.com Then setup DNS Failover resque.domain.com - primary: resque1 and secondary: resque2 redis.domain.com - primary: redis1 and secondary: redis2 I'd like to get away for 2 servers if I can but is this 2nd setup much better or negligible? Thanks, Chris

    Read the article

  • Network Load Balancing and AnyCast Routing

    - by user126917
    Hi All can anyone advise on problems with the following? I am planning on installing the following setup on my estate: I have 2 sites that both have a large amount of users. Goals are to keep things simple for the users and to have automatic failover above the database level. Our Database will exist at the primary site and be async mirrored to the secondary site with manual failover procedures.The database generate sequential ID's so distributing it is not an option. I plan to site IIS boxes at both sites with all of the business logic on them and heavy operations. The connections to SQL will be lightweight and DB reads will be cached on IIS. On this layer I plan to use Windows network load balancing and have the same IP or IPs across all IIS boxes at both sites. This way there will be automatic failover and no single point of failure. Also users can have one web address regardless of which site they are in automatically be network load balanced to their local IIS. This is great but obviously our two sites are on different subnets and as this will be one IP address with most of our traffic we can't go broadcasting everything across the link between the sites. To solve this problem we plan to use AnyCast routing over our network layer to route the traffic to the most local box that is listening which will be defined by the network load balancing. Has anyone used this setup before? Can anyone think of any issues with this? Also some specifics I can't find anywhere at the moment. If my Windows box is assigned an IP and listening on that IP but network load balancing is not accepting specific traffic then will AnyCast route away from that? Also can I AnyCast on a socket level?

    Read the article

  • How do I keep a bridge enabled on a bonded interface?

    - by jlawer
    I'm working on setting up a pair of CentOS 6.3 servers that will run a couple of KVM vms and have come across a problem setting up a bridge on a bond. I am using Mode 4 (802.3ad) bonding on a pair of stacked Dell Powerconnect 5524 switches connecting to R320 servers. There are 2 links (1 to each switch) that form a Link Aggregation Group (802.3ad / LACP bonding). On top of the bond I have VLAN Tagging. I've verified this is a problem on multiple other bonding modes so it isn't just a mode 4 issue. I am testing what happens when 1 link is dropped (ie switch dies, cable breaks, etc). If I don't have a bridge (for KVM), everything works fine, failover happens as expected. If I have the bridge enabled, it works fine until failover (unplugging a cable). When failover happens /var/log/messages shows the slave link going down, followed within a second by: kernel: br1: port 1(bond0.8) entering disabled state The thing is /proc/net/bonding/bond0 shows the link is up as expected (simply with only 1 slave instead of 2). If I plug the cable back in it recovers and brings the bridge back to an enabled state. I actually have tested this while a ping is occuring and if the timing is right a packet will actually leave the system after the link is lost, but before the disabled message occurs. This disabled state I assumed was STP, but I have disabled STP on the bridge configuration and this issue still occurs. brctl showstp br1 still shows the link as disabled when it is running without a slave. I also switched between the nics in the server (I have 2x Broadcom & 4x intel). It doesn't matter which configuration I have. Does anyone know of a way to force the bridge to stay enabled or why its detecting the bond as disabled, when it isn't?

    Read the article

  • hyper-v cluster behavior when losing network connectivity

    - by ChristopheD
    Setup: (rather new) Hyper-V R2 cluster with 2 nodes (in failover configuration). Fysical host OS: Windows Server 2008. About eight VM's (mixed: Windows Server 2008 and Linux) Yesterday we had a power outage of about 15 minutes. Our blades are on UPS so the fysical host machines (Windows Server 2008) never went down. Our main switches are not on UPS (yet) and we saw the behaviour similar to the following (as distilled from the event logs). The nodes in the cluster lost means of communication (because the external switches went down). The cluster wants to bring down one (the first) of the nodes (to start failover?). The previous step impacts clustered storage where the virtual machine VHD's are located. All VM's got brutally terminated and were found in a failed state in the failover manager in the host OS'es. The Linux VM's were kernel panicking and looked like they had their disk ripped out. This whole setup is rather new to us, so we are still learning about this. The question: We are putting switches on UPS soon but were wondering if the above is expected behavior (seems rather fragile) or if there are obvious improvements configuration-wise to handle such scenario's ? I can upload an evtx file concerning what exactly was going on in case that's necessary.

    Read the article

  • High availability virtual machines

    - by Jeremy
    I've been reading a lot about high availability virtualization, either via Hyper-V or VMWare. In that context, essentially high availabliity means that the VM is hosted by a closter of physical servers (nodes), so if one of the physical servers goes down, the VM can still be served by other physical servers. So far so good, the physical cluster and the VM itself are highly available. However if the service being provided, let's say SQL server, MSDTC, or any other service, are actually being provided by the VM image and the virtualized operating system. So I imagine that there is still a point of failure at the virtual layer that isn't accounted for. Something could happen within the virtual machine itself that the physican cluster can not account for, correct? In that instance the physican failover cluster (Hyper-V) or VMWare host, can not fail over, because the issue is not with one of the servers in the physical cluster - failing over a physical node would not do any good. Does this necessitate building a virtual failover cluster on top of the physical one, or is this not necessary? Alternatively, I suppose you could skip the phsyical clustering, and just cluster at the virtual layer (Child based failover clustering), because that should still survive a physical failure. See image below showing parent based (left), child based (right) and a combination (center). Is parent based as far as you need to go, or is child based more appropriate?

    Read the article

< Previous Page | 5 6 7 8 9 10 11 12 13 14 15 16  | Next Page >