Search Results

Search found 1419 results on 57 pages for 'availability'.

Page 8/57 | < Previous Page | 4 5 6 7 8 9 10 11 12 13 14 15 | Next Page >

heartbeat: Bad nodename in /etc/ha.d//haresources [node1]

- by Richard

I'm trying to start heartbeat on Ubuntu 10.04 with service heartbeat start, but getting the following errors: heartbeat[24829]: 2011/11/22_19:31:07 ERROR: Bad nodename in /etc/ha.d//haresources [node1] heartbeat[24829]: 2011/11/22_19:31:07 ERROR: Configuration error, heartbeat not started. On on server uname -n produces loadb1, on the second server uname -n produces loadb2. The two servers can ping each other okay with those names. This is /etc/ha.d/ha.cnf on both servers: debugfile /var/log/ha-debug logfile /var/log/ha-log logfacility local0 keepalive 2 deadtime 10 udpport 694 bcast eth1 ucast eth0 my.external.ip ucast eth0 my.external.ip ucast eth1 10.0.0.5 ucast eth1 10.0.0.6 #udp eth0 node loadb1 node loadb2 auto_failback off And this is /etc/ha.d/haresources on both servers: node1 IPaddr::46.20.121.113 httpd smb dhcpd Authkeys is also set up. What am I doing wrong? The part where I'm least clear is the ucast/bcast lines.

Read the article
Exchange 2010 DAG Automatic Failover Testing/Issue. Not always automatically failing over to health

- by Richard

Ok I've got 2 exchange 2010 servers that run client access/hub transport/mailbox roles and one exchange 2010 server running just client access/hub transport roles and acts as my bridgehead. The two mailbox servers are running one database setup in a DAG. Server A shows the DB Mounted and Server B shows Healthy. If I reboot Server A via windows GUI Server B switches from healthy to mounted and I see hardly any interruption in service using Outlook 2007. Server A shows "Service down", then "Failed" then "Healthy" and leaves the DB mounted on Server B. This is how it should work, so far so good. Now if I test Server A being shut down cold, or unplugging both nics from network to simulate failure, Server B switches from Healthy to Mounted and server A switches to "Service Down" but my outlook client never connects to the DB mounted on server B! I can connect to server C (client access/hub transport) and get to my email and even send new email out, but incoming email doesn't deliver until Server A is brought back online and it's DB goes back to Healthy status. So I don't understand why it auto fail-overs when I reboot the server with the mounted DB copy, causing very little outlook 2007 hiccup if any. But when I shutdown or DC the mounted DB server it DOES mount the healthy copy but outlook 2007 clients can't connect.. I hope the picture I'm trying to paint makes some sense, it's driving me a little batty. Any help would be appreciated!

Read the article
Unexpected start of already-primary server processes when heartbeat on secondary is stopped.

- by vorik

Hi, I've got an active-passive Heartbeat cluster with Apache, MySQL, ActiveMQ and DRBD. Today, I wanted to perform hardware-maintenance on the secondary node (node04), so I stopped the heartbeat service before shutting it down. Then, the primary node (node03) received a shutdown notice from the secondary node (node04). This logging comes from the primary node: node03 heartbeat[4458]: 2010/03/08_08:52:56 info: Received shutdown notice from 'node04.companydomain.nl'. heartbeat[4458]: 2010/03/08_08:52:56 info: Resources being acquired from node04.companydomain.nl. harc[27522]: 2010/03/08_08:52:56 info: Running /etc/ha.d/rc.d/status status heartbeat[27523]: 2010/03/08_08:52:56 info: Local Resource acquisition completed. mach_down[27567]: 2010/03/08_08:52:56 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired mach_down[27567]: 2010/03/08_08:52:56 info: mach_down takeover complete for node node04.companydomain.nl. heartbeat[4458]: 2010/03/08_08:52:56 info: mach_down takeover complete. harc[27620]: 2010/03/08_08:52:56 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp ip-request-resp[27620]: 2010/03/08_08:52:56 received ip-request-resp drbddisk OK yes ResourceManager[27645]: 2010/03/08_08:52:56 info: Acquiring resource group: node03.companydomain.nl drbddisk Filesystem::/dev/drbd0::/data::ext3 mysql apache::/etc/httpd/conf/httpd.conf LVSSyncDaemonSwap::master monitor activemq tivoli-cluster MailTo::[email protected]::DRBDFailureDrisAcc MailTo::[email protected]::DRBDFailureDrisAcc 1.2.3.212 ResourceManager[27645]: 2010/03/08_08:52:56 info: Running /etc/ha.d/resource.d/drbddisk start Filesystem[27700]: 2010/03/08_08:52:57 INFO: Running OK ResourceManager[27645]: 2010/03/08_08:52:57 info: Running /etc/ha.d/resource.d/mysql start mysql[27783]: 2010/03/08_08:52:57 Starting MySQL[ OK ] apache[27853]: 2010/03/08_08:52:57 INFO: Running OK ResourceManager[27645]: 2010/03/08_08:52:57 info: Running /etc/ha.d/resource.d/monitor start monitor[28160]: 2010/03/08_08:52:58 ResourceManager[27645]: 2010/03/08_08:52:58 info: Running /etc/ha.d/resource.d/activemq start activemq[28210]: 2010/03/08_08:52:58 Starting ActiveMQ Broker... ActiveMQ Broker is already running. ResourceManager[27645]: 2010/03/08_08:52:58 ERROR: Return code 1 from /etc/ha.d/resource.d/activemq ResourceManager[27645]: 2010/03/08_08:52:58 CRIT: Giving up resources due to failure of activemq ResourceManager[27645]: 2010/03/08_08:52:58 info: Releasing resource group: node03.companydomain.nl drbddisk Filesystem::/dev/drbd0::/data::ext3 mysql apache::/etc/httpd/conf/httpd.conf LVSSyncDaemonSwap::master monitor activemq tivoli-cluster MailTo::[email protected]::DRBDFailureDrisAcc MailTo::[email protected]::DRBDFailureDrisAcc 1.2.3.212 ResourceManager[27645]: 2010/03/08_08:52:58 info: Running /etc/ha.d/resource.d/IPaddr 1.2.3.212 stop IPaddr[28329]: 2010/03/08_08:52:58 INFO: ifconfig eth0:0 down IPaddr[28312]: 2010/03/08_08:52:58 INFO: Success ResourceManager[27645]: 2010/03/08_08:52:58 info: Running /etc/ha.d/resource.d/MailTo [email protected] DRBDFailureDrisAcc stop MailTo[28378]: 2010/03/08_08:52:58 INFO: Success ResourceManager[27645]: 2010/03/08_08:52:58 info: Running /etc/ha.d/resource.d/MailTo [email protected] DRBDFailureDrisAcc stop MailTo[28433]: 2010/03/08_08:52:58 INFO: Success ResourceManager[27645]: 2010/03/08_08:52:58 info: Running /etc/ha.d/resource.d/tivoli-cluster stop ResourceManager[27645]: 2010/03/08_08:52:58 info: Running /etc/ha.d/resource.d/activemq stop activemq[28503]: 2010/03/08_08:53:01 Stopping ActiveMQ Broker... Stopped ActiveMQ Broker. ResourceManager[27645]: 2010/03/08_08:53:01 info: Running /etc/ha.d/resource.d/monitor stop monitor[28681]: 2010/03/08_08:53:01 ResourceManager[27645]: 2010/03/08_08:53:01 info: Running /etc/ha.d/resource.d/LVSSyncDaemonSwap master stop LVSSyncDaemonSwap[28714]: 2010/03/08_08:53:02 info: ipvs_syncmaster down LVSSyncDaemonSwap[28714]: 2010/03/08_08:53:02 info: ipvs_syncbackup up LVSSyncDaemonSwap[28714]: 2010/03/08_08:53:02 info: ipvs_syncmaster released ResourceManager[27645]: 2010/03/08_08:53:02 info: Running /etc/ha.d/resource.d/apache /etc/httpd/conf/httpd.conf stop apache[28782]: 2010/03/08_08:53:03 INFO: Killing apache PID 18390 apache[28782]: 2010/03/08_08:53:03 INFO: apache stopped. apache[28771]: 2010/03/08_08:53:03 INFO: Success ResourceManager[27645]: 2010/03/08_08:53:03 info: Running /etc/ha.d/resource.d/mysql stop mysql[28851]: 2010/03/08_08:53:24 Shutting down MySQL.....................[ OK ] ResourceManager[27645]: 2010/03/08_08:53:24 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext3 stop Filesystem[29010]: 2010/03/08_08:53:25 INFO: Running stop for /dev/drbd0 on /data Filesystem[29010]: 2010/03/08_08:53:25 INFO: Trying to unmount /data Filesystem[29010]: 2010/03/08_08:53:25 ERROR: Couldn't unmount /data; trying cleanup with SIGTERM Filesystem[29010]: 2010/03/08_08:53:25 INFO: Some processes on /data were signalled Filesystem[29010]: 2010/03/08_08:53:27 INFO: unmounted /data successfully Filesystem[28999]: 2010/03/08_08:53:27 INFO: Success ResourceManager[27645]: 2010/03/08_08:53:27 info: Running /etc/ha.d/resource.d/drbddisk stop heartbeat[4458]: 2010/03/08_08:53:29 WARN: node node04.companydomain.nl: is dead heartbeat[4458]: 2010/03/08_08:53:29 info: Dead node node04.companydomain.nl gave up resources. heartbeat[4458]: 2010/03/08_08:53:29 info: Link node04.companydomain.nl:eth0 dead. heartbeat[4458]: 2010/03/08_08:53:29 info: Link node04.companydomain.nl:eth1 dead. hb_standby[29193]: 2010/03/08_08:53:57 Going standby [foreign]. heartbeat[4458]: 2010/03/08_08:53:57 info: node03.companydomain.nl wants to go standby [foreign] Soo... What just happened here??? Heartbeat on node04 stopped and told node03, which was the active node at the time. Somehow, node03 decided to start the cluster processes that were already running. (For the processes that are not critical, I always return a 0 from the startupscript so it does not stops the entire cluster when a non-essential part fails.) When starting ActiveMQ, it returns status 1 because it is already running. This fails the node and shuts everything down. As heartbeat is not running on the secondary node, it cannot failover to there. When I tried to run ha_takeover to restart the resources, absolutely nothing happened. Only after I restarted heartbeat on the primary node the resources could be started (after a delay of 2 minutes). These are my questions: Why does heartbeat on the primary node try to start the cluster processes again? Why did ha_takeover not work? What can I do to prevent this from happening? Server configuration: DRBD: version: 8.3.7 (api:88/proto:86-91) GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by [email protected], 2010-01-20 09:14:48 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate B r---- ns:0 nr:6459432 dw:6459432 dr:0 al:0 bm:301 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0 uname -a Linux node04 2.6.18-164.11.1.el5 #1 SMP Wed Jan 6 13:26:04 EST 2010 x86_64 x86_64 x86_64 GNU/Linux haresources node03.companydomain.nl \ drbddisk \ Filesystem::/dev/drbd0::/data::ext3 \ mysql \ apache::/etc/httpd/conf/httpd.conf \ LVSSyncDaemonSwap::master \ monitor \ activemq \ tivoli-cluster \ MailTo::[email protected]::DRBDFailureDrisAcc \ MailTo::[email protected]::DRBDFailureDrisAcc \ 1.2.3.212 ha.cf debugfile /var/log/ha-debug logfile /var/log/ha-log keepalive 500ms deadtime 30 warntime 10 initdead 120 udpport 694 mcast eth0 225.0.0.3 694 1 0 mcast eth1 225.0.0.4 694 1 0 auto_failback off node node03.companydomain.nl node node04.companydomain.nl respawn hacluster /usr/lib64/heartbeat/dopd apiauth dopd gid=haclient uid=hacluster Thank you very much in advance, Ger Apeldoorn

Read the article
What is the difference between Anycast and GeoDNS / GeoIP wrt HA?

- by Riyad

Based on the Wikipedia description of Anycast, it includes both the distribution of a domain-name-to-many-IP-mapping across many DNS servers as well as replying to clients with the most geographically close (or fastest) server. In the context of a globally distributed, highly available site like google.com (or any CDN service with many global edge locations) this sounds like the two key features one would need. DNS services like Amazon's Route53, EasyDNS and DNSMadeEasy all advertise themselves as Anycast-enabled networks. Therefore my assumption is that each of these DNS services transparently offer me those two killer features: multi-IP-to-domain mapping AND routing clients to the closest node. However, each of these services seem to separate out these two functionalities, referring to the 2nd one (routing clients to closest node) as "GeoDNS", "GeoIP" or "Global Traffic Director" and charge extra for the service. If a core tenant of an Anycast-capable system is to already do this, why is this functionality being earmarked as this extra feature? What is this "GeoDNS" feature doing that a standard Anycast DNS service won't do (according to the definition of Anycast from Wikipedia -- I understand what is being advertised, just not why it isn't implied already). I get extra-confused when a DNS service like Route53 that doesn't support this nebulous "GeoDNS" feature lists functionality like: Fast – Using a global anycast network of DNS servers around the world, Route 53 is designed to automatically route your users to the optimal location depending on network conditions. As a result, the service offers low query latency for your end users, as well as low update latency for your DNS record management needs. ... which sounds exactly like what GeoDNS is intended to do, but geographically directing clients is something they explicitly don't support it yet. Ultimately I am looking for the two following features from a DNS provider: Map multiple IP addresses to a single domain name (like google.com, amazon.com, etc. does) Utilize a DNS service that will respond to client requests for that domain with the IP address of the nearest server to the requestee. As mentioned, it seems like this is all part of an "Anycast" DNS service (all of which these services are), but the features and marketing I see from them suggest otherwise, making me think I need to learn a bit more about how DNS works before making a deployment choice. Thanks in advance for any clarifications.

Read the article
Suggestions for programming language and database for a high end database querying system (>50 milli

- by mmdave

These requirements are sketchy at the moment, but will appreciate any insights. We are exploring what would be required to build a system that can handle 50 database million queries a day - specifiically from the programming language and database choice Its not a typical website, but an API / database accessing through the internet. Speed is critical. The application will primarily receive these inputs (about a few kb each) and will have to address each of them via the database lookup. Only a few kb will be returned. The server will be run over https/ssl.

Read the article
HAProxy -- pause/queue all traffic without losing requests

- by Marc

I basically have the same problem as mentioned in this thread -- I would like to temporarily suspend all requests to all servers of a certain backend, so that I can upgrade the backend and the database it uses. Since this is a live system, I would like to queue up requests, and send them to the backend servers once they've been upgraded. Since I'm doing a database upgrade with the code change, I have to upgrade all backend servers simultaneously, so I can't just bring one down at a time. I tried using the tcp-request options combined with removing the static healthcheck file as mentioned in that thread, but had no luck. Setting the default "maxconn" value to 0 seems to pause and queue connections as desired, but then there seems to be no way to increase the value back to a positive number without restarting HAProxy, which kills all requests that had been queued up until that point. (The "hot-reconfiguration" options using -sf and -st start a new process, which doesn't seem to do what I want). Is what I'm trying to do possible?

Read the article
NFS v4, HA Migration, and stale handles on clients

- by Karl Katzke

I'm managing a server running NFS v4 with Pacemaker/OpenAIS. NFS is configured to use TCP. When I migrate the NFS server to another node in the Pacemaker cluster, even though the metadata is persisted, connections from the clients 'hang' and eventually time out after 90 seconds. After that 90 seconds, the old mountpoint becomes 'stale' and the mounted files can no longer be accessed. The 90 second grace period seems to be part of the server configuration and not the client configuration. I see this message on the server: kernel: NFSD: starting 90-second grace period If I restart the NFS client on the client nodes after I migrate (unmounting and then remounting the share), then I don't experience the problem, but connections and file transfers still interrupted. Three questions: What is the 90 second grace period? What's it there for? How can I keep the files from going stale on the clients without restarting them after I migrate the NFS server to another node? Is it actually possible to migrate the NFS server without having large file uploads drop?

Read the article
Software for failover across multiple external hosts

- by Lin

I have multiple webservers with the same content, hosted across different providers. However, I can't seem to find a nice, simple failover solution. Load-balancing software (Pound, HAProxy, etc.) are unnecessary, and I need the flexibility to manage over 100+ domains, so the paid DNS failover solutions I've found are too expensive. So far the simplest solution I've thought of is just to set a very low TTL (30min - 1hr) in each zone entry on my nameservers (running BIND). Then, continuously monitor each server, and temporarily remove failed servers from zone entries. But this seems like something that should be currently available. I only have root access to different VPSes running CentOS. Any suggestions? Thanks!

Read the article
Biztalk 2009 logshipping with SQL 2008

- by Manjot

Hi, I am setting up biztalk logshipping for Biztalk 2009 database. Following http://msdn.microsoft.com/en-us/library/aa560961.aspx article, I am doing the following to setup biztalk logshipping on destination server: Enable Ad-hoc queries by: sp_configure 'show advanced options',1 go reconfigure go sp_configure 'Ad Hoc Distributed Queries',1 go reconfigure go sp_configure 'show advanced options',0 go reconfigure go Execute LogShipping_Destination_Schema & LogShipping_Destination_Logic in master on destinations server Run: exec bts_ConfigureBizTalkLogShipping @nvcDescription = '', @nvcMgmtDatabaseName = '', @nvcMgmtServerName = '', @SourceServerName = null, -- null indicates that this destination server restores all databases @fLinkServers = 1 -- 1 automatically links the server to the management database When I run this I am receiving the following error: Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'. After some research I found some info : Usually this error means that the SQL Server Service Principal Name (SPN) was not configured, and NTLM was not being used as an authentication mechanism. SQl services are runing under different domain accounts. So, I asked the domain admin to create SPNs for the servers, SQL service accounts for beoth source and destination using name and FQDN. enabled computer name and service accounts for delegation. When I run the following: select * from sys.dm_exec_connections I get the the same error: Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON' Any help please?

Read the article
MySQL: Pacemaker cannot start the failed master as a new slave?

- by quanta

I'm going to setup failover for MySQL replication (1 master and 1 slave) follow this guide: https://github.com/jayjanssen/Percona-Pacemaker-Resource-Agents/blob/master/doc/PRM-setup-guide.rst Here're the output of crm configure show: node serving-6192 \ attributes p_mysql_mysql_master_IP="192.168.6.192" node svr184R-638.localdomain \ attributes p_mysql_mysql_master_IP="192.168.6.38" primitive p_mysql ocf:percona:mysql \ params config="/etc/my.cnf" pid="/var/run/mysqld/mysqld.pid" socket="/var/lib/mysql/mysql.sock" replication_user="repl" replication_passwd="x" test_user="test_user" test_passwd="x" \ op monitor interval="5s" role="Master" OCF_CHECK_LEVEL="1" \ op monitor interval="2s" role="Slave" timeout="30s" OCF_CHECK_LEVEL="1" \ op start interval="0" timeout="120s" \ op stop interval="0" timeout="120s" primitive writer_vip ocf:heartbeat:IPaddr2 \ params ip="192.168.6.8" cidr_netmask="32" \ op monitor interval="10s" \ meta is-managed="true" ms ms_MySQL p_mysql \ meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" globally-unique="false" target-role="Master" is-managed="true" colocation writer_vip_on_master inf: writer_vip ms_MySQL:Master order ms_MySQL_promote_before_vip inf: ms_MySQL:promote writer_vip:start property $id="cib-bootstrap-options" \ dc-version="1.0.12-unknown" \ cluster-infrastructure="openais" \ expected-quorum-votes="2" \ no-quorum-policy="ignore" \ stonith-enabled="false" \ last-lrm-refresh="1341801689" property $id="mysql_replication" \ p_mysql_REPL_INFO="192.168.6.192|mysql-bin.000006|338" crm_mon: Last updated: Mon Jul 9 10:30:01 2012 Stack: openais Current DC: serving-6192 - partition with quorum Version: 1.0.12-unknown 2 Nodes configured, 2 expected votes 2 Resources configured. ============ Online: [ serving-6192 svr184R-638.localdomain ] Master/Slave Set: ms_MySQL Masters: [ serving-6192 ] Slaves: [ svr184R-638.localdomain ] writer_vip (ocf::heartbeat:IPaddr2): Started serving-6192 Editing /etc/my.cnf on the serving-6192 of wrong syntax to test failover and it's working fine: svr184R-638.localdomain being promoted to become the master writer_vip switch to svr184R-638.localdomain Current state: Last updated: Mon Jul 9 10:35:57 2012 Stack: openais Current DC: serving-6192 - partition with quorum Version: 1.0.12-unknown 2 Nodes configured, 2 expected votes 2 Resources configured. ============ Online: [ serving-6192 svr184R-638.localdomain ] Master/Slave Set: ms_MySQL Masters: [ svr184R-638.localdomain ] Stopped: [ p_mysql:0 ] writer_vip (ocf::heartbeat:IPaddr2): Started svr184R-638.localdomain Failed actions: p_mysql:0_monitor_5000 (node=serving-6192, call=15, rc=7, status=complete): not running p_mysql:0_demote_0 (node=serving-6192, call=22, rc=7, status=complete): not running p_mysql:0_start_0 (node=serving-6192, call=26, rc=-2, status=Timed Out): unknown exec error Remove the wrong syntax from /etc/my.cnf on serving-6192, and restart corosync, what I would like to see is serving-6192 was started as a new slave but it doesn't: Failed actions: p_mysql:0_start_0 (node=serving-6192, call=4, rc=1, status=complete): unknown error Here're snippet of the logs which I'm suspecting: Jul 09 10:46:32 serving-6192 lrmd: [7321]: info: rsc:p_mysql:0:4: start Jul 09 10:46:32 serving-6192 lrmd: [7321]: info: RA output: (p_mysql:0:start:stderr) Error performing operation: The object/attribute does not exist Jul 09 10:46:32 serving-6192 crm_attribute: [7420]: info: Invoked: /usr/sbin/crm_attribute -N serving-6192 -l reboot --name readable -v 0 The full logs: http://fpaste.org/AyOZ/ The strange thing is I can starting it manually: export OCF_ROOT=/usr/lib/ocf export OCF_RESKEY_config="/etc/my.cnf" export OCF_RESKEY_pid="/var/run/mysqld/mysqld.pid" export OCF_RESKEY_socket="/var/lib/mysql/mysql.sock" export OCF_RESKEY_replication_user="repl" export OCF_RESKEY_replication_passwd="x" export OCF_RESKEY_test_user="test_user" export OCF_RESKEY_test_passwd="x" sh -x /usr/lib/ocf/resource.d/percona/mysql start: http://fpaste.org/RVGh/ Did I make something wrong?

Read the article
What's the difference between keepalived and corosync, others?

- by Matt

I'm building a failover firewall for a server cluster and started looking at the various options. I'm more familiar with carp on freebsd, but need to use linux for this project. Searching google has produced several different projects, but no clear information about features they provide . CARP gave virtual interfaces that failover, I am not really clear on whether that's what corosync does, or is that what pacemaker does? On the other hand I did get manage to get keepalived working. However, I noted that corosync provides native support for infiniband. This would be useful for me. Perhaps someone could shed some light on the differences between: corosync keepalive pacemaker heartbeat Which product would be the best fit for router failover?

Read the article
Ideas for scaling out database architecture

- by andrew

We're looking to scale out our existing database architecture and need some advice on which way to go. We currently have 2 web servers behind a load balancer that both read & write to a single master database which replicates to a slave. Ideally, I'd like each of the webservers to point to their own master DB and have the data between the 2 synchronised but from what I've read, using any kind of master-master or ring-replication is discouraged. I'm looking for a general "what do other people do" kind of answer - database vendor isn't a concern at the moment but we'd like to stay with MySQL or convert to MSSQL. Any ideas would be gratefully received. Many thanks, Andrew

Read the article
Performance issue when configuring non HA VM in cluster

- by laiys

Hi, I saw this article http://technet.microsoft.com/en-us/library/cc764243.aspx Quote taken from the link “ Important It is recommended that you not deploy virtual machines that are not highly available on your host clusters. Although you can do this by using Hyper-V (VMM does not allow it), the non-highly available virtual machines will consume resources that otherwise would be available to the HAVMs What kind of resources (CPU,memory, NIC, etc) that non HA VM will consume? Just curious as not all VM (in production) not to be in Failover Cluster and Live Migration. If i put the VM into CSV but did not make it as HA, what impact does it make since i allocate same vCPU, vNic and VMemory into the VM. (not to mention that i lost failover feature). Curious to understand more about this. Please advise. Thanks

Read the article
All of ESX4 hosts not responding after rebooting vCenter server

- by hojyokinmo

Hello I'm constructing 5ESX+a vCenter serv. I'm testing FT on the system. Today vCenter OS (W2008) demanded system reboot. After rebooting vCenter Serv. All of ESX hosts have alert icon. end indicate "not responding" I tried to connect ESX (version 4) hosts to vCenter again. Once they came back normally, but after few seconds ,They turned red again. All of them have ping connection to vCenter serv. There are no red messages in Tusks and events. Two messages are found in Summary of Cluster ·There aren't sufficient resource for HA failover. ·Can't contact primary HA agent. I'm also not able to reset FT configuration because all of VMS aren't accessed from vCenter What should I do?

Read the article
How to handle server failure in an n-tier architecture?

- by andy

Imagine I have an n-tier architecture in an auto-scaled cloud environment with say: a load balancer in a failover pair reverse proxy tier web app tier db tier Each tier needs to connect to the instances in the tier below. What are the standard ways of connecting tiers to make them resilient to failure of nodes in each tier? i.e. how does each tier get the IP addresses of each node in the tier below? For example if all reverse proxies should route traffic to all web app nodes, how could they be set up so that they don't send traffic to dead web app nodes, and so that when new web app nodes are brought online they can send traffic to it? I could run an agent that would update all the configs to all the nodes, but it seems inefficient. I could put an LB pair between each tier, so the tier above only needs to connect to the load balancers, but how do I handle the problem of the LBs dying? This just seems to shunt the problem of tier A needing to know the IPs of all nodes in tier B, to all nodes in tier A needing to know the IPs of all LBs between tiers A and B. For some applications, they can implement retry logic if they contact a node in the tier below that doesn't respond, but is there any way that some middleware could direct traffic to only live nodes in the following tier? If I was hosting on AWS I could use an ELB between tiers, but I want to know how I could achieve the same functionality myself. I've read (briefly) about heartbeat and keepalived - are these relevant here? What are the virtual IPs they talk about and how are they managed? Are there still single points of failure using them?

Read the article
Amazon EC2 instance was not available for few minutes (amazon showed that everything ok)

- by Salvador Dali

Few minutes ago my amazon Ec2 instance was unavailable for a few minutes. During this time neither I was able to connect to web-site with http, nor I was able to ssh to it. Also I was not able to connect to my amazon management console for some time (less than amount of unavailability of my instance). When I was able to connect to management console, it was showing me that everything is running smoothly (but I still was not able to connect to instance in any way for a minute or two). During this time I have checked their status page just to see that there is no issues (my instance is in Ireland and there is nothing wrong there today). After that I was able to log in. I checked my logins with last to see that no one except me was logging in. I also looked in apache logs and there was no errors or warnings during this time. Right now when I see my amazon monitor, I see a small spike in CPU in last 15 minutes (but this is from 10% to like 20%) I have no idea what can it be (I have never experienced anything like this before) and therefore I have no idea how scared should I be or what else should I look for. Can anyone give me a hint what my actions should be in such situation?

Read the article
Whats the best way to setup an IIS7 Webfarm for an ASP.NET Application

- by sontek

We are looking to setup an IIS7 WebFarm... We have 2 IIS7/Windows Server 2008 boxes that will act as the load balanced webservers. How do you setup IIS/Windows Server 2008 to handle balancing the requests between the 2 servers? What is the best way to sync deployments so we only have to deploy to 1 place. Can we just have them sync their structures or do we have to use a NAS/Network Share?

Read the article
Configuring HAProxy with memcache with failover

- by Lawrie Matthews

I'm configuring a new set of servers for an existing Wordpress site, and it's been requested that memcache be available and made more resilient. The idea proposed is to have HAProxy send requests to one of the two servers; if that memcache instance is inaccessible, then it should switch to the second, but should not switch back to the first if it comes back up unless the second is then unavailable. This doesn't appear to be a particularly common use case and I've not found much along these lines except to possibly set up the first node with an enormous rise value, such as: server server1 10.112.58.16:11211 check inter 5s fall 3 rise 99999999 server server2 10.112.58.19:11211 check backup which falls over as expected when server1 is unavailable. It won't ever fall back to server1, though, even if server2 goes offline. Can this be made to work?

Read the article
Cross-platform distributed fault-tolerant (disconnected operation/local cache) filesystem

- by Adrian Frühwirth

We are facing a design "challenge" where we are required to set up a storage solution with the following properties: What we need HA a scalable storage backend offline/disconnected operation on the client to account for network outages cross-platform access client-side access from certainly Windows (probably XP upwards), possibly Linux backend integrates with AD/LDAP (permission management (user/group management, ...)) should work reasonably well over slow WAN-links Another problem is that we don't really know all possible use cases here, if people need to be able to have concurrent access to shared files or if they will only be accessing their own files, so a possible solution needs to account for concurrent access and how conflict management would look in this case from a user's point of view. This two years old blog posts sums up the impression that I have been getting during the last couple of days of research, that there are lots of current übercool projects implementing (non-Windows) clustered petabyte-capable blob-storage solutions but that there is none that supports disconnected operation nicely and natively, but I am hoping that we have missed an obvious solution. What we have tried OpenAFS We figured that we want a distributed network filesystem with a local cache and tested OpenAFS (which, as the only currently "stable" DFS supporting disconnected operation, seemed the way to go) for a week but there are several problems with it: it's a real pain to set up there are no official RHEL/CentOS packages the package of the current stable version 1.6.5.1 from elrepo randomly kernel panics on fresh installs, this is an absolute no-go Windows support (including the required Kerberos packages) is mystical. The current client for the 1.6 branch does not run on Windows 8, the current client for the 1.7 does but it just randomly crashes. After that experience we didn't even bother testing on XP and Windows 7. Suffice to say, we couldn't get it working and the whole setup has been so unstable and complicated to setup that it's just not an option for production. Samba + Unison Since OpenAFS was a complete disaster and no other DFS seems to support disconnected operation we went for a simpler idea that would sync files against a Samba server using Unison. This has the following advantages: Samba integrates with ADs; it's a pain but can be done. Samba solves the problem of remotely accessing the storage from Windows but introduces another SPOF and does not address the actual storage problem. We could probably stick any clustered FS underneath Samba, but that means we need a HA Samba setup on top of that to maintain HA which probably adds a lot of additional complexity. I vaguely remember trying to implement redundancy with Samba before and I could not silently failover between servers. Even when online, you are working with local files which will result in more conflicts than would be necessary if a local cache were only touched when disconnected It's not automatic. We cannot expect users to manually sync their files using the (functional, but not-so-pretty) GTK GUI on a regular basis. I attempted to semi-automate the process using the Windows task scheduler, but you cannot really do it in a satisfactory way. On top of that, the way Unison works makes syncing against Samba a costly operation, so I am afraid that it just doesn't scale very well or even at all. Samba + "Offline Files" After that we became a little desparate and gave Windows "offline files" a chance. We figured that having something that is inbuilt into the OS would reduce administrative efforts, helps blaming someone else when it's not working properly and should just work since people have been using this for years. Right? Wrong. We really wanted it to work, but it just doesn't. 30 minutes of copying files around and unplugging network cables/disabling network interfaces left us with (silent! there is only a tiny notification in Windows explorer in the statusbar, which doesn't even open Sync Center if you click on it!) undeletable files on the server (!) and conflicts that should not even be conflicts. In the end, we had one successful sync of a tiny text file, everything else just exploded horribly. Beyond that, there are other problems: Microsoft admits that "offline files" in Windows XP cannot cope with "large files" and therefore does not cache/sync them at all which would mean those files become unavailable if the connection drop In Windows 7 the feature is only available in the Professional/Ultimate/Enterprise editions. Summary Unless there is another fault-tolerant DFS that supports Windows natively I assume that stacking a HA Samba cluster on top of something like GlusterFS/Lustre/whatnot is the only option, but I hope that I am wrong here. How do other companies allow fault-tolerant network access to redundant storage in a heterogeneous environment with Windows?

Read the article
RabbitMQ - How I do configure servers for zero-downtime upgrades?

- by Terence Johnson

Having read through the docs and RabbitMQ in Action, creating a RabbitMQ cluster seems straightforward enough, but upgrading or patching an existing RabbitMQ cluster seems to require the whole cluster to be restarted. Is there a way to combine clustering, shovel, federation, and load balancing to make a rolling upgrade possible without losing queues or messages, or have I missed something slightly more obvious?

Read the article
unable to destroy windows 2008 r2 failover cluster after SAN rebuild

- by Zack

I created a windows 2008 r2 failover cluster for a sql 2008 active/passive cluster. This two node cluster was using a SAN device for a quorum disk resource as well as MSDTC resource. Well....I decided to reconfigure the SAN device, but I didn't destroy the cluster first. Now that the quorum disk and mstdc disk are completely gone, the cluster is obviously not working. But, I can't even destroy the cluster and start again. I've tried from the Windows Clustering tool, as well as the command line. I was able to get the cluster service to start using the "/fixquorum" parameter. After doing this I was able to remove the passive node from the cluster, but it wouldn't let me destroy the cluster because the default resource group and msdtc are still attached as resources. I tried to delete these resources from both the GUI tool, as well as command line. It will either freeze for several minutes and crash the program, or once it even BSOD'd the server. Can someone advise on how to destroy this cluster so I can start over?

Read the article
How to setup heartbeat for IP fail over on SSH failure

- by Tony

I wonder if anyone can help me, I am trying to setup heartbeat on a redhat 5 to failover an IP address when ssh stops responding on a server. So basically you ssh to a VIP and then get put through which ever server has the floating ip. 192.168.0.100 | | /------------------------\ | /------------------------\ | Server 01 | | | Server 02 | | eth0 - 192.168.0.1 |-----/ | eth0 - 192.168.0.2 | | eth0:0 - 192.168.0.100 | | eth0:0 - down | \------------------------/ \------------------------/ if ssh stops responding i want eth0:0 to be brought up on the second machine to allow ssh connections to carry on being served. I have tried to follow some documents I have found online so here is my current configuration: ha.cf bcast eth0 keepalive 2 warntime 10 deadtime 30 initdead 120 udpport 694 auto_failback off node vm-bal01 node vm-bal02 debugfile /var/log/ha-debug logfile /var/log/ha-log authkeys auth 1 1 sha1 sshhhsecret1234 haresources server01 192.168.0.100/24/eth0:0/192.168.0.255 Hope someone can help as this is driving me nuts...

Read the article
reaching 99.9999% uptime

- by user35204

I am currently developing a project which is mission-critical. The actual domain name is registered with 1 & 1 and I plan on purchasing DynDNS Custom DNS service (which has 5 different geographical locations for DNS) and then another secondary DNS service to make sure my DNS is as failover safe as possible. Does it matter that the registration is with 1 & 1 - are they a weak link in the chain? All I really use them for is to say that DynDNS is my primary DNS nameserver and then my secondary DNS is my other nameserver. I can transfer the registration to DynDNS - Im just not sure if it really matters or not. Thanks

Read the article
HAProxy MySQL Failover is not starting

- by thiesdiggity

I am trying to setup HAProxy with MySQL failover with Ubuntu. I used a setup similar to this serverfault question, however I am getting the following error when starting haproxy: [ALERT] 341/220001 (17405) : parsing [/etc/haproxy/haproxy.cfg:29] : unknown option 'mysql-check'. [ALERT] 341/220001 (17405) : Error(s) found in configuration file : /etc/haproxy/haproxy.cfg [ALERT] 341/220001 (17405) : Fatal errors found in configuration. I even tried installing the lastest version of HAProxy (1.4.22). Does anyone know how to fix this? I have Google'd the heck out of it and can't find any solution. Thanks for your time!

Read the article
Drbd Primary/Primary + iSCSI: accessing to different files avoids split brain?

- by Eddie C.

I have a question / curiosity about split-brain on a Drbd Primary/Primary configuration. Supposing two nodes (hosts), host1 and host2 configured with Drbd Primary/Primary and two different shares (NFS, CIFS o iSCSI) of a replicated area (saying /drbd) /drbd/file1.data /drbd/file2.data If a pool of client would access only by host1 share reading and wrinting only file1.data and another pool only by host2 share to file2.data, this scenario should avoid split brain situation in case of one node failure or it's just a conjecture? The final purpose is load balance between the two nodes in normal condition and collapsing to one node only in case of failure. Thank you! Eddie

Read the article

Search Results

Search found 1419 results on 57 pages for 'availability'.

Page 8/57 | < Previous Page | 4 5 6 7 8 9 10 11 12 13 14 15 | Next Page >

- by Richard

- by Richard

- by vorik

- by Riyad

- by mmdave

- by Marc

- by Karl Katzke

- by Lin

- by Manjot

- by quanta

- by Matt

- by andrew

- by laiys

- by hojyokinmo

- by andy

- by Salvador Dali

- by sontek

- by Lawrie Matthews

- by Adrian Frühwirth

- by Terence Johnson

- by Zack

- by Tony

- by user35204

- by thiesdiggity

- by Eddie C.

< Previous Page | 4 5 6 7 8 9 10 11 12 13 14 15 | Next Page >