Search Results

Search found 7697 results on 308 pages for 'high iowait'.

Page 28/308 | < Previous Page | 24 25 26 27 28 29 30 31 32 33 34 35 | Next Page >

How to handle server failure in an n-tier architecture?

- by andy

Imagine I have an n-tier architecture in an auto-scaled cloud environment with say: a load balancer in a failover pair reverse proxy tier web app tier db tier Each tier needs to connect to the instances in the tier below. What are the standard ways of connecting tiers to make them resilient to failure of nodes in each tier? i.e. how does each tier get the IP addresses of each node in the tier below? For example if all reverse proxies should route traffic to all web app nodes, how could they be set up so that they don't send traffic to dead web app nodes, and so that when new web app nodes are brought online they can send traffic to it? I could run an agent that would update all the configs to all the nodes, but it seems inefficient. I could put an LB pair between each tier, so the tier above only needs to connect to the load balancers, but how do I handle the problem of the LBs dying? This just seems to shunt the problem of tier A needing to know the IPs of all nodes in tier B, to all nodes in tier A needing to know the IPs of all LBs between tiers A and B. For some applications, they can implement retry logic if they contact a node in the tier below that doesn't respond, but is there any way that some middleware could direct traffic to only live nodes in the following tier? If I was hosting on AWS I could use an ELB between tiers, but I want to know how I could achieve the same functionality myself. I've read (briefly) about heartbeat and keepalived - are these relevant here? What are the virtual IPs they talk about and how are they managed? Are there still single points of failure using them?

Read the article
Whats the best way to setup an IIS7 Webfarm for an ASP.NET Application

- by sontek

We are looking to setup an IIS7 WebFarm... We have 2 IIS7/Windows Server 2008 boxes that will act as the load balanced webservers. How do you setup IIS/Windows Server 2008 to handle balancing the requests between the 2 servers? What is the best way to sync deployments so we only have to deploy to 1 place. Can we just have them sync their structures or do we have to use a NAS/Network Share?

Read the article
1080p playback on external monitor

- by xibalban

My system (netbook) specs: 1.6 GHz dual core atom N2600 2 GB DDR3 RAM Integrated GMA3600 GPU Win 7 Professional 32 Bit Daum Pot Player V-1.5 The issue: I downloaded a 1.33 GB sized 1920 x 1080 HD documentary (file extension: mp4) and it plays without a hitch using pot player. However, when I connected an external display (an LED Full HD 24" monitor) using D-Sub, the video plays fine but the player controls become very unresponsive. For instance, it takes about 40 seconds to exit full screen while playing. I tried: Installed/updated all the latest monitor drivers Updated the VGA drivers for GMA3600 No other programs ran while the video played, with most unwanted services and background processes disabled/killed. CPU usage goes only up to 49% when the video plays Question: What may I do in order to improve the player response/video playback experience? Do I need to play with (or change) Pot Player's default video decoders, etc?

Read the article
Configuring HAProxy with memcache with failover

- by Lawrie Matthews

I'm configuring a new set of servers for an existing Wordpress site, and it's been requested that memcache be available and made more resilient. The idea proposed is to have HAProxy send requests to one of the two servers; if that memcache instance is inaccessible, then it should switch to the second, but should not switch back to the first if it comes back up unless the second is then unavailable. This doesn't appear to be a particularly common use case and I've not found much along these lines except to possibly set up the first node with an enormous rise value, such as: server server1 10.112.58.16:11211 check inter 5s fall 3 rise 99999999 server server2 10.112.58.19:11211 check backup which falls over as expected when server1 is unavailable. It won't ever fall back to server1, though, even if server2 goes offline. Can this be made to work?

Read the article
Avoiding DNS timeouts when a dns server fails

- by Neil Katin

We have a small datacenter with about a hundred hosts pointing to 3 internal dns servers (bind 9). Our problem comes when one of the internal dns servers becomes unavailable. At that point all the clients that point to that server start performing very slowly. The problem seems to be that the stock linux resolver doesn't really have the concept of "failing over" to a different dns server. You can adjust the timeout and number of retries it uses, (and set rotate so it will work through the list), but no matter what settings one uses our services perform much more slowly if a primary dns server becomes unavailable. At the moment this is one of the largest sources of service disruptions for us. My ideal answer would be something like "RTFM: tweak /etc/resolv.conf like this...", but if that's an option I haven't seen it. I was wondering how other folks handled this issue? I can see 3 possible types of solutions: Use linux-ha/Pacemaker and failover ips (so the dns IP VIPs are "always" available). Alas, we don't have a good fencing infrastructure, and without fencing pacemaker doesn't work very well (in my experience Pacemaker lowers availability without fencing). Run a local dns server on each node, and have resolv.conf point to localhost. This would work, but it would give us a lot more services to monitor and manage. Run a local cache on each node. Folks seem to consider nscd "broken", but dnrd seems to have the right feature set: it marks dns servers as up or down, and won't use 'down' dns servers. Any-casting seems to work only at the ip routing level, and depends on route updates for server failure. Multi-casting seemed like it would be a perfect answer, but bind does not support broadcasting or multi-casting, and the docs I could find seem to suggest that multicast dns is more aimed at service discovery and auto-configuration rather than regular dns resolving. Am I missing an obvious solution?

Read the article
Cross-platform distributed fault-tolerant (disconnected operation/local cache) filesystem

- by Adrian Frühwirth

We are facing a design "challenge" where we are required to set up a storage solution with the following properties: What we need HA a scalable storage backend offline/disconnected operation on the client to account for network outages cross-platform access client-side access from certainly Windows (probably XP upwards), possibly Linux backend integrates with AD/LDAP (permission management (user/group management, ...)) should work reasonably well over slow WAN-links Another problem is that we don't really know all possible use cases here, if people need to be able to have concurrent access to shared files or if they will only be accessing their own files, so a possible solution needs to account for concurrent access and how conflict management would look in this case from a user's point of view. This two years old blog posts sums up the impression that I have been getting during the last couple of days of research, that there are lots of current übercool projects implementing (non-Windows) clustered petabyte-capable blob-storage solutions but that there is none that supports disconnected operation nicely and natively, but I am hoping that we have missed an obvious solution. What we have tried OpenAFS We figured that we want a distributed network filesystem with a local cache and tested OpenAFS (which, as the only currently "stable" DFS supporting disconnected operation, seemed the way to go) for a week but there are several problems with it: it's a real pain to set up there are no official RHEL/CentOS packages the package of the current stable version 1.6.5.1 from elrepo randomly kernel panics on fresh installs, this is an absolute no-go Windows support (including the required Kerberos packages) is mystical. The current client for the 1.6 branch does not run on Windows 8, the current client for the 1.7 does but it just randomly crashes. After that experience we didn't even bother testing on XP and Windows 7. Suffice to say, we couldn't get it working and the whole setup has been so unstable and complicated to setup that it's just not an option for production. Samba + Unison Since OpenAFS was a complete disaster and no other DFS seems to support disconnected operation we went for a simpler idea that would sync files against a Samba server using Unison. This has the following advantages: Samba integrates with ADs; it's a pain but can be done. Samba solves the problem of remotely accessing the storage from Windows but introduces another SPOF and does not address the actual storage problem. We could probably stick any clustered FS underneath Samba, but that means we need a HA Samba setup on top of that to maintain HA which probably adds a lot of additional complexity. I vaguely remember trying to implement redundancy with Samba before and I could not silently failover between servers. Even when online, you are working with local files which will result in more conflicts than would be necessary if a local cache were only touched when disconnected It's not automatic. We cannot expect users to manually sync their files using the (functional, but not-so-pretty) GTK GUI on a regular basis. I attempted to semi-automate the process using the Windows task scheduler, but you cannot really do it in a satisfactory way. On top of that, the way Unison works makes syncing against Samba a costly operation, so I am afraid that it just doesn't scale very well or even at all. Samba + "Offline Files" After that we became a little desparate and gave Windows "offline files" a chance. We figured that having something that is inbuilt into the OS would reduce administrative efforts, helps blaming someone else when it's not working properly and should just work since people have been using this for years. Right? Wrong. We really wanted it to work, but it just doesn't. 30 minutes of copying files around and unplugging network cables/disabling network interfaces left us with (silent! there is only a tiny notification in Windows explorer in the statusbar, which doesn't even open Sync Center if you click on it!) undeletable files on the server (!) and conflicts that should not even be conflicts. In the end, we had one successful sync of a tiny text file, everything else just exploded horribly. Beyond that, there are other problems: Microsoft admits that "offline files" in Windows XP cannot cope with "large files" and therefore does not cache/sync them at all which would mean those files become unavailable if the connection drop In Windows 7 the feature is only available in the Professional/Ultimate/Enterprise editions. Summary Unless there is another fault-tolerant DFS that supports Windows natively I assume that stacking a HA Samba cluster on top of something like GlusterFS/Lustre/whatnot is the only option, but I hope that I am wrong here. How do other companies allow fault-tolerant network access to redundant storage in a heterogeneous environment with Windows?

Read the article
Flash player, HD videos and games are choppy

- by Aimad Majdou

I have a problem with flash player. HD videos from Youtube or Vimeo and flash games do not play smoothly. I'm using Flash player 11, Windows 7 Sp1, and my graphic card is Intel GMA 4500. Device Manager shows me that all drivers are installed on my computer, so i don't have any problems with drivers. When I run Google chrome, Resource Monitor shows me 15% ~ 40 % of CPU Usage and 40% used Physical Memory, but when I watch a video on Youtube or play a Flash game, the Resource Monitor shows 70% - 90% CPU Usage. Also, when I run some HD Video (Frame width : 1920, Frame height : 1080) on my computer, Device Manager shows me 80% ~ 100% of CPU Usage. before I Reinstall Windows 7, HD Videos and flash games were play smoothly I hadn't any problem with them !! I hope all these informations are enough to answer my question.

Read the article
How do I replace the screen of a Dell Inspiron 1545?

- by Ajus10

I got a new screen for a Dell Inspiron 1545. The old screen says Dell Inspiron 1545 LP156WH1 (TL)(C1?) HD and the new one says Dell Inspiron 1545 LP156WH1 (TL)(C1?) LCD Does that make a difference? All I can get to work on the new screen is the backlight. The old screen had a crack. Now when I plug the old one in, it will not turn on at all. Could I have blown the inverter or messed up the cable?

Read the article
unable to destroy windows 2008 r2 failover cluster after SAN rebuild

- by Zack

I created a windows 2008 r2 failover cluster for a sql 2008 active/passive cluster. This two node cluster was using a SAN device for a quorum disk resource as well as MSDTC resource. Well....I decided to reconfigure the SAN device, but I didn't destroy the cluster first. Now that the quorum disk and mstdc disk are completely gone, the cluster is obviously not working. But, I can't even destroy the cluster and start again. I've tried from the Windows Clustering tool, as well as the command line. I was able to get the cluster service to start using the "/fixquorum" parameter. After doing this I was able to remove the passive node from the cluster, but it wouldn't let me destroy the cluster because the default resource group and msdtc are still attached as resources. I tried to delete these resources from both the GUI tool, as well as command line. It will either freeze for several minutes and crash the program, or once it even BSOD'd the server. Can someone advise on how to destroy this cluster so I can start over?

Read the article
RabbitMQ - How I do configure servers for zero-downtime upgrades?

- by Terence Johnson

Having read through the docs and RabbitMQ in Action, creating a RabbitMQ cluster seems straightforward enough, but upgrading or patching an existing RabbitMQ cluster seems to require the whole cluster to be restarted. Is there a way to combine clustering, shovel, federation, and load balancing to make a rolling upgrade possible without losing queues or messages, or have I missed something slightly more obvious?

Read the article
How to setup heartbeat for IP fail over on SSH failure

- by Tony

I wonder if anyone can help me, I am trying to setup heartbeat on a redhat 5 to failover an IP address when ssh stops responding on a server. So basically you ssh to a VIP and then get put through which ever server has the floating ip. 192.168.0.100 | | /------------------------\ | /------------------------\ | Server 01 | | | Server 02 | | eth0 - 192.168.0.1 |-----/ | eth0 - 192.168.0.2 | | eth0:0 - 192.168.0.100 | | eth0:0 - down | \------------------------/ \------------------------/ if ssh stops responding i want eth0:0 to be brought up on the second machine to allow ssh connections to carry on being served. I have tried to follow some documents I have found online so here is my current configuration: ha.cf bcast eth0 keepalive 2 warntime 10 deadtime 30 initdead 120 udpport 694 auto_failback off node vm-bal01 node vm-bal02 debugfile /var/log/ha-debug logfile /var/log/ha-log authkeys auth 1 1 sha1 sshhhsecret1234 haresources server01 192.168.0.100/24/eth0:0/192.168.0.255 Hope someone can help as this is driving me nuts...

Read the article
HAProxy MySQL Failover is not starting

- by thiesdiggity

I am trying to setup HAProxy with MySQL failover with Ubuntu. I used a setup similar to this serverfault question, however I am getting the following error when starting haproxy: [ALERT] 341/220001 (17405) : parsing [/etc/haproxy/haproxy.cfg:29] : unknown option 'mysql-check'. [ALERT] 341/220001 (17405) : Error(s) found in configuration file : /etc/haproxy/haproxy.cfg [ALERT] 341/220001 (17405) : Fatal errors found in configuration. I even tried installing the lastest version of HAProxy (1.4.22). Does anyone know how to fix this? I have Google'd the heck out of it and can't find any solution. Thanks for your time!

Read the article
Drbd Primary/Primary + iSCSI: accessing to different files avoids split brain?

- by Eddie C.

I have a question / curiosity about split-brain on a Drbd Primary/Primary configuration. Supposing two nodes (hosts), host1 and host2 configured with Drbd Primary/Primary and two different shares (NFS, CIFS o iSCSI) of a replicated area (saying /drbd) /drbd/file1.data /drbd/file2.data If a pool of client would access only by host1 share reading and wrinting only file1.data and another pool only by host2 share to file2.data, this scenario should avoid split brain situation in case of one node failure or it's just a conjecture? The final purpose is load balance between the two nodes in normal condition and collapsing to one node only in case of failure. Thank you! Eddie

Read the article
Stability, x86 Vs Sparc

- by Jason T

Our project are plan to migrate from Sparc to x86, and our HA requirement is 99.99%, previous on Sparc, we assume the hardware stability would like, hardware failure every 4 month or even one year, and also we have test data for our application, then we have requirement for each unplanned recovery (fail over) to achieve 99.99% (52.6 minutes unplanned downtime per year). But since we are going to use Intel x86, it seems the hardware stability is not so good as Sparc, but we don't have the detail data. So compare with Sparc, how about the stability of the Intel x86, should we assume we have more unplanned downtime? If so, how many, double? Where I can find some more detail of this two type of hardware?

Read the article
Optimize php-fpm and varnish for a powerfull server

- by Jim

My setup is: Intel® Core™ i7-2600 and RAM 16 GB DDR3 RAM varnish+nginx+php-fpm+apc for a not very heavy WordPress blog with W3 Total Cache and CDN My problem is that after 55 hits per second according to blitz.io varnish starts giving out timeouts. CPU usage at this time is hardly 1%. Free memory at all time remains 10GB+. I tried benchmarking php-fpm directly with result of 150hits/s without any timeouts. But after that the CPU usage goes 100% and it stops responding. Can you help me optimize it to handle more? As i understand nginx has nothing to do over here so i dont put its config. php-fpm config listen = /tmp/php5-fpm.sock listen.allowed_clients = 127.0.0.1 user = nginx group = nginx pm = dynamic pm.max_children = 150 pm.start_servers = 7 pm.min_spare_servers = 2 pm.max_spare_servers = 15 pm.max_requests = 500 slowlog = /var/log/php-fpm/www-slow.log php_admin_value[error_log] = /var/log/php-fpm/www-error.log php_admin_flag[log_errors] = on apc extension = apc.so apc.enabled=1 apc.shm_size=512MB apc.num_files_hint=0 apc.user_entries_hint=0 apc.ttl=7200 apc.use_request_time=1 apc.user_ttl=7200 apc.gc_ttl=3600 apc.cache_by_default=1 apc.filters apc.mmap_file_mask=/tmp/apc.XXXXXX apc.file_update_protection=2 apc.enable_cli=0 apc.max_file_size=1M apc.stat=1 apc.stat_ctime=0 apc.canonicalize=0 apc.write_lock=1 apc.report_autofilter=0 apc.rfc1867=0 apc.rfc1867_prefix =upload_ apc.rfc1867_name=APC_UPLOAD_PROGRESS apc.rfc1867_freq=0 apc.rfc1867_ttl=3600 apc.include_once_override=0 apc.lazy_classes=0 apc.lazy_functions=0 apc.coredump_unmap=0 apc.file_md5=0 apc.preload_path Varnish VCL backend default { .host = "127.0.0.1"; .port = "8080"; .connect_timeout = 6s; .first_byte_timeout = 6s; .between_bytes_timeout = 60s; } acl purgehosts { "localhost"; "127.0.0.1"; } # Called after a document has been successfully retrieved from the backend. sub vcl_fetch { # Uncomment to make the default cache "time to live" is 5 minutes, handy # but it may cache stale pages unless purged. (TODO) # By default Varnish will use the headers sent to it by Apache (the backend server) # to figure out the correct TTL. # WP Super Cache sends a TTL of 3 seconds, set in wp-content/cache/.htaccess set beresp.ttl = 24h; # Strip cookies for static files and set a long cache expiry time. if (req.url ~ "\.(jpg|jpeg|gif|png|ico|css|zip|tgz|gz|rar|bz2|pdf|txt|tar|wav|bmp|rtf|js|flv|swf|html|htm)$") { unset beresp.http.set-cookie; set beresp.ttl = 24h; } # If WordPress cookies found then page is not cacheable if (req.http.Cookie ~"(wp-postpass|wordpress_logged_in|comment_author_)") { # set beresp.cacheable = false;#versions less than 3 #beresp.ttl>0 is cacheable so 0 will not be cached set beresp.ttl = 0s; } else { #set beresp.cacheable = true; set beresp.ttl=24h;#cache for 24hrs } # Varnish determined the object was not cacheable #if ttl is not > 0 seconds then it is cachebale if (!beresp.ttl > 0s) { # set beresp.http.X-Cacheable = "NO:Not Cacheable"; } else if ( req.http.Cookie ~"(wp-postpass|wordpress_logged_in|comment_author_)" ) { # You don't wish to cache content for logged in users set beresp.http.X-Cacheable = "NO:Got Session"; return(hit_for_pass); #previously just pass but changed in v3+ } else if ( beresp.http.Cache-Control ~ "private") { # You are respecting the Cache-Control=private header from the backend set beresp.http.X-Cacheable = "NO:Cache-Control=private"; return(hit_for_pass); } else if ( beresp.ttl < 1s ) { # You are extending the lifetime of the object artificially set beresp.ttl = 300s; set beresp.grace = 300s; set beresp.http.X-Cacheable = "YES:Forced"; } else { # Varnish determined the object was cacheable set beresp.http.X-Cacheable = "YES"; if (beresp.status == 404 || beresp.status >= 500) { set beresp.ttl = 0s; } # Deliver the content return(deliver); } sub vcl_hash { # Each cached page has to be identified by a key that unlocks it. # Add the browser cookie only if a WordPress cookie found. if ( req.http.Cookie ~"(wp-postpass|wordpress_logged_in|comment_author_)" ) { #set req.hash += req.http.Cookie; hash_data(req.http.Cookie); } } # vcl_recv is called whenever a request is received sub vcl_recv { # remove ?ver=xxxxx strings from urls so css and js files are cached. # Watch out when upgrading WordPress, need to restart Varnish or flush cache. set req.url = regsub(req.url, "\?ver=.*$", ""); # Remove "replytocom" from requests to make caching better. set req.url = regsub(req.url, "\?replytocom=.*$", ""); remove req.http.X-Forwarded-For; set req.http.X-Forwarded-For = client.ip; # Exclude this site because it breaks if cached if ( req.http.host == "sr.ituts.gr" ) { return( pass ); } # Serve objects up to 2 minutes past their expiry if the backend is slow to respond. set req.grace = 120s; # Strip cookies for static files: if (req.url ~ "\.(jpg|jpeg|gif|png|ico|css|zip|tgz|gz|rar|bz2|pdf|txt|tar|wav|bmp|rtf|js|flv|swf|html|htm)$") { unset req.http.Cookie; return(lookup); } # Remove has_js and Google Analytics __* cookies. set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+|has_js)=[^;]*", ""); # Remove a ";" prefix, if present. set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", ""); # Remove empty cookies. if (req.http.Cookie ~ "^\s*$") { unset req.http.Cookie; } if (req.request == "PURGE") { if (!client.ip ~ purgehosts) { error 405 "Not allowed."; } #previous version ban() was purge() ban("req.url ~ " + req.url + " && req.http.host == " + req.http.host); error 200 "Purged."; } # Pass anything other than GET and HEAD directly. if (req.request != "GET" && req.request != "HEAD") { return( pass ); } /* We only deal with GET and HEAD by default */ # remove cookies for comments cookie to make caching better. set req.http.cookie = regsub(req.http.cookie, "1231111111111111122222222333333=[^;]+(; )?", ""); # never cache the admin pages, or the server-status page, or your feed? you may want to..i don't if (req.request == "GET" && (req.url ~ "(wp-admin|bb-admin|server-status|feed)")) { return(pipe); } # don't cache authenticated sessions if (req.http.Cookie && req.http.Cookie ~ "(wordpress_|PHPSESSID)") { return(lookup); } # don't cache ajax requests if(req.http.X-Requested-With == "XMLHttpRequest" || req.url ~ "nocache" || req.url ~ "(control.php|wp-comments-post.php|wp-login.php|bb-login.php|bb-reset-password.php|register.php)") { return (pass); } return( lookup ); } Varnish Daemon options DAEMON_OPTS="-a :80 \ -T 127.0.0.1:6082 \ -f /etc/varnish/ituts.vcl \ -u varnish -g varnish \ -S /etc/varnish/secret \ -p thread_pool_add_delay=2 \ -p thread_pools=8 \ -p thread_pool_min=100 \ -p thread_pool_max=1000 \ -p session_linger=50 \ -p session_max=150000 \ -p sess_workspace=262144 \ -s malloc,5G" Im not sure where to start, should i for start optimize php-fpm and then go to varnish or php-fpm is at its max right now so i should start looking for the problem in varnish?

Read the article
HAProxy NGInx SSL setup

- by Niclas

I've been looking around different setups for a server cluster supporting SSL and I would like to benchmark my idea with you. Requirements: All servers in the cluster should be under the same full domain name. (http and https) Routing to subsystems is done on URI matching in HA proxy. All URIs have support for SSL support. Wish: Centralizing routing rules ---<----http-----<-- | | Inet -->HA--+---https--->NGInx_SSL_1..N | | +---http---> Apache_1..M | +---http---> NodeJS Idea: Configure HA to route all SSL traffic (mode=tcp,algorithm=Source) to an NGInx cluster turning https traffic into http. Re-pass the http traffic from NGInx to the HA for normal load-balancing which performs load balancing based on HA config. My question is simply: Is this the best way to to configure based on requirements above?

Read the article
Server and Application architecture for large outgoing email volume.

- by Ezequiel

Hi, we need to develop an application to send large amount of emails (newsletters) We estimate 15 millions of emails per month (6 - 10 emails per seconds). Would you recommend me the proper architecture for this application? should we have several MTA agents and use them in a round robin fashion? What considerations should we take on account to not being treated as spammers (its really not spam what we are going to send). Thanks for your help. Ezequiel

Read the article
Free DNS software with failover support?

- by Lin

I'm looking for DNS software that can accomplish the following: Check health of all A records at set intervals If server is unresponsive after multiple successive checks, replace A record with a working server When a server is down, check it periodically. Once it's up, restore normal A records Here's an equivalent I thought of: Run DNS servers with very low TTL (minutes) Use a cron job to periodically query all webservers Use sed to replace A records if need be, and then restart DNS server I have a hard time believing there isn't already something that can accomplish the above. I'm not looking for a paid service, and I'm restricted to anything I can run with root access to a VPS. Any suggestions would be great. Thanks!

Read the article
master-slave datastore replication, automatic failover, and wackamole

- by z8000

I have 2 dedicated servers provisioned for my next project's datastores. The datastores are configured for master-slave replication. There's no inherent automatic failover but I of course want this. That is, I'd love for access to the master datastore to always just work without having to configure a client library to detect when a master is down and failover to the slave. I've seen Wackamole which is based on the Spread Toolkit. You provide Wackamole with a set of IPs and a bunch of nodes, and regardless of the up/down state of any of the nodes, those IPs will stay available/up. Wackamole detects when a node goes down and ARPs the IP(s) that were up on the now-down node. It's pretty neat actually. So, my thought was to use Wackamole to keep the 2 virtual private IPs available/up. Clients would then just always use the same private IP to access the master datastore and the same but distinct IP for the slave datastore, even if those IPs were hosted on the same node. My datastore servers are accessed over a private network. I am unsure if this messes with Wackamole though. Is this lunacy? How do you generally handle automatic failover of private services like a datastore. FWIW, it shouldn't matter but the datastore is Redis. I don't want to hear "use mySQL" please :) Thanks.

Read the article
distributed, fault-tolerant network block device

- by gucki

I'm looking for a distributed, fault-tolerant network storage system which exposes block devices (not filesystems) on the clients. A client's block device should write simultaneously to several storage nodes A client's block device should not fail as long as not all storage nodes backing it went down The master should automatically redistribute storages' data when a storage node fails or gets added/ removed A single master (which is for metadata only) is fine So ideally the architecture would be very similar to moosefs (http://www.moosefs.org/) but instead of exposing a real filesystem mounted using a fuse client it'd expose block devices on the clients. I know of iscsi and drbd but both don't seem to offer what I'm looking for. Or am I missing something?

Read the article
How do I load balance between two Linux machines?

- by William Hilsum

Inspired by the Stack Overflow network, I am now obsessed with HAProxy and trying to use it myself. At the moment, each HAProxy box has got two network cards (well, two configured, I can have a maximum of 4 and wasn't sure if they needed their own one for management between the boxes). On both machines, the backend one (eth1) is a private IP that goes to a switch connected to the webservers, and the front facing one (eth0) has a public internet IP that is routed straight though. In addition, I have created an additional virtual ip for eth0 called eth0:0 which has got a third public ip address. I just about get how to use it for load balancing between multiple web servers that are behind it, but, I am failing to load balance between the two HAProxy boxes - they appear to fight for the virtual IP, but, this does not appear to be a smart solution. Now, by using the virtual shared IP address, this solution appears to work and does seem to give me maximum uptime, but, is this the correct way to do it, or is there a smarter way? I have been looking at other Linux packages such as keepalived, but, I have only been using Linux (server) for a week now and am at the limits of my understanding. Is there anyone who has done this before and can you advise anything for maximum uptime?

Read the article
SBD killing both cluster nodes when there are even small SAN network problems

- by Wieslaw Herr

I am having problems with stonith SBD in a openais-based cluster. Some background: The active/passive cluster has two nodes, node1 and node2. They are configured to provide an NFS service to users. To avoid problems with split-brain, they are both configured to use SBD. SBD is using two 1MB disks available to the hosts via an multipath fibre-channel network. The problems start if something happens with the SAN network. For example, today one of the brocade switches got rebooted and both nodes lost 2 out of 4 paths to each disks, which resulted in both nodes committing suicide and rebooting. This, of course, was highly undesirable because a) there were paths left b) even if the switch would be out for 10-20 seconds a reboot cycle of both nodes would take 5-10 minutes and all NFS-locks would be lost. I tried increasing the SBD timeout values (to 10sec+ values, dump attached at the end), however a "WARN: Latency: No liveness for 4 s exceeds threshold of 3 s" hints that something isn't working as I would it expect to. Here is what I would like to know: a) Is SBD working as it should killing nodes when 2 paths are available? b) If not, is the multipath.conf file attached correct? The storage controller we use is an IBM SVC (IBM 2145), should there be any specific configuration for it? (as in multipath.conf.defaults) c) How should I go about increasing the timeouts in SBD attachements: Multipath.conf and sbd dump (http://hpaste.org/69537)

Read the article
How to make AD highly available for applications that use it as an LDAP service

- by Beaming Mel-Bin

Our situation We currently have many web applications that use LDAP for authentication. For this, we point the web application to one of our AD domain controllers using the LDAPS port (636). When we have to update the Domain Controller, this has caused us issues because one more web application could depend on any DC. What we want We would like to point our web applications to a cluster "virtual" IP. This cluster will consist of at least two servers (so that each cluster server could be rotated out and updated). The cluster servers would then proxy LDAPS connections to the DCs and be able to figure out which one is available. Questions For anyone that has had experience with this: What software did you use for the cluster? Any caveats? Or perhaps a completely different architecture to accomplish something similar?

Read the article
Corosync :: Restarting some resources after Lan connectivity issue

- by moebius_eye

I am currently looking into corosync to build a two-node cluster. So, I've got it working fine, and it does what I want to do, which is: Lost connectivity between the two nodes gives the first node '10node' both Failover Wan IPs. (aka resources WanCluster100 and WanCluster101 ) '11node' does nothing. He "thinks" he still has his Failover Wan IP. (aka WanCluster101) But it doesn't do this: '11node' should restart the WanCluster101 resource when the connectivity with the other node is back. This is to prevent a condition where node10 simply dies (and thus does not get 11node's Failover Wan IP), resulting in a situation where none of the nodes have 10node's failover IP because 10node is down 11node has "given back" his failover Wan IP. Here's the current configuration I'm working on. node 10sch \ attributes standby="off" node 11sch \ attributes standby="off" primitive LanCluster100 ocf:heartbeat:IPaddr2 \ params ip="172.25.0.100" cidr_netmask="32" nic="eth3" \ op monitor interval="10s" \ meta is-managed="true" target-role="Started" primitive LanCluster101 ocf:heartbeat:IPaddr2 \ params ip="172.25.0.101" cidr_netmask="32" nic="eth3" \ op monitor interval="10s" \ meta is-managed="true" target-role="Started" primitive Ping100 ocf:pacemaker:ping \ params host_list="192.0.2.1" multiplier="500" dampen="15s" \ op monitor interval="5s" \ meta target-role="Started" primitive Ping101 ocf:pacemaker:ping \ params host_list="192.0.2.1" multiplier="500" dampen="15s" \ op monitor interval="5s" \ meta target-role="Started" primitive WanCluster100 ocf:heartbeat:IPaddr2 \ params ip="192.0.2.100" cidr_netmask="32" nic="eth2" \ op monitor interval="10s" \ meta target-role="Started" primitive WanCluster101 ocf:heartbeat:IPaddr2 \ params ip="192.0.2.101" cidr_netmask="32" nic="eth2" \ op monitor interval="10s" \ meta target-role="Started" primitive Website0 ocf:heartbeat:apache \ params configfile="/etc/apache2/apache2.conf" options="-DSSL" \ operations $id="Website-one" \ op start interval="0" timeout="40" \ op stop interval="0" timeout="60" \ op monitor interval="10" timeout="120" start-delay="0" statusurl="http://127.0.0.1/server-status/" \ meta target-role="Started" primitive Website1 ocf:heartbeat:apache \ params configfile="/etc/apache2/apache2.conf.1" options="-DSSL" \ operations $id="Website-two" \ op start interval="0" timeout="40" \ op stop interval="0" timeout="60" \ op monitor interval="10" timeout="120" start-delay="0" statusurl="http://127.0.0.1/server-status/" \ meta target-role="Started" group All100 WanCluster100 LanCluster100 group All101 WanCluster101 LanCluster101 location AlwaysPing100WithNode10 Ping100 \ rule $id="AlWaysPing100WithNode10-rule" inf: #uname eq 10sch location AlwaysPing101WithNode11 Ping101 \ rule $id="AlWaysPing101WithNode11-rule" inf: #uname eq 11sch location NeverLan100WithNode11 LanCluster100 \ rule $id="RAND1083308" -inf: #uname eq 11sch location NeverPing100WithNode11 Ping100 \ rule $id="NeverPing100WithNode11-rule" -inf: #uname eq 11sch location NeverPing101WithNode10 Ping101 \ rule $id="NeverPing101WithNode10-rule" -inf: #uname eq 10sch location Website0NeedsConnectivity Website0 \ rule $id="Website0NeedsConnectivity-rule" -inf: not_defined pingd or pingd lte 0 location Website1NeedsConnectivity Website1 \ rule $id="Website1NeedsConnectivity-rule" -inf: not_defined pingd or pingd lte 0 colocation Never -inf: LanCluster101 LanCluster100 colocation Never2 -inf: WanCluster100 LanCluster101 colocation NeverBothWebsitesTogether -inf: Website0 Website1 property $id="cib-bootstrap-options" \ dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \ cluster-infrastructure="openais" \ expected-quorum-votes="2" \ no-quorum-policy="ignore" \ stonith-enabled="false" \ last-lrm-refresh="1408954702" \ maintenance-mode="false" rsc_defaults $id="rsc-options" \ resource-stickiness="100" \ migration-threshold="3" I also have a less important question concerning this line: colocation NeverBothLans -inf: LanCluster101 LanCluster100 How do I tell it that this collocation only applies to '11node'.

Read the article
DRBD Replication failure

- by user62513

A couple of weeks ago I setup a 2 nodes CRM system with one of the resources managed being MySQL over DRBD. Today for maintenance reasons I restarted both nodes but now they can't connect to each other anymore. DRBD fell out of sync and I followed this guide to get it back connected but it's only able to run successfully on one node. But this strange thing happens: If I crm node standby both nodes and I try: crm node online node0 before crm node online node1, all the CRM resources start successfully but the DRBD partitions are still running in StandAlone state. crm node online node1 beofre crm node online node0, the DRBD resource fails to start, thus causing mysql not to start. If I standby both resources and call crm node online node0 then it times out and prints this error: Running crm node online node0 produces this output after timing out Error setting standby=off (section=nodes, set=<null>): Remote node did not respond Error performing operation: Remote node did not respond Is there anything I'm doing wrong here? An alternative will be just do MySQL replication but I'm not sure how to promote a slave to master when the master database is not available.

Read the article

< Previous Page | 24 25 26 27 28 29 30 31 32 33 34 35 | Next Page >