Search Results

Search found 1781 results on 72 pages for 'cluster'.

Page 36/72 | < Previous Page | 32 33 34 35 36 37 38 39 40 41 42 43 | Next Page >

netlogon errors

- by rorr

I have two instances of mssql 2005 and am using CA XOSoft replication. The master is a failover cluster and the replica is a standalone server. They are all running Server 2003 sp2 x64. Same patch levels on all servers. This setup has worked great for several months until we recently restricted the RPC ports on both nodes of the master(5000 - 6000 using rpccfg.exe). We have to implement egress filtering, thus the limiting of the ports. We began receiving login errors for sql windows authentication and NETLOGON Event ID: 5719: This computer was not able to set up a secure session with a domain controller in domain due to the following: Not enough storage is available to process this command. This may lead to authentication problems. Make sure that this computer is connected to the network. If the problem persists, please contact your domain administrator. We also see group policies failing to update and cluster file shares go offline at the same time. The RPC ports were set back to default when we started seeing these problems and the servers rebooted, but the problems persist. The domain controllers are not showing any errors. Running dcdiag and netdiag shows everything is fine. We have noticed that the XOSoft service ws_rep.exe is using a lot of handles(8 - 9k), about the same number that sqlserver is using. As soon as xosoft replication is stopped the login errors cease and everything functions correctly. I have opened a ticket with CA for XOSoft, but I'm not sure that the problem is actually xosoft, but that it is the one bringing the problem to light. I'm looking for tips on debugging RPC problems. Specifically on limiting the ports and then reverting the changes.

Read the article
Sticky connection and HTTPS support for HAProxy

- by Saif

We have 2 HTTP Load balancer with HAproxy and heartbeat. There are 4 apache nodes in this cluster. It's doing round robin load balancing. The HTTP cluster working fine. We are having problem with our portal because it uses SSO. We need sticky connection support in our HAproxy. Also we need load balancing for HTTPS traffic. Here's our HAproxy conf file. global # to have these messages end up in /var/log/haproxy.log you will # need to: # # 1) configure syslog to accept network log events. This is done # by adding the '-r' option to the SYSLOGD_OPTIONS in # /etc/sysconfig/syslog # # 2) configure local2 events to go to the /var/log/haproxy.log # file. A line like the following can be added to # /etc/sysconfig/syslog # # local2.* /var/log/haproxy.log # log 127.0.0.1 local0 log 127.0.0.1 local1 notice chroot /var/lib/haproxy pidfile /var/run/haproxy.pid maxconn 4000 user haproxy group haproxy daemon # turn on stats unix socket stats socket /var/lib/haproxy/stats #--------------------------------------------------------------------- # common defaults that all the 'listen' and 'backend' sections will # use if not designated in their block #--------------------------------------------------------------------- defaults mode http log global option httplog option dontlognull option http-server-close option forwardfor except 127.0.0.0/8 option redispatch retries 3 timeout http-request 10s timeout queue 1m timeout connect 10s timeout client 1m timeout server 1m timeout http-keep-alive 10s timeout check 10s maxconn 3000 #--------------------------------------------------------------------- # main frontend which proxys to the backends #--------------------------------------------------------------------- frontend main *:5000 acl url_static path_beg -i /static /images /javascript /stylesheets acl url_static path_end -i .jpg .gif .png .css .js use_backend static if url_static default_backend app #--------------------------------------------------------------------- # static backend for serving up images, stylesheets and such #--------------------------------------------------------------------- backend static balance roundrobin server static 127.0.0.1:4331 check #--------------------------------------------------------------------- # round robin balancing between the various backends #--------------------------------------------------------------------- backend app listen ha-http 10.190.1.28:80 mode http stats enable stats auth admin:xxxxxx balance roundrobin cookie JSESSIONID prefix option httpclose option forwardfor option httpchk HEAD /haproxy.txt HTTP/1.0 server apache1 portal-04:80 cookie A check server apache2 im-01:80 cookie B check server apache3 im-02:80 cookie B check server apache4 im-03:80 cookie B check Please advice. Thanks for your help in advance.

Read the article
What is optimal hardware configuration for heavy load LAMP application

- by Piotr K.

I need to run Linux-Apache-PHP-MySQL application (Moodle e-learning platform) for a large number of concurrent users - I am aiming 5000 users. By concurrent I mean that 5000 people should be able to work with the application at the same time. "Work" means not only do database reads but writes as well. The application is not very typical, since it is doing a lot of inserts/updates on the database, so caching techniques are not helping to much. We are using InnoDB storage engine. In addition application is not written with performance in mind. For instance one Apache thread usually occupies about 30-50 MB of RAM. I would be greatful for information what hardware is needed to build scalable configuration that is able to handle this kind of load. We are using right now two HP DLG 380 with two 4 core processors which are able to handle much lower load (typically 300-500 concurrent users). Is it reasonable to invest in this kind of boxes and build cluster using them or is it better to go with some more high-end hardware? I am particularly curious how many and how powerful servers are needed (number of processors/cores, size of RAM) what network equipment should be used (what kind of switches, network cards) any other hardware, like particular disc storage solutions, etc, that are needed Another thing is how to put together everything, that is what is the most optimal architecture. Clustering with MySQL is rather hard (people are complaining about MySQL Cluster, even here on Stackoverflow).

Read the article
HA Proxy won't load balance my web requests. What have I done wrong?

- by Josh Smeaton

I've finally got HA Proxy set up and running in a way I think I want. However, it is not load balancing the web requests it receives. All requests are currently being forwarded to the first server in the cluster. I'm going to paste my configuration below - if anyone can see where I may have gone wrong, I'd appreciate it. This is my first stab at configuring web servers in a *nix environment. First up, I have HA Proxy running on the same host as the first server in the apache cluster. We are moving these servers to virtual later on, and they will have different virtual hosts, but I wanted to get this running now. Both web servers are receiving their health checks, and are reporting back correctly. The haproxy?stats page correctly reports servers that are up and down. I've tested this by altering the name of the file that is checked. I haven't put any load onto these servers yet. I've just opened up the URLs on several tabs (private browsing), and had several co-workers hit the URL too. All of the traffic goes to WEB1. Am I balancing incorrectly? global maxconn 10000 nbproc 8 pidfile /var/run/haproxy.pid log 127.0.0.1 local0 debug daemon defaults log global mode http retries 3 option redispatch maxconn 5000 contimeout 5000 clitimeout 50000 srvtimeout 50000 listen WEBHAEXT :80,:8443 mode http cookie sessionbalance insert indirect nocache balance roundrobin option httpclose option forwardfor except 127.0.0.1 option httpchk HEAD health_check.txt stats enable stats auth rah:rah server WEB1 10.90.2.131:81 cookie WEB_1 check server WEB2 10.90.2.130:80 cookie WEB_2 check

Read the article
(solved) `ssh foo "<command/>"` not loading remote aliases?

- by TomRoche

summary: Why does this fail $ ssh foo 'R --version | head -n 1' bash: R: command not found but this succeeds $ ssh foo 'grep -nHe 'bashrc' ~/.bash_profile' /home/me/.bash_profile:3:# source the users .bashrc if it exists /home/me/.bash_profile:4:if [ -f "${HOME}/.bashrc" ] ; then /home/me/.bash_profile:5: source "${HOME}/.bashrc" $ ssh foo 'grep -nHe "\WR\W" ~/.bashrc' /home/me/.bashrc:118:alias R='/share/linux86_64/bin/R' $ ssh foo '/share/linux86_64/bin/R --version | head -n 1' R version 2.14.1 (2011-12-22) ? details: I am a (rootless) user on 2 clusters. One uses environment modules, so any given server on that cluster can provide (via module add) pretty much the same resources. The other cluster, on which I must also unfortunately work, has servers managed individually, so I get in the habit of doing, e.g., EXEC_NAME='whatever' for S in 'foo' 'bar' 'baz' ; do ssh ${SERVER} "${EXEC_NAME} --version" done This works fine for packages installed normally/consistently, but often (for reasons unknown to me) packages are not: e.g. (compare alias below to alias above), $ ssh bar 'R --version | head -n 1' bash: R: command not found $ ssh bar 'grep -nHe 'bashrc' ~/.bash_profile' /home/me/.bash_profile:3:# source the users .bashrc if it exists /home/me/.bash_profile:4:if [ -f "${HOME}/.bashrc" ] ; then /home/me/.bash_profile:5: source "${HOME}/.bashrc" $ ssh bar 'grep -nHe "\WR\W" ~/.bashrc' /home/me/.bashrc:118:alias R='/share/linux/bin/R' $ ssh bar '/share/linux86_64/bin/R --version | head -n 1' R version 2.14.1 (2011-12-22) Using aliases copes well with these install differences when I interactively shell into the server, but fails when I try to script ssh commands (as above); i.e., # interactively $ ssh foo ... foo> R --version calls my alias for R on remote host=foo, but # scripting $ ssh foo 'R --version' doesn't. What do I need to do to make ssh foo "<command/>" load my aliases on the remote host?

Read the article
0 connected nodes in datastax opscenter

- by gansbrest

Installed opscenterd on the separate node outside of the cluster, but within firewall ( aws security group ). Tested all possible ports between agents and opcenter server. No errors in the log.. 2013-10-30 01:07:23+0000 [FC_Cluster] INFO: Initializing event storage. 2013-10-30 01:07:23+0000 [FC_Cluster] INFO: Attempting to load all persisted alert rules 2013-10-30 01:07:23+0000 [FC_Cluster] INFO: Done loading persisted alert rules 2013-10-30 01:07:23+0000 [FC_Cluster] INFO: Done initializing event storage. 2013-10-30 01:07:23+0000 [FC_Cluster] INFO: Done loading persisted scheduled job descriptions 2013-10-30 01:07:23+0000 [FC_Cluster] INFO: OpsCenter starting up. 2013-10-30 01:07:23+0000 [] INFO: Finished starting new cluster services for FC_Cluster 2013-10-30 01:08:04+0000 [FC_Cluster] INFO: Agent for ip 10.34.10.185 is version u'3.2.2' 2013-10-30 01:08:04+0000 [FC_Cluster] INFO: Agent for ip 10.32.37.251 is version u'3.2.2' 2013-10-30 01:08:04+0000 [FC_Cluster] INFO: Agent for ip 10.82.226.252 is version u'3.2.2' The most interesting part that I can see some data in the opscenter UI, when I stop agents, there is no data displayed, when I start - it show up again, but at the same time it shows 0 connected nodes. Storage capacity is even funnier - 3 of 0 nodes.. Any ideas why that could be happening?

Read the article
Getting started with webserver clustering.

- by Ernie

I work for a small ISP, and we host about 250 domains and all the stuff that goes along with that: DNS, mail, spam filtering, and backups. Currently, we have separate DNS servers (two of them) and mail servers (outgoing mail is actually on the secondary DNS server, but was previously on its own server). In the past, this was done as an insurance measure. The last thing we need is for some doofus (usually yours truly) to hose a server, taking out DNS and mail right along with it, or for spammers to jam our incoming SMTP server, preventing outgoing mail from being sent too. In the past, this was a problem, and our servers were set up the way they are now to combat it. However, clustering solutions like Sun's Cobalt RAQ (in days of olde) and Virtualmin appear to cater to an all-in-one approach, then deal with failures through redundant servers. I have avoided this thus far, but we've been using Virtualmin on our web server for a while now, and I'd like to expand into using it for a high availability cluster. Our networking partner has recently built a datacenter that has eliminated all of our other bugaboos like network, cooling, and power issues, so now the only thing left to go wrong is me hosing a server, which happened earlier this month. One of the bigger reasons we've avoided going this route is because our hardware requirements aren't particularly high. One server easily handles all the sites we host (most of them are flat sites). Also, load-balancing routers tend to be expensive and complicated. All that I'm really expecting to do is building a two-node cluster for redundancy so that when I hose a server (however rare that might be), we're not out for 8-12 hours while I rebuild it. What I need to know is how to get started, and if I'm really in a position to bother with this kind of thing at all.

Read the article
How do you setup FTP with IIS Manager Users in an NLB environment with shared IIS configs?

- by William Jens

I've setup a 2 node NLB cluster and used the following to share IIS configs between them. http://blogs.technet.com/b/meamcs/archive/2012/05/30/configuring-iis-7-5-shared-configuration.aspx The IIS configs and content is located on a network share via a UNC path. This works - updating IIS settings on one node, is visible in another node and my website works on the individual nodes and the cluster as whole. I'm able to setup an FTP site and successfully connect with my Windows login. However, I want to use IIS Manager Authentication as defined in: http://www.iis.net/learn/publish/using-the-ftp-service/configure-ftp-with-iis-manager-authentication-in-iis-7 I've tried using "Network Service" with the FTP COM object as well as a dedicated user account that exists on all three hosts, but every time I try to login with an IIS user I get something like the following: IISWMSVC_AUTHENTICATION_UNABLE_TO_READ_CONFIG An unexpected error occurred while retrieving the authentication information. Exception:System.Runtime.InteropServices.COMException (0x8007052E): Filename: Error: at Microsoft.Web.Administration.Interop.AppHostWritableAdminManager.GetAdminSection(String bstrSectionName, String bstrSectionPath) at Microsoft.Web.Administration.Configuration.GetSectionInternal(ConfigurationSection section, String sectionPath, String locationPath) at Microsoft.Web.Management.Server.ConfigurationAuthenticationProvider.GetSection(ServerManager serverManager) Process:dllhost User=NT AUTHORITY\NETWORK SERVICE Can anyone point me in the right direction here?

Read the article
SQL 2008 - db mail issue

- by Chris

Hello. I have two instances of SQL Server 2008. One was upgraded from SQL Server 2000 and one was a clean, new install. The instances are running on different nodes of the same cluster, although I have tried having them both on the same node with identical results. SQL Mail operates perfectly on both instances. DB Mail operates perfectly on the newly installed instance. On the upgraded instance, DB Mail does not send any mail. Of course, I am not positive that the fact this instance is upgraded has anything to do with the issue, but it might. The configuration of my db mail profile and account looks identical to my functioning instance. In the configuration of the 'alerts' tab in the SQL Agent properties i have tried selecting both DB Mail and SQL Mail to no avail. Both instances use the same SMTP server with the same authentication (domain with db engine account). All messages sent via sp_send_db mail and those sent via the 'test email' option are visible in the sysmail_allitems queue and remain there as 'unsent'. The send_status eventually changes to 'failed'. The only messages in the sysmail_event_log are 'mail queue stopped by login domain\myuser', 'mail queue started by login domain/myuser' and 'activiation successful.'. selecting from the externalmailqueue has the same number of rows as sysmail_allitems. i have tried bouncing the agent, the entire instance and moving the other functioning instance to the other node in the cluster. any thoughts? thx.

Read the article
Appears to be "randomly" switching between the acl matched backend and the default backend

- by Xoor

I have HAProxy acting as a proxy in front of: An NGinx instance An in-house load balancer in front of multiple dynamic services exposed with socket.io (websockets) My problem is that from time to time my connections are proxied correctly to my socket.io cluster, and then randomly it fallsback to routing to NGinx which obviously is annoying and meaningless since NGinx isn't mean't to handle the request. This happens when requesting for URLs of the format : http://mydomain.com/backends/* There's an ACL in the HAProxy config to match the '/backends/*' path. Here's a simplified version of my HAProxy config (removed extra unrelated entries and changed names): global daemon maxconn 4096 user haproxy group haproxy nbproc 4 defaults mode http timeout server 86400000 timeout connect 5000 log global #this frontend interface receives the incoming http requests frontend http-in mode http #process all requests made on port 80 bind *:80 #set a large timeout for websockets timeout client 86400000 # Default Backend default_backend www_backend # Loadfire (socket cluster) acl is_loadfire_backends path_beg /backends use_backend loadfire_backend if is_loadfire_backends # NGinx backend backend www_backend server www_nginx localhost:12346 maxconn 1024 # Loadfire backend backend loadfire_backend option forwardfor # This sets X-Forwarded-For option httpclose server loadfire localhost:7101 maxconn 2048 It's really quite confusing for me why the behaviour appears to be "random", since being hard to reproduce it's hard to debug. I appreciate any insight on this.

Read the article
Distributed File Systems.

- by GruffTech

So, I've been reading several articles around ServerFault as well as google. (For Example, this link) My Requirements are very similar to the link above, however i'd like to also have dynamic or at least resizeable file volumes, so if necessary i can add 4-5 servers to the pool, and then expand the volume. Any Distributed File systems that support that, to save me some time? Thanks! LustreFS will be my next test cluster to build. GlusterFS I've build a 3-machine test GlusterFS cluster, However i quickly became aware of several of its limitations that it doesn't seem to make clearly public. One, i can't seem to resize a volume. Once a volume is created, its done. Which seems retarded, why have a fully scalable file system if i can't scale a volume? So maybe i'm doing something wrong. I'm not sure. AmazonS3 while gives the cheapest startup adds too much cost when broken down to per client per month, so its out. Building my own system when prorated over several years with no bandwidth costs makes it significantly cheaper. MogileFS isn't an option as we'd like this server to be a SAN-Replacement, for storing tons of media from a multitude of systems, which for us means it needs to be POSIX compliant so it can be remotely mounted via NFS or CIFS.

Read the article
Replicated MongoDB server slower than simple shards

- by displayName

I tried to compare the performance of a sharded configuration against a sharded and replicated configuration. The sharded configuration consists of 8 shards each running on three different machines thereby constituting a total of 24 shards. All 8 of these shards run in the same partition on each machine. The sharded and replicated version is 8 shards again just like plain sharding, and all 8 mongods run on the same partition in each machine. But apart from this, each of these three machine now run additional 16 threads on another partition which serve as the secondary for the 8 mongods running on other machines. This is the way I prepared a sharded and replicated configuration with data chunks having replication factor of 3. Important point to note is that once the data has been loaded, it is not modified. So after primary and secondaries have synchronized then it doesn't matter which one i read from. To run the queries, I use an entirely different machine (let's call it config) which runs mongos and this machine's only purpose is to receive queries and run them on the cluster. Contrary to my expectations, plain sharding of 8 threads on each machine (total = 3 * 8 = 24) is performing better for queries than the sharded + replicated configuration. I have a script written to perform the query. So in order to time the scripts, I use time ./testScript and see the result. I tried changing the reading preference for replicated cluster by logging to mongo of config and run db.getMongo().setReadPref('secondary') and then exit the shell and run the queries like time ./testScript. The questions are: Where am i going wrong in the replication? Why is it slower than its plain sharding version? Does the db.getMongo().ReadPref('secondary') persist when i leave the shell and try to perform the query? All the four machines are running Linux and i have already increased the ulimit -n to 2048 from initial value of 1024 to allow more connections. The collections are properly distributed and all the mongods have equal number of chunks. Goes without saying that indices in both configurations are the same.

Read the article
Experiences in Upgrading from Exchange 2003 to Exchange 2010

- by gWaldo

I'm currently running Exchange 2003 SP2 Cluster on a Server 2003 AD Forest (in native 2003 mode), and we beginning to plan the upgrade to Server 2008 AD and Exchange 2010. We have two main sites, one middle-sized office, and a couple of smaller sites which have DCs (which may be RODCs after the upgrade). Currently all of our Exchange cluster is in my main site, but we are considering using the new datastore paradigm for load-balance/failover at the other large site, but this is not set in stone. Right now we are in the information-gathering and planning phases. I am looking for input of any gotchas experienced while performing either upgrade, but especially the Exchange upgrade. Gotchas? What surprised you? What wasn't documented? What said one thing but was misleading? (Confusing either in content or severity.) What is great or horrible about the new system? What worked well? What worked poorly? If you were to do it over again...? (I know that this isn't so much a question that can be definitively answered, but I'm happy to reward insight and useful resources (not the Microsoft documentation, but Blogposts are welcome) with upvotes.) UPDATE A couple items of note: -We are not currently using OWA (currently only the admins), but it may become more of a consideration with iOS devices. -We do have a small number of Blackberries in the environment (< 10%). -In addition to the standard Exchange connectors, we have a third-party connector for Captaris RightFax integration.

Read the article
How many reverse proxies (nginx, haproxy) is too many?

- by Alysum

I'm setting up a HA (high availability) cluster using nginx, haproxy & apache. I've been reading great things about nginx and haproxy. People tend to choose one or the other but I like both. Haproxy is more flexible for load balancing than nginx's simple round robin (even with the upstream-fair patch). But I'd like to keep nginx for redirecting non-https to https among other things right at the point of entry to the cluster. On the other hand, nginx is a lot faster for serving static contents and would reduce the load on the powerful apache which loves to eat a lot of RAM! Here is my planned setup: Load balancer: nginx listens on port 80/443 and proxy_forwards to haproxy on 8080 on the same server to load balance between the multiple nodes. Nodes: nginx on the node listens to requests coming from haproxy on 8080, if the content is static, serve it. But if it's a backend script (in my case PHP), proxy forward to apache2 on the same node server listenning on a different port number. Technically this setup works but my concerns are whether having the requests going through several proxies is going to slow down requests? Most of the requests will be PHP requests as the backends are services (which means groing from nginx - haproxy - nginx - apache). Thoughts? Cheers

Read the article
Tool to Save a Range of Disk Clusters to a File

- by Synetech inc.

Hi, Yesterday I deleted a (fragmented) archive file only to find that it did not extract correctly, so I was left stranded. Fortunately there was not much space free on the drive, so most of the space marked as free was from the now-deleted archive. I pulled up a disk editor and—painfully—managed to get a list of cluster ranges from the FAT that were marked as unused. My task then was to save these ranges of clusters to files so that I could examine them to try to determine which parts were from the archive and recombine them to attempt to restore the deleted file. This turned out to be a huge pain in the butt because the disk editor did not have the ability to select a range of clusters, so I had to navigate to the start of each cluster and hold down Ctrl+Shift+PgDn until I reached the end of the range (which usually took forever!) I did a quick Google search to see if I could find a command-line tool (preferably with Windows and DOS versions) that would allow me to issue a commands such as: SAVESECT -c 0xBEEF 0xCAFE FOO.BAR ::save clusters 0xBEEF-0xCAFE to FOO.BAR SAVESECT -s 1111 9876 BAZ.BIN ::save sectors 1111-9876 to BAZ.BIN Sadly my search came up empty. Any ideas? Thanks!

Read the article
10GE network: Is it still deadly expensive? Any options?

- by BarsMonster

Hi! I am building home cluster where I going to have about 16 nodes which can live with 1G ports, but I really want to have 10GE on file server & central node. It's all local, so no need for cabels longer than 3-5m. And ofcourse I want to spend as little money as possible (not going to spend more than whole cluster costs) :-) What are my options? 1) Legacy solution is to take some 24-48 port 1GE switch, and connect to file/central nodes via 4-8 aggregated links. This will work I guess, cost is very acceptable, but I am not sure if it's ok to use that much aggregated links. And ofcourse it would be hard to double bandwidth when needed... :-D 2) Switch with several 10GE uplink 'ports'. As far as I see, they all require modules which costs about 1000$, so I will need 4 10G modules, and 2 10GE cards... Smells like way more than 5000$+... 3) Connect file & central node via 2 10G cards directly, and put 4 quadport 1GE NICs on fileserver. I am saving on 2 10G modules and a switch, fileserver will have to do packet routing, but it's still gonna have alot of CPU's left :-) 4) Any other options? Infiniband? 5) Are MyriNet adaptors works fine? I guess there are no cheaper options? 6) Hmm... Scrap fileserver, put it all on central node and provide dedicated 1GE port for each of the nodes... This is sad...

Read the article
Installing and running two postgresql versions on different ports (or two instances of same server)

- by Andrius

I have postgresql 9.1 installed on my machine (Ubuntu). I need another postgresql server that would run next to the old one. Exact version does not matter, but I'm thinking of using 9.2 version. How could I properly install and run another postgresql version without screwing old one (like upgrading). So those versions would run independently on different ports. Old one on 5432 and new one on 5433 for example. The reason I need this is for two OpenERP versions databases. If I run two OpenERP servers (with different versions) on single postgresql port, it crashes because new OpenERP version detects old versions database and tries to run it, but it crashes because it uses another schemes. P.S or maybe I could just run same postgresql server on two ports? Update So far I tried this: /usr/lib/postgresql/9.1/bin/pg_ctl initdb -D main2 It created new cluster. I changed port to 5433 in new clusters directory postgresql.conf file. Then ran this: /usr/lib/postgresql/9.1/bin/pg_ctl -D main2 -l logfile start I got response server starting. But when I tried to enter new cluster's template database with: psql template1 -p 5433 I got this error: psql: could not connect to server: No such file or directory Is the server running locally and accepting connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5433"? Also now when I try to stop server with: /usr/lib/postgresql/9.1/bin/pg_ctl -D main2 -l logfile start I get this error: pg_ctl: PID file "main2/postmaster.pid" does not exist Is server running? So I don't understand if server is running and what I'm missing here? Update Found what was wrong. Stupid me. I didn't notice that when I changed port in .conf file, that line was commented already. So actually I didn't change anything first time, but thought I did and it used default 5432 port.

Read the article
Heartbeat/DRBD failover didn't work as expected. How do I make the failover more robust?

- by Quinn Murphy

I had a scenario where a DRBD-heartbeat set up had a failed node but did not failover. What happened was the primary node had locked up, but didn't go down directly (it was inaccessible via ssh or with the nfs mount, but it could be pinged). The desired behavior would have been to detect this and failover to the secondary node, but it appears that since the primary didn't go full down (there is a dedicated network connection from server to server), heartbeat's detection mechanism didn't pick up on that and therefore didn't failover. Has anyone seen this? Is there something that I need to configure to have more robust cluster failover? DRBD seems to otherwise work fine (had to resync when I rebooted the old primary), but without good failover, it's use is limited. heartbeat 3.0.4 drbd84 RHEL 6.1 We are not using Pacemaker nfs03 is the primary server in this setup, and nfs01 is the secondary. ha.cf # Hearbeat Logging logfacility daemon udpport 694 ucast eth0 192.168.10.47 ucast eth0 192.168.10.42 # Cluster members node nfs01.openair.com node nfs03.openair.com # Hearbeat communication timing. # Sets the triggers and pulse time for swapping over. keepalive 1 warntime 10 deadtime 30 initdead 120 #fail back automatically auto_failback on and here is the haresources file: nfs03.openair.com IPaddr::192.168.10.50/255.255.255.0/eth0 drbddisk::data Filesystem::/dev/drbd0::/data::ext4 nfs nfslock

Read the article
Should an HA failover occur in this scenario?

- by joeqwerty

I'm running vSphere 5 in an HA cluster across two hosts (vsphereA and vsphereB). I have the HA cluster configured for host monitoring and datastore heartbeat monitoring with admission control disabled (hopefully I rightfully understand that datastore heartbeat monitoring prevents inadvertent and unwanted HA failovers due to management network isolation). Each host has a single connection to a dedicated iSCSI network and iSCSI target (no MPIO). All vmdk's for all VM's exist on the iSCSI datastore. As a test of HA I disconnected the iSCSI connection on vsphereB and was surprised to see that the running VM's on vsphereB continued to run on vsphereB. The powered off VM's were showing as inaccessible (which I expected due to the fact that they weren't running and the connection from vsphereB to the iSCSI target was severed) but the running VM's continued to run and continued to be "owned" by vsphereB. I expected to see an HA failover occur for those VM's and expected to see them "owned" by vsphereA after the HA failover (which didn't occur). I'm at a loss to understand why an HA failover didn't occur for those VM's. Am I misunderstanding in which cases an HA failover should occur?

Read the article
netlogon errors

- by rorr

I have two instances of mssql 2005 and am using CA XOSoft replication. The master is a failover cluster and the replica is a standalone server. They are all running Server 2003 sp2 x64. Same patch levels on all servers. This setup has worked great for several months until we recently restricted the RPC ports on both nodes of the master(5000 - 6000 using rpccfg.exe). We have to implement egress filtering, thus the limiting of the ports. We began receiving login errors for sql windows authentication and NETLOGON Event ID: 5719: This computer was not able to set up a secure session with a domain controller in domain due to the following: Not enough storage is available to process this command. This may lead to authentication problems. Make sure that this computer is connected to the network. If the problem persists, please contact your domain administrator. We also see group policies failing to update and cluster file shares go offline at the same time. The RPC ports were set back to default when we started seeing these problems and the servers rebooted, but the problems persist. The domain controllers are not showing any errors. Running dcdiag and netdiag shows everything is fine. We have noticed that the XOSoft service ws_rep.exe is using a lot of handles(8 - 9k), about the same number that sqlserver is using. As soon as xosoft replication is stopped the login errors cease and everything functions correctly. I have opened a ticket with CA for XOSoft, but I'm not sure that the problem is actually xosoft, but that it is the one bringing the problem to light. I'm looking for tips on debugging RPC problems. Specifically on limiting the ports and then reverting the changes.

Read the article
CloudFront with Custom Origin and ELB

- by kmfk

We are using CloudFront for our static assets but also wanted to allow for Gzip. We set up a new distribution with a custom origin pointing back to our application servers which are behind a elastic load balancer. We manually keep the files in sync across the cluster and update them when we publish. However, with this set up, we get nothing but Miss and RefreshHits from CloudFront, which so far has defeated the purpose. Is there any additional settings in order to use an ELB as your custom origin? In the docs, it references this as a viable solution. It appears when we point the distribution to a single server in our production cluster, cloudfront properly caches our assets. Is it possible that the sticky sessions cookie and the subsequent header that gets added by it could be an issue? Cache-Control: no-cache="set-cookie" //Added by load balancer Any ideas? FYI - currently, we have our custom origin pointing to a single EC2 instance, so caching is working correctly - in case you try to curl the file below. Example headers: curl -I http://static.quick-cdn.com/css/9850999.css HTTP/1.0 200 OK Accept-Ranges: bytes Cache-Control: max-age=3700 Cache-Control: no-cache="set-cookie" Content-Length: 23038 Content-Type: text/css Date: Thu, 12 Apr 2012 23:03:52 GMT Last-Modified: Thu, 12 Apr 2012 23:00:14 GMT Server: Apache/2.2.17 (Ubuntu) Vary: Accept-Encoding X-Cache: RefreshHit from cloudfront X-Amz-Cf-Id: K_q7Zy3_jdzlEJ85ukELVtdx1GmuXqApAbZZ7G0fPt0mxRMqPKX5pQ==,RzJmPku-rEIO9WlvuSoKa8hiAaR3dLk5KC4cQMWWrf_MDhmjWe8n6A== Via: 1.0 28c34f9fbf559a21ee16594849e4fc9c.cloudfront.net (CloudFront) Connection: close

Read the article
Creating a Jenkins build farm in a hands-off manner?

- by user183394

My colleague and I have set up and run Jenkins on a KVM guest running Ubuntu 12.04 with good results for a while now. We are thinking about deploying a cluster of Jenkins CI hosts in master/slave configuration, with the libvirt slave plugin to keep our hardware count low. Our environment is strictly Linux (CentOS, Scientific Linux, Fedora, and Ubuntu). Both of us are competent in setting up large clusters. We typically use tools like cobbler + a configuration management tool (Puppet, Chef, and alike) to set up a large number of machines (physical and/or virtual) hands off (hundreds of nodes in less than an hour typical). We would like to do the same for nodes running Jenkins. But the step by step guide doesn't give us any clues in this regard. I did see a Multi-slave config plugin. But, being used to dealing with hundreds or more machines completely hands-off, clicking the UI for many machines just doesn't feel right. Can someone point to us a reference that talks about how to set up large cluster of Jenkins CI hosts more in the hands-off way?

Read the article
which is best smart automatic file replication solution for cloud storage based systems.

- by TORr0t

I am looking for a solution for a project i am working on. We are developing a websystem where people can upload their files and other people can download it. (similar to rapidshare.com model) Problem is, some files can be demanded much more than other files. The scenerio is like: I have uploaded my birthday video and shared it with all of my friend, I have uploaded it to myproject.com and it was stored in one of the cluster which has 100mbit connection. Problem is, once all of my friends want to download the file, they cant download it since the bottleneck here is 100mbit which is 15MB per second, but i got 1000 friends and they can only download 15KB per second. I am not taking into account that the hdd is serving same files. My network infrastrucre is as follows: 1 gbit server(client) and connected to 4 Nodes of storage servers that have 100mbit connection. 1gbit server can handle the 1000 users traffic if one of storage node can stream more than 15MB per second to my 1gbit (client) server and visitor will stream directly from client server instead of storage nodes. I can do it by replicating the file into 2 nodes. But i dont want to replicate all files uploadded to my network since it is costing much more. So i need a cloud based system, which will push the files into replicated nodes automatically when demanded to those files are high, and when the demand is low, they will delete from other nodes and it will stay in only 1 node. I have looked to gluster and asked in their irc channel that, gluster cant do such a thing. It is only able to replicate all the files or none of the files. But i need it the cluster software to do it automatically. Any solutions ? (instead of recommending me amazon s3) S

Read the article
glusterfs mounts get unmounted when 1 of the 2 bricks goes offline

- by Shiquemano

I have an odd case where 1 of the 2 replicated glusterfs bricks will go offline and take all of the client mounts down with it. As I understand it, this should not be happening. It should fail over to the brick that is still online, but this hasn't been the case. I suspect that this is due to configuration issue. Here is a description of the system: 2 gluster servers on dedicated hardware (gfs0, gfs1) 8 client servers on vms (client1, client2, client3, ... , client8) Half of the client servers are mounted with gfs0 as the primary, and the other half are pointed at gfs1. Each of the clients are mounted with the following entry in /etc/fstab: /etc/glusterfs/datavol.vol /data glusterfs defaults 0 0 Here is the content of /etc/glusterfs/datavol.vol: volume datavol-client-0 type protocol/client option transport-type tcp option remote-subvolume /data/datavol option remote-host gfs0 end-volume volume datavol-client-1 type protocol/client option transport-type tcp option remote-subvolume /data/datavol option remote-host gfs1 end-volume volume datavol-replicate-0 type cluster/replicate subvolumes datavol-client-0 datavol-client-1 end-volume volume datavol-dht type cluster/distribute subvolumes datavol-replicate-0 end-volume volume datavol-write-behind type performance/write-behind subvolumes datavol-dht end-volume volume datavol-read-ahead type performance/read-ahead subvolumes datavol-write-behind end-volume volume datavol-io-cache type performance/io-cache subvolumes datavol-read-ahead end-volume volume datavol-quick-read type performance/quick-read subvolumes datavol-io-cache end-volume volume datavol-md-cache type performance/md-cache subvolumes datavol-quick-read end-volume volume datavol type debug/io-stats option count-fop-hits on option latency-measurement on subvolumes datavol-md-cache end-volume The config above is the latest attempt at making this behave properly. I have also tried the following entry in /etc/fstab: gfs0:/datavol /data glusterfs defaults,backupvolfile-server=gfs1 0 0 This was the entry for half of the clients, while the other half had: gfs1:/datavol /data glusterfs defaults,backupvolfile-server=gfs0 0 0 The results were exactly the same as the above configuration. Both configs connect everything just fine, they just don't fail over. Any help would be appreciated.

Read the article
Diagnosing Random Network Lag

- by uesp

I'm having trouble diagnosing some random lag on a 6 server LAMP cluster serving a MediaWiki site. While we're serving some 100 pages/sec the servers themselves are running fine with less than 0.5 load, no locked processes, no paging, no errors being logged, etc.... Lag is present on all servers and is random: one minute its fine the next it's there. DNS lookups on the servers are randomly slow. For example time nslookup google.com varies randomly from a few milliseconds to several seconds and sometimes times out entirely. While we use IP addresses internally on the cluster this may be a symptom of the root issue. We are not running our own DNS server. The Apache server-status pages randomly lag or time out. Benchmarking using ab between servers shows a few loads sometimes take 3000 ms (almost exactly). Benchmarking server-status on the local server itself usually shows no issue (it showed a lag only once among a few hundred tests). The servers are sitting behind a switch and a firewall which I don't have any access to so I don't know their setup or status. While we are under heavier than normal load a 2 Mbps incoming and 20 Mbps outgoing traffic shouldn't be stressing the switch or firewall should it? My feeling is that it is the switch/firewall or something above them in the ISP like their DNS but can't confirm it. I need some other tests or methods of diagnosing this lag to try and narrow down the ultimate cause.

Read the article

Search Results

Search found 1781 results on 72 pages for 'cluster'.

Page 36/72 | < Previous Page | 32 33 34 35 36 37 38 39 40 41 42 43 | Next Page >

- by rorr

- by Saif

- by Piotr K.

- by Josh Smeaton

- by TomRoche

- by gansbrest

- by Ernie

- by William Jens

- by Chris

- by Xoor

- by GruffTech

- by displayName

- by gWaldo

- by Alysum

- by Synetech inc.

- by BarsMonster

- by Andrius

- by Quinn Murphy

- by joeqwerty

- by rorr

- by kmfk

- by user183394

- by TORr0t

- by Shiquemano

- by uesp

< Previous Page | 32 33 34 35 36 37 38 39 40 41 42 43 | Next Page >