Search Results

Search found 4291 results on 172 pages for 'cluster analysis'.

Page 34/172 | < Previous Page | 30 31 32 33 34 35 36 37 38 39 40 41  | Next Page >

  • Using WKA in Large Coherence Clusters (Disabling Multicast)

    - by jpurdy
    Disabling hardware multicast (by configuring well-known addresses aka WKA) will place significant stress on the network. For messages that must be sent to multiple servers, rather than having a server send a single packet to the switch and having the switch broadcast that packet to the rest of the cluster, the server must send a packet to each of the other servers. While hardware varies significantly, consider that a server with a single gigabit connection can send at most ~70,000 packets per second. To continue with some concrete numbers, in a cluster with 500 members, that means that each server can send at most 140 cluster-wide messages per second. And if there are 10 cluster members on each physical machine, that number shrinks to 14 cluster-wide messages per second (or with only mild hyperbole, roughly zero). It is also important to keep in mind that network I/O is not only expensive in terms of the network itself, but also the consumption of CPU required to send (or receive) a message (due to things like copying the packet bytes, processing a interrupt, etc). Fortunately, Coherence is designed to rely primarily on point-to-point messages, but there are some features that are inherently one-to-many: Announcing the arrival or departure of a member Updating partition assignment maps across the cluster Creating or destroying a NamedCache Invalidating a cache entry from a large number of client-side near caches Distributing a filter-based request across the full set of cache servers (e.g. queries, aggregators and entry processors) Invoking clear() on a NamedCache The first few of these are operations that are primarily routed through a single senior member, and also occur infrequently, so they usually are not a primary consideration. There are cases, however, where the load from introducing new members can be substantial (to the point of destabilizing the cluster). Consider the case where cluster in the first paragraph grows from 500 members to 1000 members (holding the number of physical machines constant). During this period, there will be 500 new member introductions, each of which may consist of several cluster-wide operations (for the cluster membership itself as well as the partitioned cache services, replicated cache services, invocation services, management services, etc). Note that all of these introductions will route through that one senior member, which is sharing its network bandwidth with several other members (which will be communicating to a lesser degree with other members throughout this process). While each service may have a distinct senior member, there's a good chance during initial startup that a single member will be the senior for all services (if those services start on the senior before the second member joins the cluster). It's obvious that this could cause CPU and/or network starvation. In the current release of Coherence (3.7.1.3 as of this writing), the pure unicast code path also has less sophisticated flow-control for cluster-wide messages (compared to the multicast-enabled code path), which may also result in significant heap consumption on the senior member's JVM (from the message backlog). This is almost never a problem in practice, but with sufficient CPU or network starvation, it could become critical. For the non-operational concerns (near caches, queries, etc), the application itself will determine how much load is placed on the cluster. Applications intended for deployment in a pure unicast environment should be careful to avoid excessive dependence on these features. Even in an environment with multicast support, these operations may scale poorly since even with a constant request rate, the underlying workload will increase at roughly the same rate as the underlying resources are added. Unless there is an infrastructural requirement to the contrary, multicast should be enabled. If it can't be enabled, care should be taken to ensure the added overhead doesn't lead to performance or stability issues. This is particularly crucial in large clusters.

    Read the article

  • Which process is using my NAS?

    - by sethu
    I have a nas connected to my cluster. The NAS holds all our home directories. When I did a set of experiments last week, saving a 1 GB file to the nas took around 30 seconds. If i do the same to a local disk it takes 18 seconds. But when I tried doing the same process today, it takes 150 seconds. I am unsure what is the problem . Can someone help me pointout the issue? Is it possible to find out which process is accessing the NAS or how much NAS bandwidth is getting used ? Thanks for your help. -Sethu

    Read the article

  • coordinating a script to run on only one of identical load-balanced servers

    - by Amos Shapira
    I have two identically configured CentOS 5 servers (possibly more in the future). I need to run a cron job on any one of them and that it'll run only on one of them. I know about RedHat Cluster Suite (we use it on other servers), but it's a too big a gun to use for this task, plus it doesn't really behave well for less than three nodes. Is there anything light-weight I can use for that? The servers can communicate with each other directly. I suppose I can develope something over ssh or nrpe (two server which are already installed on these servers), but I was wondering whether there is something already available.

    Read the article

  • diskpart on RDM's ...

    - by karnash
    HI, We have ESXi cluster which is attached to clariion CX4 We have windows 2008 R2 as the guest OS. Attahed to this vm is 2 x 1.95T RDM's I select disk 1 create partition primary size=1 (1MB) then list partition Partition ### Type Size Offset * Partition 1 Primary 1024 KB 1024 KB Then I do the same for the other disk and offset is 1024KB I need to present 4T disk to this vm so I right click on disk 1 convert to simple volume then extend it by adding the second disk now when I do list partition, I see the off set is set to 31k. Can anyone please guide me. Thanks

    Read the article

  • Linux HA - Best Heartbeat hardware solution

    - by Martino Dino
    Hi all I would ask anyone what is the best layer 2 medium for heartbeat in Linux and how it's best configured. More precisely I've been thinking about a dedicated NIC for that purpose but then i thought that if a switch breaks then i would loose the heartbeat connection for most of the cluster and STONITH 'BUM'!!! Will probably loose my job after :) Distributing the heartbeat onto the main NICs of every node trough a vif sounds reasonable but im not sure if this is the best option (at least the switches are redundant to some extent). Is it possible to use heartbeat over a bonded interface and that sounds reasonable? Do you have any other tip/solution for that issue?

    Read the article

  • Upgrading drives on a MD3000

    - by Anonymouse
    Hello, Our MD3000 array is getting full as our databases are growing and we need more spaces. Currently, we use a MD3000 with a two-servers Windows 2003 cluster and 15x 73GB SAS drives. Disk groups are configured in RAID1 of two drives. The approach we are currently investigating is simply swapping the existing SAS drives with bigger ones (300GB instead of 73GB), one at a time, and let each RAID1 array rebuilt. Is it a good approach? Will we be able to resize the array afterwards? Will we be able to resize the partitions afterwards? Can the Dell M3000 Management software do it or will we have to bring the server offline and use some partition software to do it? Thanks in advance.

    Read the article

  • How to make a DHCP server on virtual machine serves other virtual machines(on different physical machines)?

    - by Tony
    I'm building a virtual cluster with VirtualBox and Opensuse. I have 10 physical machines and need several vms on each. The virtual machines are supposed to be in a "private" network, but still have internet access. I was asked to set up a virtual head node working as DHCP server. I installed DHCP server on the virtual head node and it seems works. On VirtualBox I set 2 network adapters to the head node, one bridged adapter and one internal network. One vm on the same physical machine has been set nic as internal network adapter. The vm can get IP address (so DHCP works) but can't access internet. What should I do? Specifically, what network adapter should I choose for head-node and work-nodes in VirtualBox? What in the virtual machines should I do?

    Read the article

  • Decent 1gb switch (16-24 port) for rack...

    - by TomTom
    Hallo, for a rack containing a smaller nubmer of servers (5 at the moment, going to stay in this area), I look to replace the currently aging 100mbit switch with a 1gb switch. This is for the backend between the servers. I expect some ISCIS traffic there ,so a 10gbit option would be nice (preferably for two ports, as extension modules). I dont need management, this is a pure backend of an internal cluster. I do VLAN, but there is no sensible management the switch can do there. I wuold like: * 1he only, obviously * preferable limited moving parts. * Low price ;) * Enough power to run at least half the ports in full speed at the same time. Anyone any recommendations?

    Read the article

  • How to broadcast a command on Windows

    - by Xiao Jia
    I am going to frequently deploy different versions of a program on a cluster of Windows machines (mostly Windows XP), so I am willing to use a command-line broadcasting tool (either built-in or 3rd-party) to (1) download a file from some URL, and (2) execute the same command, on all the machines. I googled for a very long time but got nothing related to my goal. (Only pages about broadcasting a message, broadcasting ping, or programmatically broadcast via TCP/IP, etc.) Are there any tool for this purpose? Or is it possible to do it pragmatically (without installing extra client programs on those machines)?

    Read the article

  • Configuring MPI on 2 nodes

    - by Wysek
    I'm trying to create really simple "cluster" from 2 multicore computers using openmpi. My problem is that I can't find any tutorials on that matter. I don't want to use torque because it's not necessary in my case nevertheless all tutorials give configuration details either about torque or mpd (which doesn't exist in openmpi implementation). Could you give me some tips or links to appropriate manuals? Steps I've already completed: - openmpi installation - network configuration (computers see each other) - ssh password-less login to second computer I tried using machinefiles without further configuration and with just 2 IPs in it. But jobs don't seem to start at all after initialization part. (MPI seems to work because I'm able to scatter jobs on multiple cores of both computers without communication between them).

    Read the article

  • HDFS some datanodes of cluster are suddenly disconnected while reducers are running

    - by user1429825
    I have 8 slave computers and 1 master computer for running Hadoop (ver 0.21) some datanodes of cluster are suddenly disconnected while I was running MapReduce code on 10GB data After all mappers finished and around 80% of reducers was processed, randomly one or more datanode disconned from network. and then the other datanodes start to disappear from network even if I killed the MapReduce job when I found some datanode was disconnected. I've tried to change dfs.datanode.max.xcievers to 4096, turned off fire-walls of all computing node, disabled selinux and increased the number of file open limit to 20000 but they didn't work at all... anyone have a idea to solve this problem? followings are error log from mapreduce 12/06/01 12:31:29 INFO mapreduce.Job: Task Id : attempt_201206011227_0001_r_000006_0, Status : FAILED java.io.IOException: Bad connect ack with firstBadLink as ***.***.***.148:20010 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:889) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:820) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:427) and followings are logs from datanode 2012-06-01 13:01:01,118 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-5549263231281364844_3453 src: /*.*.*.147:56205 dest: /*.*.*.142:20010 2012-06-01 13:01:01,136 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(*.*.*.142:20010, storageID=DS-1534489105-*.*.*.142-20010-1337757934836, infoPort=20075, ipcPort=20020) Starting thread to transfer block blk_-3849519151985279385_5906 to *.*.*.147:20010 2012-06-01 13:01:19,135 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(*.*.*.142:20010, storageID=DS-1534489105-*.*.*.142-20010-1337757934836, infoPort=20075, ipcPort=20020):Failed to transfer blk_-5797481564121417802_3453 to *.*.*.146:20010 got java.net.ConnectException: > Connection timed out at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373) at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1257) at java.lang.Thread.run(Thread.java:722) 2012-06-01 13:06:20,342 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_6674438989226364081_3453 2012-06-01 13:09:01,781 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(*.*.*.142:20010, storageID=DS-1534489105-*.*.*.142-20010-1337757934836, infoPort=20075, ipcPort=20020):Failed to transfer blk_-3849519151985279385_5906 to *.*.*.147:20010 got java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/*.*.*.142:60057 remote=/*.*.*.147:20010] at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246) at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:164) at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:203) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:388) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:476) at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1284) at java.lang.Thread.run(Thread.java:722) hdfs-site.xml <configuration> <property> <name>dfs.name.dir</name> <value>/home/hadoop/data/name</value> </property> <property> <name>dfs.data.dir</name> <value>/home/hadoop/data/hdfs1,/home/hadoop/data/hdfs2,/home/hadoop/data/hdfs3,/home/hadoop/data/hdfs4,/home/hadoop/data/hdfs5</value> </property> <property> <name>dfs.replication</name> <value>3</value> </property> <property> <name>dfs.datanode.max.xcievers</name> <value>4096</value> </property> <property> <name>dfs.http.address</name> <value>0.0.0.0:20070</value> <description>50070 The address and the base port where the dfs namenode web ui will listen on. If the port is 0 then the server will start on a free port. </description> </property> <property> <name>dfs.datanode.http.address</name> <value>0.0.0.0:20075</value> <description>50075 The datanode http server address and port. If the port is 0 then the server will start on a free port. </description> </property> <property> <name>dfs.secondary.http.address</name> <value>0.0.0.0:20090</value> <description>50090 The secondary namenode http server address and port. If the port is 0 then the server will start on a free port. </description> </property> <property> <name>dfs.datanode.address</name> <value>0.0.0.0:20010</value> <description>50010 The address where the datanode server will listen to. If the port is 0 then the server will start on a free port. </description> <property> <name>dfs.datanode.ipc.address</name> <value>0.0.0.0:20020</value> <description>50020 The datanode ipc server address and port. If the port is 0 then the server will start on a free port. </description> </property> <property> <name>dfs.datanode.https.address</name> <value>0.0.0.0:20475</value> </property> <property> <name>dfs.https.address</name> <value>0.0.0.0:20470</value> </property> </configuration> mapred-site.xml <configuration> <property> <name>mapred.job.tracker</name> <value>masternode:29001</value> </property> <property> <name>mapred.system.dir</name> <value>/home/hadoop/data/mapreduce/system</value> </property> <property> <name>mapred.local.dir</name> <value>/home/hadoop/data/mapreduce/local</value> </property> <property> <name>mapred.map.tasks</name> <value>32</value> <description> default number of map tasks per job.</description> </property> <property> <name>mapred.tasktracker.map.tasks.maximum</name> <value>4</value> </property> <property> <name>mapred.reduce.tasks</name> <value>8</value> <description> default number of reduce tasks per job.</description> </property> <property> <name>mapred.map.child.java.opts</name> <value>-Xmx2048M</value> </property> <property> <name>io.sort.mb</name> <value>500</value> </property> <property> <name>mapred.task.timeout</name> <value>1800000</value> <!-- 30 minutes --> </property> <property> <name>mapred.job.tracker.http.address</name> <value>0.0.0.0:20030</value> <description> 50030 The job tracker http server address and port the server will listen on. If the port is 0 then the server will start on a free port. </description> </property> <property> <name>mapred.task.tracker.http.address</name> <value>0.0.0.0:20060</value> <description> 50060 </property> </configuration>

    Read the article

  • Sending USR2 to mongrel_rails sometimes results in an “Address already in use” on the restart

    - by Ben
    We have a rolling-restart mode for our mongrel cluster that sends a USR2 signal to each running process. This works great, most of the time. But very occasionally the mongrel process will shutdown, and then fail to restart, with the following error: /usr/local/lib/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/tcphack.rb:12:in `initialize_without_backlog': Address already in use - bind(2) (Errno::EADDRINUSE) from /usr/local/lib/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/tcphack.rb:12:in `initialize' from /usr/local/lib/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:93:in `new' from /usr/local/lib/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:93:in `initialize' from /usr/local/lib/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/configurator.rb:139:in `new' from /usr/local/lib/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/configurator.rb:139:in `listener' Looking though the mongrel source, the USR2 handler calls a synchronous stop on the running server, so it ought to block until the socket has been released. Has anyone seen this error? Does anyone have any ideas what might cause it? (I asked this question over on StackOverflow initially, but thought it might be more appropriate here)

    Read the article

  • PBS batch jobs - the qalter command

    - by Ryan Budney
    I've got a giant computation running on a Scientific Linux cluster. At present I have over 600 jobs parked in the queue, waiting for processor time, while a few are running. I'm trying to use the qalter command on some of the idle but scheduled jobs. I'd like to schedule them for a later time, so that other users can jump part of the queue, sort of as an act of politeness. Is this doable? For example, JOBNAME 292399 is currently idle, scheduled to be run whenever a spot in the queue opens up. But if I run qalter -a 10051000 292398 followed by qrerun 292398 I get qrerun: Request invalid for state of job 292398.euler. From the qalter documentation, I thought 10051000 refers to tomorrow (oct 5th, 10am) but perhaps I'm misunderstanding something? If I'm going about this the wrong way, please let me know. The main thing I'm looking for is a command that's easily scriptable, so that I can modify when my queued tasks get run. qalter seems good for those purposes if I can get it working. I'd rather avoid running qdel and re qsubbing the computations, as there's a bookkeeping issue on which tasks to restart (vs which ones not to). I want to avoid that kind of bookkeeping. From googling around I notice some qalter commands have rather different date formats, but the above appears to be correct, as far as I can tell from the man docs. Any help would be appreciated.

    Read the article

  • Pitfalls to using Gluster as a home/profile directory server?

    - by Bart Silverstrim
    I was asking recently about options for divvying up access to file servers, as we have a NAS solution that gets fairly bogged down when our users (with giant profiles, especially) all log in nearly simultaneously. I ran across Gluster and it looks like it can cluster different physical storage media into a single virtual volume and share it out like a virtual NAS from the client perspective and it support CIFS. My question is whether something like this would be feasible to use for home and profile directories in an active directory environment. I was worried about ACL's, primarily, as I didn't think CIFS was fine-grained enough to support NTFS permissions and it didn't look like Gluster exports those permission levels, just the base permissions for basic file sharing. I got the impression that using Gluster would allow for data to be redundant across multiple servers and would speed up access to the files under heavy load, while allowing us to dynamically boost storage capacity by just adding another server and telling Gluster's master node to add that server. Maybe I'm wrong with my understanding of it though. Anyone else use it or care to share how feasible this is?

    Read the article

  • Built local glibc, broke system, how do I ssh without parsing the .bashrc?

    - by Mikhail
    The cluster I am on had really old build tools and I needed to use CUDA5. I'm a pretty clever dude and I planned on building the necissary tools. So, I built a local copy of gcc, bintools, and glibc. Everything a CUDA5 could want. All builds finished without error. and I tested gcc and bintools. Everything was wonderful and I built and ran a few of the programs. I set up the LD_LIBRARY_PATHs in the .bashrc and logged back in, expecting a productive night ahead. To my horror I realized that everything is dynamically linked. Now I can't do simple commands like ls [ex@uid377 ~]$ ls ls: error while loading shared libraries: __vdso_time: invalid mode for dlopen(): Invalid argument and I can't do commands to fix the problem like rm or vim! Is there a way for me to ssh but also to ignore .bashrc file? Any suggestions are much appreciated. This machine is obviously under maintained and I don't know when I could have administrator support.

    Read the article

  • Cannot increase Datastore

    - by k4w4zz
    Hello, We have an ESX 4.0 cluster with 2 hosts, EMC Clarion SAN storage with 10 LUNs. We have added 2 new 400 GB LUNs. All the LUNs are visible from both hosts. I have extended an existing 500 GB datastore with one of these 400 GB LUNs - the new datastore size is now 900 GB. I'd like to do the same operation with the second 400 GB LUN to extend another existing datastore but I'm not able to do it. The LUN is available to create a brand new datastore but is not visible to extend an existing one. I don't understand why everything was fine with the other one and why can't I do the same exact operation with this LUN. The result is the same on both hosts. The SAN admin have erased and re-created several times this LUN. I have rescan the HBA each time. In attachment you can find the result of the esxcfg-mpath -l and fdisk -l commands on both servers. Does somebody have an idea please?

    Read the article

  • Managing an application across multiple servers, or PXE vs cfEngine/Chef/Puppet

    - by matt
    We have an application that is running on a few (5 or so and will grow) boxes. The hardware is identical in all the machines, and ideally the software would be as well. I have been managing them by hand up until now, and don't want to anymore (static ip addresses, disabling all necessary services, installing required packages...) . Can anyone balance the pros and cons of the following options, or suggest something more intelligent? 1: Individually install centos on all the boxes and manage the configs with chef/cfengine/puppet. This would be good, as I have wanted an excuse to learn to use one of applications, but I don't know if this is actually the best solution. 2: Make one box perfect and image it. Serve the image over PXE and whenever I want to make modifications, I can just reboot the boxes from a new image. How do cluster guys normally handle things like having mac addresses in the /etc/sysconfig/network-scripts/ifcfg* files? We use infiniband as well, and it also refuses to start if the hwaddr is wrong. Can these be correctly generated at boot? I'm leaning towards the PXE solution, but I think monitoring with munin or nagios will be a little more complicated with this. Anyone have experience with this type of problem? All the servers have SSDs in them and are fast and powerful. Thanks, matt.

    Read the article

  • Determining physical location of data on a disc

    - by Synetech
    Does anybody know of a way to find out where, physically on a CD or DVD a given piece of data would be located? I am trying to watch a DVD at the moment, and am about half-way through, but it keeps dying at a specific spot in the film, presumably because of a scratch. I have a repair kit, but I don’t know where to focus my repair because there are several scuffs and scratches on the disc and I have no way of knowing which one is causing the issue. Obviously, cleaning all of them is inadvisable because not only does it waste the consumable materials in the kit, but not all of them are a problem, and by working them, some may become unreadable. Moreover, just because I am half-way through the movie does not mean that it would be half-way from the hub to the edge for several reasons: Discs have more data towards the outer edge than the inner edge (circles are more mathematically complicated than rectangles) The disc is not completely filled up (and even if it were, the movie itself would be be using it all, there are extras and such) Because in this particular case it is a commercial DVD, it is also dual-layer which further complicates manual determination As such, I am trying to find a program that can let me identify a file (or part thereof), cluster, etc. and show me a picture of where on the CD/DVD it would be located. That way, I can look at the disc and fix any scratches that correspond to that distance from the hub. For example, the image below might indicate where on a disc a couple of files or range of clusters would be located, so by looking for anomalies in those areas (rotating as necessary), the correct one can be identified. I’m sure it can be done since at least one form of copy protection (DPM) uses it and DVD-lab Pro includes a “DVD Topology” feature to do this.

    Read the article

  • thought about shared storage (NFS, Lustre) [closed]

    - by user134880
    Possible Duplicate: Can you help me with my capacity planning? Now I habe small cluster with total of 8 nodes. 6 of them are computing nodes (apache and vmware) and 2 nodes are for storage. 2 storage nodes are identical. Each storage server is linux box with 8 x 1Tb WD RE4 in soft raid 10. 1st box is master and 2nd is slave. Data is mirrored with DRDB. We export NFSv4 shares to Apache (for document root) and iSCSI to Vmware. Now all is working pretty good and stable. But it will be soon time to upgrade our system. I have been thinking of Lustre. Does some one has any real experience with Lustre or NFS medium clusters? Will it be good idea just to upgrade server and change hdd's to 3Tb ? With NFS we will always have only 2 servers to maintain (one primary and one slave). Thanks. QUESTIONS: 1) Does some one used Lustre? In production? I have seen a lot of info about how it is hard to setup Lustre because you need to compile own kernel and patches. It's answers from newbies. Is there some one who has used Lustre for some period of time? 2) About disk upgrades - it's only description of strategy. I'm not asking if it is enough 3Tb or not. I just ask if it is right just to replace hdds instead of adding new server (like with Lustre) Thanks again.

    Read the article

  • What applications can be used in a Red Hat/CentOS cluster?

    - by Sandra
    Hi, When I look at the Red Hat cluster manuals 1 2, they only explain how to install it but not what applications can use it. I am new to clusters, so I don't know these things =) Let's say I want to 3 node high performance cluster; What applications would work with it? Also, how does an application talk to the cluster? Does the application need to have been written to support clusters? Sandra

    Read the article

  • Setting up an Active-Active IIS Cluster with ARR - is it possible?

    - by Ahmed Zubair
    I would like to know if we can setup an Active-Active IIS Cluster using Windows Cluster services that shares a common storage to store web content and WITHOUT the use of Windows NLB. I'm aware that this may not be a best practice or not a recommended setup, however, the setup is to be configured as below: Two web servers running IIS 7.5 (needs a common storage for web content) for HA and another set of two servers for sql cluster in active-passive mode for HA. Also is it possible to enable ARR on 2 node active-active IIS cluster for load balancing http requests? Appreciate if someone replies with both pros & cons of the setup.

    Read the article

  • How can I format an SD card with a more robust Linux-usable filesystem with a specific cluster size for better write performace?

    - by Harvey
    Goal: microSD card formatted... for best write performance for use only with embedded Linux for better reliability (random power failures may occur) using an 64kB cluster size I'm using an 8GB microSD card for data storage inside an embedded Linux/ARM device. The SD card is not removable. I've been using ext3 instead of the pre-installed FAT32 because it seems to better handle random power failures during writes. However, I kept noticing that my write performance is always best with the pre-installed FAT32 from Kingston. If I reformat the card with FAT32, the performance still suffers. After browsing wikipedia, I stumbled upon the following comment saying that some cards are optimized for specific cluster sizes. In my case, the Kingston comes pre-formatted for an 64kB cluster size. Risks of reformatting Reformatting an SD card with a different file system, or even with the same one, may make the card slower, or shorten its lifespan. Some cards use wear leveling, in which frequently modified blocks are mapped to different portions of memory at different times, and some wear-leveling algorithms are designed for the access patterns typical of the file allocation table on a FAT16 or FAT32 device.[60] In addition, the preformatted file system may use a cluster size that matches the erase region of the physical memory on the card; reformatting may change the cluster size and make writes less efficient.

    Read the article

  • Need to reformat SQL Cluster Disk. How do I recover my SQL installation?

    - by I.T. Support
    We need to reformat the SQL cluster disk in our SQL cluster. The drive contains the shared installation files for SQL as well as databases. My concern is how SQL/The Cluster will react to after we wipe the disk resource. Questions: Is there a defined procedure for this? How should we backup and restore the disk? After the reformat, how do we get the clustered SQL server back online? Thanks

    Read the article

  • Fortran recursion segmentation faults

    - by ConnorG
    Hey all - I have to design and implement a Fortran routine to determine the size of clusters on a square lattice, and it seemed extremely convenient to code the subroutine recursively. However, whenever my lattice size grows beyond a certain value (around 200/side), the subroutine consistently segfaults. Here's my cluster-detection routine: RECURSIVE SUBROUTINE growCluster(lattice, adj, idx, area) INTEGER, INTENT(INOUT) :: lattice(:), area INTEGER, INTENT(IN) :: adj(:,:), idx lattice(idx) = -1 area = area + 1 IF (lattice(adj(1,idx)).GT.0) & CALL growCluster(lattice,adj,adj(1,idx),area) IF (lattice(adj(2,idx)).GT.0) & CALL growCluster(lattice,adj,adj(2,idx),area) IF (lattice(adj(3,idx)).GT.0) & CALL growCluster(lattice,adj,adj(3,idx),area) IF (lattice(adj(4,idx)).GT.0) & CALL growCluster(lattice,adj,adj(4,idx),area) END SUBROUTINE growCluster where adj(1,n) represents the north neighbor of site n, adj(2,n) represents the west and so on. What would cause the erratic segfault behavior? Is the cluster just "too huge" for large lattice sizes?

    Read the article

  • IP failover with 2 nodes on different subnet: cannot ping virtual IP from second node?

    - by quanta
    I'm going to setup redundant failover Redmine: another instance was installed on the second server without problem MySQL (running on the same machine with Redmine) was configured as master-master replication Because they are in different subnet (192.168.3.x and 192.168.6.x), it seems that VIPArip is the only choice. /etc/ha.d/ha.cf on node1 logfacility none debug 1 debugfile /var/log/ha-debug logfile /var/log/ha-log autojoin none warntime 3 deadtime 6 initdead 60 udpport 694 ucast eth1 node2.ip keepalive 1 node node1 node node2 crm respawn /etc/ha.d/ha.cf on node2: logfacility none debug 1 debugfile /var/log/ha-debug logfile /var/log/ha-log autojoin none warntime 3 deadtime 6 initdead 60 udpport 694 ucast eth0 node1.ip keepalive 1 node node1 node node2 crm respawn crm configure show: node $id="6c27077e-d718-4c82-b307-7dccaa027a72" node1 node $id="740d0726-e91d-40ed-9dc0-2368214a1f56" node2 primitive VIPArip ocf:heartbeat:VIPArip \ params ip="192.168.6.8" nic="lo:0" \ op start interval="0" timeout="20s" \ op monitor interval="5s" timeout="20s" depth="0" \ op stop interval="0" timeout="20s" \ meta is-managed="true" property $id="cib-bootstrap-options" \ stonith-enabled="false" \ dc-version="1.0.12-unknown" \ cluster-infrastructure="Heartbeat" \ last-lrm-refresh="1338870303" crm_mon -1: ============ Last updated: Tue Jun 5 18:36:42 2012 Stack: Heartbeat Current DC: node2 (740d0726-e91d-40ed-9dc0-2368214a1f56) - partition with quorum Version: 1.0.12-unknown 2 Nodes configured, unknown expected votes 1 Resources configured. ============ Online: [ node1 node2 ] VIPArip (ocf::heartbeat:VIPArip): Started node1 ip addr show lo: 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo inet 192.168.6.8/32 scope global lo inet6 ::1/128 scope host valid_lft forever preferred_lft forever I can ping 192.168.6.8 from node1 (192.168.3.x): # ping -c 4 192.168.6.8 PING 192.168.6.8 (192.168.6.8) 56(84) bytes of data. 64 bytes from 192.168.6.8: icmp_seq=1 ttl=64 time=0.062 ms 64 bytes from 192.168.6.8: icmp_seq=2 ttl=64 time=0.046 ms 64 bytes from 192.168.6.8: icmp_seq=3 ttl=64 time=0.059 ms 64 bytes from 192.168.6.8: icmp_seq=4 ttl=64 time=0.071 ms --- 192.168.6.8 ping statistics --- 4 packets transmitted, 4 received, 0% packet loss, time 3000ms rtt min/avg/max/mdev = 0.046/0.059/0.071/0.011 ms but cannot ping virtual IP from node2 (192.168.6.x) and outside. Did I miss something? PS: you probably want to set IP2UTIL=/sbin/ip in the /usr/lib/ocf/resource.d/heartbeat/VIPArip resource agent script if you get something like this: Jun 5 11:08:10 node1 lrmd: [19832]: info: RA output: (VIPArip:stop:stderr) 2012/06/05_11:08:10 ERROR: Invalid OCF_RESK EY_ip [192.168.6.8] http://www.clusterlabs.org/wiki/Debugging_Resource_Failures Reply to @DukeLion: Which router receives RIP updates? When I start the VIPArip resource, ripd was run with below configuration file (on node1): /var/run/resource-agents/VIPArip-ripd.conf: hostname ripd password zebra debug rip events debug rip packet debug rip zebra log file /var/log/quagga/quagga.log router rip !nic_tag no passive-interface lo:0 network lo:0 distribute-list private out lo:0 distribute-list private in lo:0 !metric_tag redistribute connected metric 3 !ip_tag access-list private permit 192.168.6.8/32 access-list private deny any

    Read the article

< Previous Page | 30 31 32 33 34 35 36 37 38 39 40 41  | Next Page >