cluster computing - Page 32

IT staff not qualified enough? Yes, according to 93% of employers, says a report from the Computing Technology Industry Association

IT employees are not qualified enough according to 93% of employers, says a report by the Computing Technology Industry Association. The Computing Technology Industry Association, CompTIA, is a non-profit organization founded in 1982 by 5 IT industry professionals. In its latest report, the association looked at employees working in information and communication technology. To do so, it surveyed 500 executives and IT managers at large, medium and small companies about the skills of their IT employees. Employers from Canada, Japan, Afrique du ...

Read the article

Using WKA in Large Coherence Clusters (Disabling Multicast)

- by jpurdy

Disabling hardware multicast (by configuring well-known addresses, aka WKA) will place significant stress on the network. For messages that must be sent to multiple servers, rather than having a server send a single packet to the switch and having the switch broadcast that packet to the rest of the cluster, the server must send a packet to each of the other servers. While hardware varies significantly, consider that a server with a single gigabit connection can send at most ~70,000 packets per second. To continue with some concrete numbers, in a cluster with 500 members, that means that each server can send at most 140 cluster-wide messages per second. And if there are 10 cluster members on each physical machine, that number shrinks to 14 cluster-wide messages per second (or with only mild hyperbole, roughly zero).

It is also important to keep in mind that network I/O is expensive not only in terms of the network itself, but also in terms of the CPU required to send (or receive) a message (due to things like copying the packet bytes, processing an interrupt, etc.).

Fortunately, Coherence is designed to rely primarily on point-to-point messages, but there are some features that are inherently one-to-many:

- Announcing the arrival or departure of a member
- Updating partition assignment maps across the cluster
- Creating or destroying a NamedCache
- Invalidating a cache entry from a large number of client-side near caches
- Distributing a filter-based request across the full set of cache servers (e.g. queries, aggregators and entry processors)
- Invoking clear() on a NamedCache

The first few of these are operations that are primarily routed through a single senior member, and also occur infrequently, so they usually are not a primary consideration. There are cases, however, where the load from introducing new members can be substantial (to the point of destabilizing the cluster).

Consider the case where the cluster in the first paragraph grows from 500 members to 1000 members (holding the number of physical machines constant). During this period, there will be 500 new member introductions, each of which may consist of several cluster-wide operations (for the cluster membership itself as well as the partitioned cache services, replicated cache services, invocation services, management services, etc.). Note that all of these introductions will route through that one senior member, which is sharing its network bandwidth with several other members (which will be communicating to a lesser degree with other members throughout this process). While each service may have a distinct senior member, there's a good chance during initial startup that a single member will be the senior for all services (if those services start on the senior before the second member joins the cluster). It's obvious that this could cause CPU and/or network starvation.

In the current release of Coherence (3.7.1.3 as of this writing), the pure unicast code path also has less sophisticated flow control for cluster-wide messages (compared to the multicast-enabled code path), which may also result in significant heap consumption on the senior member's JVM (from the message backlog). This is almost never a problem in practice, but with sufficient CPU or network starvation, it could become critical.

For the non-operational concerns (near caches, queries, etc.), the application itself will determine how much load is placed on the cluster. Applications intended for deployment in a pure unicast environment should be careful to avoid excessive dependence on these features. Even in an environment with multicast support, these operations may scale poorly, since even with a constant request rate, the underlying workload will increase at roughly the same rate as the underlying resources are added.

Unless there is an infrastructural requirement to the contrary, multicast should be enabled. If it can't be enabled, care should be taken to ensure the added overhead doesn't lead to performance or stability issues. This is particularly crucial in large clusters.
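The packet arithmetic in the first paragraph can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the function name and parameters are made up for illustration, and it uses the same rounding as the article (one packet per cluster member for each cluster-wide message, with members on one machine sharing a single link).

```python
def cluster_wide_messages_per_second(packets_per_second, cluster_members,
                                     members_per_machine=1):
    """Rough upper bound on cluster-wide messages per member over pure unicast."""
    # Without multicast, one cluster-wide message costs about one packet
    # per cluster member (the article rounds "every other member" up).
    per_link = packets_per_second // cluster_members
    # Members on the same physical machine share that machine's link.
    return per_link // members_per_machine


if __name__ == "__main__":
    print(cluster_wide_messages_per_second(70_000, 500))       # one member per machine
    print(cluster_wide_messages_per_second(70_000, 500, 10))   # ten members per machine
```

Under the same assumptions, growing to 1000 members on the same machines (20 members per machine) would leave each member with only about 3 cluster-wide messages per second.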

Read the article

Which process is using my NAS?

- by sethu

I have a NAS connected to my cluster. The NAS holds all our home directories. When I ran a set of experiments last week, saving a 1 GB file to the NAS took around 30 seconds. If I do the same to a local disk, it takes 18 seconds. But when I tried the same thing today, it took 150 seconds. I am unsure what the problem is. Can someone help me pinpoint the issue? Is it possible to find out which process is accessing the NAS, or how much NAS bandwidth is being used? Thanks for your help. -Sethu
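To compare runs like the ones described above on equal terms, it can help to turn the timings into throughput and to repeat the write test in a controlled way. The following is a minimal sketch, not a diagnostic tool: the function names are made up for illustration, and it does not show which process is using the NAS.

```python
import os
import time


def throughput_mb_per_s(size_bytes, seconds):
    """Convert a transfer size and duration into MB/s (1 MB = 1024 * 1024 bytes)."""
    if seconds <= 0:
        return float("inf")
    return size_bytes / (1024 * 1024) / seconds


def timed_write(path, size_bytes, chunk_size=1024 * 1024):
    """Write size_bytes of zeros to path; return (elapsed seconds, MB/s)."""
    chunk = b"\0" * chunk_size
    remaining = size_bytes
    start = time.perf_counter()
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(chunk[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push buffered data to the storage
    elapsed = time.perf_counter() - start
    return elapsed, throughput_mb_per_s(size_bytes, elapsed)


if __name__ == "__main__":
    gib = 1024 ** 3
    # The three timings quoted in the question: NAS last week, local disk, NAS today.
    for seconds in (30, 18, 150):
        print(seconds, "s ->", round(throughput_mb_per_s(gib, seconds), 1), "MB/s")
```

Running `timed_write` against a path on the NAS and one on a local disk at different times of day would show whether the slowdown follows the overall load on the NAS.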

Read the article

coordinating a script to run on only one of identical load-balanced servers

- by Amos Shapira

I have two identically configured CentOS 5 servers (possibly more in the future). I need to run a cron job on any one of them, but on only one of them. I know about RedHat Cluster Suite (we use it on other servers), but it's too big a gun for this task, plus it doesn't really behave well with fewer than three nodes. Is there anything lightweight I can use for that? The servers can communicate with each other directly. I suppose I could develop something over ssh or nrpe (two services that are already installed on these servers), but I was wondering whether there is something already available.
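One lightweight approach, sketched here purely as an illustration and not as an existing tool, is a deterministic election: every server runs the same cron wrapper, and only the first live host in sorted order does the work. The host names and both function names below are hypothetical, and the liveness check simply tries to open a TCP connection to the ssh port.

```python
import socket


def tcp_alive(host, port=22, timeout=3.0):
    """Treat a host as alive if a TCP connection to its ssh port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def should_run_here(my_name, all_hosts, is_alive=tcp_alive):
    """Return True if this host is the first live host in sorted order."""
    for host in sorted(set(all_hosts) | {my_name}):
        if host == my_name:
            return True   # every host sorted before this one is down
        if is_alive(host):
            return False  # a host with higher priority is up and will run the job
    return False


if __name__ == "__main__":
    # Hypothetical names; a cron wrapper would exit early when this prints False.
    print(should_run_here("web1", ["web1", "web2"]))
```

The obvious weakness is a network split: if the servers cannot reach each other, both will decide to run, so this only suits jobs where an occasional double run is tolerable.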

Read the article

Linux HA - Best Heartbeat hardware solution

- by Martino Dino

Hi all, I would like to ask what the best layer 2 medium for heartbeat in Linux is, and how it is best configured. More precisely, I've been thinking about a dedicated NIC for that purpose, but then I thought that if a switch breaks, I would lose the heartbeat connection for most of the cluster and STONITH 'BUM'!!! I will probably lose my job after that :) Distributing the heartbeat onto the main NICs of every node through a vif sounds reasonable, but I'm not sure if this is the best option (at least the switches are redundant to some extent). Is it possible to use heartbeat over a bonded interface, and does that sound reasonable? Do you have any other tip or solution for this issue?

Read the article

What are the challenges when my enterprise desires to move the processing component of an applicatio

- by Berkay

Assume that I have an enterprise accounting application that consists of a front-end interface, a processing tier, and a back-end database. This is an application that contains private business data, and thus is traditionally run in a secure private network environment within the enterprise. What are the challenges that appear when my enterprise wants to move the processing component of this application to a cloud computing data center in order to achieve greater scalability or to reduce IT costs? Please note: do I have to make significant changes to my own infrastructure to enable external access to formerly private resources? Do I have to modify the application code to handle the new network topology? Thanks; answers given in a simple manner would be really appreciated.

Read the article

diskpart on RDM's ...

- by karnash

Hi,

We have an ESXi cluster attached to a CLARiiON CX4, with Windows 2008 R2 as the guest OS. Attached to this VM are 2 x 1.95T RDMs. In diskpart I run:

    select disk 1
    create partition primary size=1

(that is, 1 MB), and then list partition shows:

      Partition ###  Type     Size     Offset
    * Partition 1    Primary  1024 KB  1024 KB

Then I do the same for the other disk, and the offset is 1024 KB. I need to present a 4T disk to this VM, so I right-click on disk 1, convert it to a simple volume, and then extend it by adding the second disk. Now when I do list partition, I see the offset is set to 31k. Can anyone please guide me? Thanks

Read the article

Upgrading drives on a MD3000

- by Anonymouse

Hello, our MD3000 array is getting full as our databases are growing, and we need more space. Currently, we use an MD3000 with a two-server Windows 2003 cluster and 15x 73GB SAS drives. Disk groups are configured as RAID1 pairs of two drives. The approach we are currently investigating is simply swapping the existing SAS drives for bigger ones (300GB instead of 73GB), one at a time, and letting each RAID1 array rebuild. Is this a good approach? Will we be able to resize the array afterwards? Will we be able to resize the partitions afterwards? Can the Dell MD3000 management software do it, or will we have to take the server offline and use some partitioning software to do it? Thanks in advance.

Read the article

How to make a DHCP server on a virtual machine serve other virtual machines (on different physical machines)?

- by Tony

I'm building a virtual cluster with VirtualBox and openSUSE. I have 10 physical machines and need several VMs on each. The virtual machines are supposed to be in a "private" network, but still have internet access. I was asked to set up a virtual head node working as a DHCP server. I installed a DHCP server on the virtual head node and it seems to work. In VirtualBox I gave the head node 2 network adapters, one bridged adapter and one internal network. One VM on the same physical machine has its NIC set as an internal network adapter. The VM can get an IP address (so DHCP works) but can't access the internet. What should I do? Specifically, what network adapter should I choose for the head node and the worker nodes in VirtualBox? What should I do inside the virtual machines?

Read the article

Real-time threat finder

- by Rohit

I want to make a small program that is capable of downloading files from the cloud onto my system. As a file reaches my system, another program on my system will analyze the file and try to find suspicious behavior in it. I want to make a system similar to ThreatExpert (www.threatexpert.com). The suspicious data gathered by my program will be sent to anti-virus companies for analysis. I want to know whether this program can be written in .NET or as a PHP website. I have no experience with cloud computing. How do I retrieve files from the cloud?

Read the article

Decent 1gb switch (16-24 port) for rack...

- by TomTom

Hello, for a rack containing a small number of servers (5 at the moment, and it is going to stay in this range), I am looking to replace the aging 100 Mbit switch with a 1 Gbit switch. This is for the backend between the servers. I expect some iSCSI traffic there, so a 10 Gbit option would be nice (preferably for two ports, as extension modules). I don't need management; this is a pure backend of an internal cluster. I do VLANs, but there is no sensible management the switch can do there. I would like:
* 1U only, obviously
* Preferably limited moving parts
* Low price ;)
* Enough power to run at least half the ports at full speed at the same time
Any recommendations?

Read the article

How to broadcast a command on Windows

- by Xiao Jia

I am going to frequently deploy different versions of a program on a cluster of Windows machines (mostly Windows XP), so I would like to use a command-line broadcasting tool (either built-in or third-party) to (1) download a file from some URL, and (2) execute the same command, on all the machines. I googled for a very long time but found nothing related to my goal (only pages about broadcasting a message, broadcasting a ping, programmatically broadcasting via TCP/IP, etc.). Is there any tool for this purpose? Or is it possible to do it programmatically (without installing extra client programs on those machines)?

Read the article

Setup local EC2 style cloud?

- by John Kramlich

I was recently given 3 dual Opteron 2400 servers with 4GB of RAM and 120GB hard drives. I am interested in setting up something similar to Amazon's EC2 for my own personal web development use. Basically, I would like to spin up instances from an ISO or other disk images and have them available to test and develop software. Are there open source solutions I can use to accomplish this? I am assuming one of the machines will need to act as a controller of some sort for the other two. I use Sun's VirtualBox on my local development machine to virtualize various versions of Microsoft Windows. However, I'm not sure if that's the best tool for what I am trying to achieve. I apologize in advance if this question is too vague to get meaningful responses. I am new to cloud computing and fairly new to server administration.

Read the article

How can one domain route to an always-changing pool of servers?

- by ryeguy

I'm sure this has an easy solution; I'm just not too familiar with how DNS works, or whether that's even related to this problem. If I'm running a web service on Amazon EC2, distributed across many instances, how can I make it so a single domain name can be used to access the entire pool of servers, which will be changing from time to time? Since the instances may be present one second but gone the next (and vice versa), I need a way to randomly pick an active member of the cluster to route to. The updates would have to be instantaneous. Is this even possible, with DNS caching and all?

Read the article

Cloud hosting and single hardware point of failure?

- by PeterB

From talking to sales I thought Rackspace Cloud was running on a SAN and compute nodes (as VMware's offerings do), only to find out it doesn't, so when the host server goes down for maintenance, all cloud servers on that server go down (in our case for 2.5 hours). I understand Amazon EC2 also has this single-server point of failure.

- Which cloud hosting solutions don't rely on a single server? I've yet to find a list by architecture.
- Is there a term that distinguishes between these types of 'cloud'? Is one of these 'grid computing' and the other 'virtualisation'?
- Can a SAN-backed solution provide the same reliability as 2 mirrored cloud servers on (say) Rackspace Cloud?

I am more familiar with the VMware architecture and would like to understand the advantages and disadvantages of each approach. I understand the standard architecture is to have multiple cloud servers and mirrored data between them; until we need multiple database servers, I'm wondering if a SAN/node hosting solution would provide the lack of downtime we need without the added complexity.

Read the article

Configuring MPI on 2 nodes

- by Wysek

I'm trying to create a really simple "cluster" from 2 multicore computers using openmpi. My problem is that I can't find any tutorials on that matter. I don't want to use torque because it's not necessary in my case, yet all tutorials give configuration details for either torque or mpd (which doesn't exist in the openmpi implementation). Could you give me some tips or links to appropriate manuals?

Steps I've already completed:
- openmpi installation
- network configuration (computers see each other)
- ssh password-less login to the second computer

I tried using machinefiles without further configuration and with just 2 IPs in them, but jobs don't seem to start at all after the initialization part. (MPI seems to work, because I'm able to scatter jobs across multiple cores of both computers without communication between them.)

Read the article

HDFS: some datanodes of the cluster are suddenly disconnected while reducers are running

- by user1429825

I have 8 slave computers and 1 master computer for running Hadoop (ver 0.21). Some datanodes of the cluster are suddenly disconnected while I run MapReduce code on 10 GB of data. After all mappers had finished and around 80% of the reducers had been processed, one or more datanodes randomly disconnected from the network. Then the other datanodes started to disappear from the network, even though I killed the MapReduce job when I found that some datanode was disconnected.

I've tried changing dfs.datanode.max.xcievers to 4096, turning off the firewalls on all computing nodes, disabling SELinux and increasing the open file limit to 20000, but none of that worked at all. Does anyone have an idea how to solve this problem?

The following is the error log from MapReduce:

    12/06/01 12:31:29 INFO mapreduce.Job: Task Id : attempt_201206011227_0001_r_000006_0, Status : FAILED
    java.io.IOException: Bad connect ack with firstBadLink as ***.***.***.148:20010
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:889)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:820)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:427)

and the following are logs from a datanode:

    2012-06-01 13:01:01,118 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-5549263231281364844_3453 src: /*.*.*.147:56205 dest: /*.*.*.142:20010
    2012-06-01 13:01:01,136 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(*.*.*.142:20010, storageID=DS-1534489105-*.*.*.142-20010-1337757934836, infoPort=20075, ipcPort=20020) Starting thread to transfer block blk_-3849519151985279385_5906 to *.*.*.147:20010
    2012-06-01 13:01:19,135 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(*.*.*.142:20010, storageID=DS-1534489105-*.*.*.142-20010-1337757934836, infoPort=20075, ipcPort=20020):Failed to transfer blk_-5797481564121417802_3453 to *.*.*.146:20010 got java.net.ConnectException: Connection timed out
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373)
        at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1257)
        at java.lang.Thread.run(Thread.java:722)
    2012-06-01 13:06:20,342 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_6674438989226364081_3453
    2012-06-01 13:09:01,781 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(*.*.*.142:20010, storageID=DS-1534489105-*.*.*.142-20010-1337757934836, infoPort=20075, ipcPort=20020):Failed to transfer blk_-3849519151985279385_5906 to *.*.*.147:20010 got java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/*.*.*.142:60057 remote=/*.*.*.147:20010]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:164)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:203)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:388)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:476)
        at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1284)
        at java.lang.Thread.run(Thread.java:722)

hdfs-site.xml

    <configuration>
      <property>
        <name>dfs.name.dir</name>
        <value>/home/hadoop/data/name</value>
      </property>
      <property>
        <name>dfs.data.dir</name>
        <value>/home/hadoop/data/hdfs1,/home/hadoop/data/hdfs2,/home/hadoop/data/hdfs3,/home/hadoop/data/hdfs4,/home/hadoop/data/hdfs5</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
      <property>
        <name>dfs.datanode.max.xcievers</name>
        <value>4096</value>
      </property>
      <property>
        <name>dfs.http.address</name>
        <value>0.0.0.0:20070</value>
        <description>50070 The address and the base port where the dfs namenode web ui will listen on. If the port is 0 then the server will start on a free port.</description>
      </property>
      <property>
        <name>dfs.datanode.http.address</name>
        <value>0.0.0.0:20075</value>
        <description>50075 The datanode http server address and port. If the port is 0 then the server will start on a free port.</description>
      </property>
      <property>
        <name>dfs.secondary.http.address</name>
        <value>0.0.0.0:20090</value>
        <description>50090 The secondary namenode http server address and port. If the port is 0 then the server will start on a free port.</description>
      </property>
      <property>
        <name>dfs.datanode.address</name>
        <value>0.0.0.0:20010</value>
        <description>50010 The address where the datanode server will listen to. If the port is 0 then the server will start on a free port.</description>
      </property>
      <property>
        <name>dfs.datanode.ipc.address</name>
        <value>0.0.0.0:20020</value>
        <description>50020 The datanode ipc server address and port. If the port is 0 then the server will start on a free port.</description>
      </property>
      <property>
        <name>dfs.datanode.https.address</name>
        <value>0.0.0.0:20475</value>
      </property>
      <property>
        <name>dfs.https.address</name>
        <value>0.0.0.0:20470</value>
      </property>
    </configuration>

mapred-site.xml

    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>masternode:29001</value>
      </property>
      <property>
        <name>mapred.system.dir</name>
        <value>/home/hadoop/data/mapreduce/system</value>
      </property>
      <property>
        <name>mapred.local.dir</name>
        <value>/home/hadoop/data/mapreduce/local</value>
      </property>
      <property>
        <name>mapred.map.tasks</name>
        <value>32</value>
        <description>default number of map tasks per job.</description>
      </property>
      <property>
        <name>mapred.tasktracker.map.tasks.maximum</name>
        <value>4</value>
      </property>
      <property>
        <name>mapred.reduce.tasks</name>
        <value>8</value>
        <description>default number of reduce tasks per job.</description>
      </property>
      <property>
        <name>mapred.map.child.java.opts</name>
        <value>-Xmx2048M</value>
      </property>
      <property>
        <name>io.sort.mb</name>
        <value>500</value>
      </property>
      <property>
        <name>mapred.task.timeout</name>
        <value>1800000</value>
      </property>
      <property>
        <name>mapred.job.tracker.http.address</name>
        <value>0.0.0.0:20030</value>
        <description>50030 The job tracker http server address and port the server will listen on. If the port is 0 then the server will start on a free port.</description>
      </property>
      <property>
        <name>mapred.task.tracker.http.address</name>
        <value>0.0.0.0:20060</value>
        <description>50060</description>
      </property>
    </configuration>
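Because hand-edited Hadoop configuration files are easy to break with an unclosed tag, it may be worth checking that each file is well-formed XML before pushing it to every node. The sketch below uses only Python's standard library; `load_hadoop_config` is a hypothetical helper name, not part of Hadoop, and it checks structure only, not whether the values make sense.

```python
import xml.etree.ElementTree as ET


def load_hadoop_config(xml_text):
    """Parse a Hadoop-style <configuration> document into a {name: value} dict.

    Raises ValueError if the XML is malformed or a <property> has no <name>.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ValueError(f"malformed XML: {exc}") from exc

    settings = {}
    # Only direct <property> children of the root element are considered.
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is None:
            raise ValueError("<property> element without <name>")
        value = prop.findtext("value")
        settings[name.strip()] = (value or "").strip()
    return settings


if __name__ == "__main__":
    sample = """
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
    </configuration>
    """
    print(load_hadoop_config(sample))
```

Running it over hdfs-site.xml and mapred-site.xml on the master and comparing the resulting dictionaries with those from each slave would also reveal nodes whose settings have drifted.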

Read the article

Sending USR2 to mongrel_rails sometimes results in an “Address already in use” on the restart

- by Ben

We have a rolling-restart mode for our mongrel cluster that sends a USR2 signal to each running process. This works great, most of the time. But very occasionally the mongrel process will shut down and then fail to restart, with the following error:

    /usr/local/lib/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/tcphack.rb:12:in `initialize_without_backlog': Address already in use - bind(2) (Errno::EADDRINUSE)
        from /usr/local/lib/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/tcphack.rb:12:in `initialize'
        from /usr/local/lib/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:93:in `new'
        from /usr/local/lib/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:93:in `initialize'
        from /usr/local/lib/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/configurator.rb:139:in `new'
        from /usr/local/lib/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/configurator.rb:139:in `listener'

Looking through the mongrel source, the USR2 handler calls a synchronous stop on the running server, so it ought to block until the socket has been released. Has anyone seen this error? Does anyone have any ideas what might cause it? (I asked this question over on StackOverflow initially, but thought it might be more appropriate here.)

Read the article

PBS batch jobs - the qalter command

- by Ryan Budney

I've got a giant computation running on a Scientific Linux cluster. At present I have over 600 jobs parked in the queue, waiting for processor time, while a few are running. I'm trying to use the qalter command on some of the idle but scheduled jobs. I'd like to schedule them for a later time, so that other users can jump part of the queue, sort of as an act of politeness. Is this doable?

For example, JOBNAME 292399 is currently idle, scheduled to be run whenever a spot in the queue opens up. But if I run

    qalter -a 10051000 292398

followed by

    qrerun 292398

I get

    qrerun: Request invalid for state of job 292398.euler.

From the qalter documentation, I thought 10051000 refers to tomorrow (Oct 5th, 10am), but perhaps I'm misunderstanding something? If I'm going about this the wrong way, please let me know.

The main thing I'm looking for is a command that's easily scriptable, so that I can modify when my queued tasks get run. qalter seems good for those purposes if I can get it working. I'd rather avoid running qdel and re-qsubbing the computations, as there's a bookkeeping issue of which tasks to restart (vs. which ones not to). I want to avoid that kind of bookkeeping. From googling around I notice some qalter commands have rather different date formats, but the above appears to be correct, as far as I can tell from the man docs. Any help would be appreciated.
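For scripting, the time argument used above can be generated rather than typed by hand. This is a small sketch that assumes the `[[[[CC]YY]MM]DD]hhmm[.SS]` date_time format described in the PBS/Torque man pages for the `-a` option; the helper name and the year in the example are made up for illustration.

```python
from datetime import datetime


def pbs_start_time(when, include_year=False):
    """Format a datetime as MMDDhhmm (or CCYYMMDDhhmm) for use with `-a`."""
    fmt = "%Y%m%d%H%M" if include_year else "%m%d%H%M"
    return when.strftime(fmt)


if __name__ == "__main__":
    # October 5th at 10:00; the year is arbitrary and only there for illustration.
    start = datetime(2012, 10, 5, 10, 0)
    print("qalter -a", pbs_start_time(start), "<jobid>")
```

A wrapper script could loop over the idle job IDs and call qalter once per job with the generated string, which avoids the qdel and resubmit bookkeeping.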

Read the article

Pitfalls to using Gluster as a home/profile directory server?

- by Bart Silverstrim

I was asking recently about options for divvying up access to file servers, as we have a NAS solution that gets fairly bogged down when our users (with giant profiles, especially) all log in nearly simultaneously. I ran across Gluster, and it looks like it can cluster different physical storage media into a single virtual volume and share it out like a virtual NAS from the client's perspective, and it supports CIFS. My question is whether something like this would be feasible to use for home and profile directories in an Active Directory environment. I was worried about ACLs, primarily, as I didn't think CIFS was fine-grained enough to support NTFS permissions, and it didn't look like Gluster exports those permission levels, just the base permissions for basic file sharing. I got the impression that using Gluster would allow data to be redundant across multiple servers and would speed up access to the files under heavy load, while allowing us to dynamically boost storage capacity by just adding another server and telling Gluster's master node to add that server. Maybe my understanding of it is wrong, though. Does anyone else use it or care to share how feasible this is?

Read the article

Built local glibc, broke system, how do I ssh without parsing the .bashrc?

- by Mikhail

The cluster I am on had really old build tools and I needed to use CUDA5. I'm a pretty clever dude and I planned on building the necessary tools. So, I built a local copy of gcc, binutils, and glibc. Everything CUDA5 could want. All builds finished without error, and I tested gcc and binutils. Everything was wonderful and I built and ran a few of the programs. I set up the LD_LIBRARY_PATHs in the .bashrc and logged back in, expecting a productive night ahead. To my horror I realized that everything is dynamically linked. Now I can't run simple commands like ls:

    [ex@uid377 ~]$ ls
    ls: error while loading shared libraries: __vdso_time: invalid mode for dlopen(): Invalid argument

and I can't run the commands that would fix the problem, like rm or vim! Is there a way for me to ssh in while ignoring the .bashrc file? Any suggestions are much appreciated. This machine is obviously under-maintained and I don't know when I could get administrator support.

Read the article

Cannot increase Datastore

- by k4w4zz

Hello, we have an ESX 4.0 cluster with 2 hosts and EMC CLARiiON SAN storage with 10 LUNs. We have added 2 new 400 GB LUNs. All the LUNs are visible from both hosts. I have extended an existing 500 GB datastore with one of these 400 GB LUNs - the new datastore size is now 900 GB. I'd like to do the same operation with the second 400 GB LUN to extend another existing datastore, but I'm not able to do it. The LUN is available for creating a brand new datastore but is not offered for extending an existing one. I don't understand why everything was fine with the other one and why I can't do the exact same operation with this LUN. The result is the same on both hosts. The SAN admin has erased and re-created this LUN several times. I have rescanned the HBA each time. Attached you can find the output of the esxcfg-mpath -l and fdisk -l commands on both servers. Does anybody have an idea, please?

Read the article

Managing an application across multiple servers, or PXE vs cfEngine/Chef/Puppet

- by matt

We have an application that is running on a few (5 or so, and that will grow) boxes. The hardware is identical in all the machines, and ideally the software would be as well. I have been managing them by hand up until now, and don't want to anymore (static IP addresses, disabling all necessary services, installing required packages...). Can anyone balance the pros and cons of the following options, or suggest something more intelligent?

1: Individually install CentOS on all the boxes and manage the configs with chef/cfengine/puppet. This would be good, as I have wanted an excuse to learn to use one of these applications, but I don't know if this is actually the best solution.

2: Make one box perfect and image it. Serve the image over PXE, and whenever I want to make modifications, I can just reboot the boxes from a new image. How do cluster guys normally handle things like having MAC addresses in the /etc/sysconfig/network-scripts/ifcfg* files? We use InfiniBand as well, and it also refuses to start if the hwaddr is wrong. Can these be correctly generated at boot?

I'm leaning towards the PXE solution, but I think monitoring with munin or nagios will be a little more complicated with this. Does anyone have experience with this type of problem? All the servers have SSDs in them and are fast and powerful. Thanks, matt.

Read the article

Determining physical location of data on a disc

- by Synetech

Does anybody know of a way to find out where, physically on a CD or DVD, a given piece of data would be located? I am trying to watch a DVD at the moment, and am about half-way through, but it keeps dying at a specific spot in the film, presumably because of a scratch. I have a repair kit, but I don’t know where to focus my repair because there are several scuffs and scratches on the disc and I have no way of knowing which one is causing the issue. Obviously, cleaning all of them is inadvisable because not only does it waste the consumable materials in the kit, but not all of them are a problem, and by working them, some may become unreadable.

Moreover, just because I am half-way through the movie does not mean that the spot would be half-way from the hub to the edge, for several reasons:

- Discs have more data towards the outer edge than the inner edge (circles are more mathematically complicated than rectangles)
- The disc is not completely filled up (and even if it were, the movie itself would not be using it all; there are extras and such)
- Because in this particular case it is a commercial DVD, it is also dual-layer, which further complicates manual determination

As such, I am trying to find a program that can let me identify a file (or part thereof), cluster, etc. and show me a picture of where on the CD/DVD it would be located. That way, I can look at the disc and fix any scratches that correspond to that distance from the hub. For example, the image below might indicate where on a disc a couple of files or range of clusters would be located, so by looking for anomalies in those areas (rotating as necessary), the correct one can be identified. I’m sure it can be done, since at least one form of copy protection (DPM) uses it and DVD-lab Pro includes a “DVD Topology” feature to do this.
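The first reason above can be made concrete with a little geometry. The sketch below is an idealized estimate, not a replacement for the kind of tool being asked about: it assumes a single layer written from the inside out at constant linear density, and a data area running from a radius of about 24 mm to about 58 mm (nominal figures for a 12 cm disc, taken here as assumptions). Under those assumptions the amount of data stored inside radius r grows with the area, so a fraction f of the layer ends at r = sqrt(r_in^2 + f * (r_out^2 - r_in^2)).

```python
import math


def radius_for_fraction(fraction, r_inner_mm=24.0, r_outer_mm=58.0):
    """Estimate the radius (mm) at which a given fraction of one layer's data ends.

    Assumes constant linear density on a single layer recorded inside-out.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return math.sqrt(r_inner_mm ** 2 + fraction * (r_outer_mm ** 2 - r_inner_mm ** 2))


if __name__ == "__main__":
    for f in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f, round(radius_for_fraction(f), 1), "mm")
```

Half of a full layer therefore ends at roughly 44 mm from the centre rather than at the 41 mm midpoint. The fraction here is a fraction of the layer's data, not of the film's running time, and a dual-layer disc or a partly filled one shifts the result further.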

Read the article

thought about shared storage (NFS, Lustre) [closed]

- by user134880

Possible Duplicate: Can you help me with my capacity planning?

I currently have a small cluster with a total of 8 nodes. 6 of them are computing nodes (Apache and VMware) and 2 nodes are for storage. The 2 storage nodes are identical. Each storage server is a Linux box with 8 x 1TB WD RE4 drives in software RAID 10. The 1st box is the master and the 2nd is the slave. Data is mirrored with DRBD. We export NFSv4 shares to Apache (for the document root) and iSCSI to VMware. Right now everything is working well and is stable, but it will soon be time to upgrade our system. I have been thinking of Lustre. Does anyone have real experience with Lustre or NFS on medium-sized clusters? Would it be a good idea to just upgrade the servers and change the HDDs to 3TB ones? With NFS we will always have only 2 servers to maintain (one primary and one slave). Thanks.

QUESTIONS:
1) Has anyone used Lustre? In production? I have seen a lot of info about how hard it is to set up Lustre because you need to compile your own kernel and patches. Those are answers from newbies. Is there anyone who has used Lustre for some period of time?
2) About the disk upgrades: this is only a description of the strategy. I'm not asking whether 3TB is enough or not. I'm just asking whether it is right to simply replace the HDDs instead of adding a new server (as with Lustre).

Thanks again.

Search Results

Search found 3589 results on 144 pages for 'cluster computing'.
