Search Results

Search found 3295 results on 132 pages for 'solaris cluster'.

Page 39 of 132

  • Connecting XRAID to Sunfire via LSI FC card - Drives not showing up

    - by Matthew Watson
    Hi, I've got a Sunfire T1000 machine (Solaris 10 10/09 s10s_u8wos_08a SPARC) with an LSI7404EP-LC fibre channel card in it. This is plugged into an XRAID. The system seems to have picked up the card:

        > /usr/platform/`uname -i`/sbin/prtdiag
        IO
        Location    Type  Slot  Path                              Name                   Model
        ----------- ----- ----  --------------------------------  ---------------------  ---------
        MB/PCIE0    PCIE  0     /pci@780/fibre-channel            fibre-channel
        MB/PCIE0    PCIE  0     /pci@780/fibre-channel            fibre-channel
        MB/NET0     PCIE  MB    /pci@7c0/pci@0/network@4          network-pci14e4,1668
        MB/NET1     PCIE  MB    /pci@7c0/pci@0/network@4,1        network-pci14e4,1668
        MB/NET2     PCIX  MB    /pci@7c0/pci@0/pci@8/network@1    network-pci108e,1648
        MB/NET3     PCIX  MB    /pci@7c0/pci@0/pci@8/network@1,1  network-pci108e,1648
        MB/PCIX     PCIX  MB    /pci@7c0/pci@0/pci@8/scsi@2       scsi-pci1000,50        LSI,1064

    However, it doesn't seem to be able to see the XRAID attached to it; lsiutil only reports the onboard SAS controller.

        > /usr/local/bin/lsiutil
        LSI Logic MPT Configuration Utility, Version 1.62, January 14, 2009
        1 MPT Port found
             Port Name         Chip Vendor/Type/Rev    MPT Rev  Firmware Rev  IOC
         1.  mpt0              LSI Logic SAS1064 A3      105      010a0000     0
        Select a device:  [1-1 or 0 to quit]

    I've tried adding the configuration to /kernel/drv/sd.conf and /kernel/drv/ssd.conf as per this thread, but format still cannot see any drives on the XRAID. I'm not sure where to go next. Any suggestions? From what I've read, this should pretty much just be "plug it in and the drives show up in format".
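
    A minimal sketch of forcing Solaris 10 to rediscover fibre-channel targets, assuming the card's driver has attached and the XRAID presents its LUNs over the fabric; the controller number below is a placeholder for whatever cfgadm actually reports:

        luxadm probe            # list FC enclosures/LUNs the HBA can see at all
        cfgadm -al              # look for fc-fabric attachment points (e.g. c3)
        cfgadm -c configure c3  # configure that FC controller (replace c3)
        devfsadm -Cv            # rebuild /dev links, prune stale ones
        format                  # the XRAID LUNs should now show up, if the HBA sees them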

    Read the article

  • Setting my NIC to full duplex

    - by David
    I am trying to optimize the network speed of my Solaris x86 server, and have discovered that the Cisco 3548 it is connected to has issues with the NIC in my server. The NIC appears not to have been configured fully, and is coming up 100 half-duplex. The 3548 ports are all set to 100 full. Ideally I'd like to have the server set to 100 full, and have been attempting to configure it using ndd commands, but with no results. The following command shows the current state:

        -bash-3.00# dladm show-dev rtls0
        link: unknown     speed: 100 Mbps     duplex: unknown

    The NIC shows up as:

        pci bus 0x0001 cardnum 0x06 function 0x00: vendor 0x10ec device 0x8139
        Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+

    which should be configurable. I have modified the configuration file from auto config (5) to 100 fdx (4) to no avail. If there is no other choice, I could set the Cisco 3548 port to 100 half-duplex, but that causes a huge performance loss: currently throughput is about 500 Kbps, when it should be around 40 Mbps.
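
    If the rtls driver exposes the usual advertised-capability parameters (many third-party drivers do not, in which case a driver.conf setting or a different NIC is the only route), a forced 100/full setup would look roughly like this; the parameter names are the conventional Solaris ones and may simply not exist for this driver:

        # see which parameters the driver actually exposes
        ndd -get /dev/rtls0 \?

        # if the usual parameters are present, disable autoneg and force 100 full duplex
        ndd -set /dev/rtls0 adv_autoneg_cap 0
        ndd -set /dev/rtls0 adv_100fdx_cap 1
        ndd -set /dev/rtls0 adv_100hdx_cap 0

        # verify
        dladm show-dev rtls0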

    Read the article

  • Why won't this script accept any arguments?

    - by Nate Wagar
    I'm trying to write an SVN post-commit hook and, strangely, am getting hung up on what should be the easiest part. The script:

        set REPO="$1"
        set REV="$2"
        set SVNBIN="/opt/CollabNet_Subversion/bin/"
        set SSHBIN="/usr/bin/ssh"
        set HOST="staging.domain.net"
        set timeout=30
        set USERNAME="svn-usr"
        set E_NO_CONNECT=2
        set E_WRONG_PASS=3
        set E_UNKOWN=25
        set CHANGED=`"$SVNBIN"svnlook changed --revision $REV $REPOS`
        echo "Here are changes: $CHANGED" >> /var/svn/repos/www/logs/testing
        echo "Command: $0; Repo: $REPO; Rev: $REV; Total: $#" >> /var/svn/repos/www/logs/testing
        set PROJECT ""

    Yet when I call it, it doesn't seem to see the arguments I pass to it:

        /var/svn/repos/www/logs> sudo ../hooks/post-commit /var/svn/repos/www 33
        svnlook: missing argument: --revision
        Type 'svnlook help' for usage.
        /var/svn/repos/www/logs> cat testing
        Here are changes:
        Command: ../hooks/post-commit; Repo: ; Rev: ; Total: 1

    This is on a Solaris 10 SPARC box. I'm a bit of a script newbie, but shouldn't this be really easy?
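
    The `set VAR=value` lines are csh syntax; when the hook runs under /bin/sh (the usual case for a Subversion hook with no csh shebang), `set` replaces the positional parameters instead of creating variables, which is why $1 and $2 come out empty. A minimal Bourne-shell sketch of the same hook (it also fixes the $REPO/$REPOS mismatch in the svnlook line):

        #!/bin/sh
        # Subversion passes the repository path and revision as $1 and $2
        REPOS="$1"
        REV="$2"
        SVNBIN="/opt/CollabNet_Subversion/bin"
        LOG=/var/svn/repos/www/logs/testing

        CHANGED=`"$SVNBIN"/svnlook changed --revision "$REV" "$REPOS"`

        echo "Here are changes: $CHANGED" >> "$LOG"
        echo "Command: $0; Repo: $REPOS; Rev: $REV; Total: $#" >> "$LOG"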

    Read the article

  • ssh without password does not work for some users

    - by joshxdr
    I have a new RHEL4 Linux box that I am using to copy data to old Solaris 2.6 and RHEL3 Linux boxes with scp. I have found that, with the same setup, it works for some users but not for others. For user jane, this works fine:

        jane@host1$ ssh -v remhost
        debug1: Next authentication method: publickey
        debug1: Trying private key: /mnt/home/osborjo/.ssh/identity
        debug1: Offering public key: /mnt/home/osborjo/.ssh/id_rsa
        debug1: Server accepts key: pkalg ssh-rsa blen 277
        debug1: read PEM private key done: type RSA
        debug1: Authentication succeeded (publickey).

    For user jack it does not:

        jack@host1$ ssh -v remhost
        debug1: Next authentication method: publickey
        debug1: Trying private key: /mnt/home/oper1/.ssh/identity
        debug1: Offering public key: /mnt/home/oper1/.ssh/id_rsa
        debug1: Authentications that can continue: publickey,password,keyboard-interactive

    I have looked at the permissions for all the keys and files, and they look the same. Since I am using home directories mounted by NFS, the keys for both the remote host and the local host are in the same directory. This is how things look for jane:

        jane@host1$ ls -l $HOME/.ssh
        -rw-rw-r--   1 jane  operator   394 Jan 27 16:28 authorized_keys
        -rw-------   1 jane  operator  1675 Jan 27 16:27 id_rsa
        -rw-r--r--   1 jane  operator   394 Jan 27 16:27 id_rsa.pub
        -rw-rw-r--   1 jane  operator  1205 Jan 27 16:46 known_hosts

    And for user jack:

        jack@host1$ ls -l $HOME/.ssh
        -rw-rw-r--   1 jack  engineer   394 Jan 27 16:28 authorized_keys
        -rw-------   1 jack  engineer  1675 Jan 27 16:27 id_rsa
        -rw-r--r--   1 jack  engineer   394 Jan 27 16:27 id_rsa.pub
        -rw-rw-r--   1 jack  engineer  1205 Jan 27 16:46 known_hosts

    As a last-ditch effort, I copied the authorized_keys, id_rsa, and id_rsa.pub from jill to jack, and changed the username in authorized_keys and id_rsa.pub with vi. It still did not work. It seems there is something different between the two users but I cannot figure out what it is.
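
    One common cause of this exact symptom is sshd's StrictModes check rejecting group-writable files or directories for one user but not the other (the authorized_keys files above are mode 664, and the home directories themselves may differ). A sketch of tightening things on the side that holds the NFS home directory for the failing user, assuming the standard OpenSSH checks:

        # run as jack (or as root in jack's home)
        chmod go-w "$HOME"                      # home dir must not be group/world writable
        chmod 700 "$HOME/.ssh"
        chmod 600 "$HOME/.ssh/authorized_keys"
        # retry; running 'sshd -d' on the target, or reading its auth log
        # (/var/log/secure on RHEL, syslog auth on Solaris), shows which check failed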

    Read the article

  • lsof not showing what port a proc is listening on

    - by ericslaw
    I have many processes on a box listening on several ports, and I am trying to map ports to PIDs. The problem is that lsof is not telling me which ports belong to which process. Given an Apache listening on port 80, I can see it listening via netstat:

        user@host% netstat -an | grep LISTEN | grep 80
        *.80    *.*    0    0    49152    0    LISTEN

    But when I try to map port 80 to a PID I get nothing:

        user@host% lsof -iTCP:80 -t

    When I try seeing what sockets that specific PID is using I get:

        user@host% lsof -lnP -p31 -a -i
        COMMAND    PID  USER  FD   TYPE  DEVICE         SIZE/OFF  NODE  NAME
        libhttpd.   31     0  15u  IPv4  0x6002d970b80  0t0       TCP   *:65535 (LISTEN)

    Notice the *:65535 in the NAME column. Does anyone know why lsof is not reporting the port in use? I am running as root. I am using a mix of lsof and OS versions:

        lsof v4.77 on Solaris 10 SPARC
        lsof v4.72 on Red Hat 4.2
        etc.

    I know that on Linux I can use "netstat -p", so I guess I'm only looking for why Solaris isn't working, but I find lsof is frequently silent and not showing me expected data.
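
    On Solaris, an alternative to lsof for mapping ports to PIDs is pfiles, which reads socket information straight out of /proc. A rough sketch (note that pfiles briefly stops the target process while it inspects it, so use it with care on busy production daemons):

        #!/bin/sh
        # print every process with a socket bound to the given port (Solaris)
        PORT=80
        cd /proc
        for pid in *; do
            # pfiles prints lines like "sockname: AF_INET 0.0.0.0  port: 80"
            pfiles "$pid" 2>/dev/null | grep "port: ${PORT}\$" > /dev/null \
                && ps -o pid,args -p "$pid"
        done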

    Read the article

  • Missing whole disk device in OpenSolaris

    - by Jeff Mc
    I have begun experimenting with Solaris and ZFS as a NAS. All was going very smoothly until I had a drive failure. When I replaced the drive, I no longer have a device file mapped to the whole disk: /dev/dsk/c7t3d0 does not exist, but c7t2d0 and c7t4d0 both do. Also, the sd@3,0:wd file under the /devices/ tree is non-existent. Do I have to prepare/partition the disk somehow to cause the whole-disk device to exist? Here are a few outputs that might be useful.

        jeffmc@ats-ds2:/dev/dsk$ zpool status
          pool: datapool
         state: DEGRADED
        status: One or more devices could not be opened. Sufficient replicas exist for
                the pool to continue functioning in a degraded state.
        action: Attach the missing device and online it using 'zpool online'.
           see: http://www.sun.com/msg/ZFS-8000-2Q
         scrub: none requested
        config:

                NAME        STATE     READ WRITE CKSUM
                datapool    DEGRADED     0     0     0
                  mirror-0  DEGRADED     0     0     0
                    c7t2d0  ONLINE       0     0     0
                    c7t3d0  UNAVAIL      0     0     0  cannot open
                  mirror-1  ONLINE       0     0     0
                    c7t4d0  ONLINE       0     0     0
                    c7t5d0  ONLINE       0     0     0

        jeffmc@ats-ds2:/dev/dsk$ zpool replace datapool c7t3d0
        cannot open 'c7t3d0': no such device in /dev/dsk
        must be a full path or shorthand device name

        jeffmc@ats-ds2:/dev/dsk$ sudo format
        Searching for disks...done

        AVAILABLE DISK SELECTIONS:
               0. c7t0d0
                  /pci@0,0/pci8086,3599@6/pci8086,330@0/pci1014,2cc@7,1/sd@0,0
               1. c7t1d0
                  /pci@0,0/pci8086,3599@6/pci8086,330@0/pci1014,2cc@7,1/sd@1,0
               2. c7t2d0
                  /pci@0,0/pci8086,3599@6/pci8086,330@0/pci1014,2cc@7,1/sd@2,0
               3. c7t3d0
                  /pci@0,0/pci8086,3599@6/pci8086,330@0/pci1014,2cc@7,1/sd@3,0
               4. c7t4d0
                  /pci@0,0/pci8086,3599@6/pci8086,330@0/pci1014,2cc@7,1/sd@4,0
               5. c7t5d0
                  /pci@0,0/pci8086,3599@6/pci8086,330@0/pci1014,2cc@7,1/sd@5,0
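
    A sketch of the usual sequence for making the replacement disk's /dev links reappear before retrying the replace, assuming the controller sees the new drive (which the format listing above suggests it does):

        devfsadm -Cv                    # prune stale links and create nodes for newly found devices
        ls -l /dev/dsk/c7t3d0*          # the whole-disk and slice links should now exist
        zpool replace datapool c7t3d0   # or 'zpool online datapool c7t3d0' if it is the same disk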

    Read the article

  • MySQL getting stuck, eating up disk i/o

    - by bonez05
    Hi all, I'm using MySQL 5.0.51 on Solaris. At intermittent times it looks like MySQL is getting 'stuck': the disk usage on the server spikes to 98% busy from reads. I used DTrace (specifically the DTrace Toolkit's iosnoop) to track down which processes were doing all the reads. MySQL was reading tablename.TDM hundreds of times per second. There was no more than average load on the webserver that could account for this, there were no cron jobs running, and no other utilities like mysqldump or anything. It is a master/slave replication setup. As a jury-rigged fix, I altered the MySQL table from 'tablename' to 'tablename2' and then back to 'tablename'. This fixed the problem temporarily and "unsticks" MySQL: the disk usage goes back down and dtrace no longer shows hundreds of reads per second to 'tablename.TDM'. A couple of ideas I had are:

    1. a MySQL version bug,
    2. an infinite loop somewhere in my application (though I'm not sure how likely that is), or
    3. ??

    Has anybody seen this before or have any insight? Thanks
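
    A few things worth capturing the next time it sticks, to narrow down whether a runaway query or a background repair/check pass is behind the read storm (a sketch only; the database and table names are placeholders, and the mysql calls may need -u/-p credentials):

        # what is MySQL actually executing while the disk is pegged?
        mysql -e "SHOW FULL PROCESSLIST"

        # table-level health: a crashed MyISAM table being repeatedly auto-repaired
        # can produce exactly this kind of sustained read load
        mysql -e "CHECK TABLE dbname.tablename"

        # confirm which files the reads are hitting (DTrace Toolkit, as already used)
        iosnoop | grep mysqld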

    Read the article

  • Cannot destroy ZFS snapshot: dataset already exists

    - by Morven
    I have a server (a T5220, though I doubt it matters) running Solaris 10 8/07, and I have a ZFS pool, "mysql", on internal disk. Within it I have a filesystem "mysql/data/4.1.12", which I snapshot hourly with a script from cron. I have one snapshot, created as one of those hourly snaps, that will not destroy. I have renamed it out of sequence to be "mysql/data/4.1.12@wibble" so that my script will not try and fail to destroy it, but it was originally within the sequence, though I doubt that matters. It renames successfully. The snapshot can be successfully navigated and read from through the .zfs/snapshots directory. It has no clones based on it. Trying to destroy it does this:

        (265) root@web-mysql4:/# zfs destroy mysql/data/4.1.12@wibble
        cannot destroy 'mysql/data/4.1.12@wibble': dataset already exists
        (266) root@web-mysql4:/#

    which is apparently nonsensical: of course it already exists, that's the point! Has anyone seen anything like this before? Web searches show nothing obviously similar. I can provide the list of installed patches if necessary.
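
    This error usually means something still depends on the snapshot, most often a clone that isn't obvious (for example one left behind by an interrupted zfs receive). A quick sketch for double-checking dependents before anything more drastic:

        # every filesystem and snapshot under the pool
        zfs list -r mysql
        zfs list -r -t snapshot mysql

        # any filesystem whose 'origin' property points at the snapshot is a clone of it;
        # promoting or destroying that clone frees the snapshot for deletion
        zfs get -r origin mysql | grep wibble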

    Read the article

  • How do I add a second disk to my zfs root pool

    - by ankimal
    I am trying to add a new disk to my ZFS root pool. Here is my current config:

        # zpool status
          pool: rpool
         state: ONLINE
         scrub: none requested
        config:

                NAME        STATE     READ WRITE CKSUM
                rpool       ONLINE       0     0     0
                  c0d0s0    ONLINE       0     0     0

        errors: No known data errors

        bash-3.00# df -h
        Filesystem                      Size  Used  Avail  Use%  Mounted on
        rpool/ROOT/s10x_u7wos_08        311G   18G   293G    6%  /
        swap                             14G  384K    14G    1%  /etc/svc/volatile
        /usr/lib/libc/libc_hwcap1.so.1  311G   18G   293G    6%  /lib/libc.so.1
        swap                             14G   52K    14G    1%  /tmp
        swap                             14G   40K    14G    1%  /var/run
        rpool/export                    293G   19K   293G    1%  /export
        rpool/export/home               430G  138G   293G   32%  /export/home
        rpool                           293G   36K   293G    1%  /rpool

        # format
        Searching for disks...done

        AVAILABLE DISK SELECTIONS:
               0. c0d0 <DEFAULT cyl 60797 alt 2 hd 255 sec 63>
                  /pci@0,0/pci-ide@1f,2/ide@0/cmdk@0,0
               1. c2d0 <Hitachi- JK1181YAHL0YK-0001-16777216.>
                  /pci@0,0/pci-ide@1f,5/ide@1/cmdk@0,0

    Disk 1 above is the new disk I need to attach to expand my root pool (to give /export/home some extra space). If I try to attach the new disk to the pool:

        # zpool attach -f rpool c0d0s0 c2d0s0
        cannot attach c2d0s0 to c0d0s0: new device must be a single disk

        # uname -a
        SunOS dsol1 5.10 Generic_139556-08 i86pc i386 i86pc Solaris

    Any ideas?
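
    One thing to keep in mind: a root pool can only be a single disk or a mirror, so it cannot be grown by adding a second vdev, and `zpool attach` would only mirror it, not enlarge it. A hedged sketch of the usual alternative, giving /export/home its own pool on the new disk (pool and snapshot names are placeholders, and this assumes nothing is on c2d0 yet):

        # build a new pool on the second disk
        zpool create homepool c2d0

        # copy the existing home data across with ZFS send/receive
        zfs snapshot rpool/export/home@move
        zfs send rpool/export/home@move | zfs receive homepool/home

        # once the copy is verified, swap the mountpoints
        zfs set mountpoint=none rpool/export/home
        zfs set mountpoint=/export/home homepool/home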

    Read the article

  • zfs pool error, how to determine which drive failed in the past

    - by Kendrick
    I had been copying data off my pool so that I could rebuild it with a different ZFS version, to move away from Solaris 11 to one that is portable between FreeBSD, OpenIndiana, etc. It was copying at 20 MB a second the other day, which is about all my desktop drive can handle writing from the network. Suddenly, last night, it dropped to 1.4 MB. I ran zpool status today and got this:

          pool: store
         state: ONLINE
        status: One or more devices has experienced an unrecoverable error. An
                attempt was made to correct the error. Applications are unaffected.
        action: Determine if the device needs to be replaced, and clear the errors
                using 'zpool clear' or replace the device with 'zpool replace'.
           see: http://www.sun.com/msg/ZFS-8000-9P
          scan: none requested
        config:

                NAME          STATE     READ WRITE CKSUM
                store         ONLINE       0     0     0
                  raidz1-0    ONLINE       0     0     0
                    c8t3d0p0  ONLINE       0     0     2
                    c8t4d0p0  ONLINE       0     0    10
                    c8t2d0p0  ONLINE       0     0     0

    It is currently a 3 x 1 TB drive array. What tools would best be used to determine what the error was and which drive is failing? Per the admin doc:

        The second section of the configuration output displays error statistics. These
        errors are divided into three categories:
        READ  - I/O errors occurred while issuing a read request.
        WRITE - I/O errors occurred while issuing a write request.
        CKSUM - Checksum errors. The device returned corrupted data as the result of a
                read request.

    It says low counts could be anything from a power flux to a disk event, but it gives no suggestion as to which tools to use to check and pin it down.
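
    A few stock Solaris tools that usually pin down which physical drive is throwing the errors (no extra software required; smartctl can be layered on top if it happens to be installed):

        # per-device soft/hard/transport error counters plus vendor, serial and size
        iostat -En

        # the fault manager's error log: each ereport names the device path involved
        fmdump -eV | more

        # map the cXtXdX names in 'zpool status' back to physical disks/slots
        format < /dev/null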

    Read the article

  • HDFS: some datanodes of the cluster are suddenly disconnected while reducers are running

    - by user1429825
    I have 8 slave computers and 1 master computer running Hadoop (version 0.21). Some datanodes of the cluster suddenly disconnected while I was running MapReduce code on 10 GB of data. After all mappers finished and around 80% of the reducers had been processed, one or more datanodes randomly dropped off the network, and then the other datanodes started to disappear from the network even though I killed the MapReduce job as soon as I noticed a datanode had been disconnected. I've tried changing dfs.datanode.max.xcievers to 4096, turning off the firewalls on all computing nodes, disabling SELinux and increasing the open-file limit to 20000, but none of it worked at all... Does anyone have an idea how to solve this problem? The following is the error log from MapReduce:

        12/06/01 12:31:29 INFO mapreduce.Job: Task Id : attempt_201206011227_0001_r_000006_0, Status : FAILED
        java.io.IOException: Bad connect ack with firstBadLink as ***.***.***.148:20010
            at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:889)
            at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:820)
            at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:427)

    and the following are logs from a datanode:

        2012-06-01 13:01:01,118 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-5549263231281364844_3453 src: /*.*.*.147:56205 dest: /*.*.*.142:20010
        2012-06-01 13:01:01,136 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(*.*.*.142:20010, storageID=DS-1534489105-*.*.*.142-20010-1337757934836, infoPort=20075, ipcPort=20020) Starting thread to transfer block blk_-3849519151985279385_5906 to *.*.*.147:20010
        2012-06-01 13:01:19,135 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(*.*.*.142:20010, storageID=DS-1534489105-*.*.*.142-20010-1337757934836, infoPort=20075, ipcPort=20020):Failed to transfer blk_-5797481564121417802_3453 to *.*.*.146:20010 got java.net.ConnectException: Connection timed out
            at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
            at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
            at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
            at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373)
            at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1257)
            at java.lang.Thread.run(Thread.java:722)
        2012-06-01 13:06:20,342 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_6674438989226364081_3453
        2012-06-01 13:09:01,781 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(*.*.*.142:20010, storageID=DS-1534489105-*.*.*.142-20010-1337757934836, infoPort=20075, ipcPort=20020):Failed to transfer blk_-3849519151985279385_5906 to *.*.*.147:20010 got java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/*.*.*.142:60057 remote=/*.*.*.147:20010]
            at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
            at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:164)
            at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:203)
            at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:388)
            at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:476)
            at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1284)
            at java.lang.Thread.run(Thread.java:722)

    hdfs-site.xml:

        <configuration>
          <property> <name>dfs.name.dir</name> <value>/home/hadoop/data/name</value> </property>
          <property> <name>dfs.data.dir</name> <value>/home/hadoop/data/hdfs1,/home/hadoop/data/hdfs2,/home/hadoop/data/hdfs3,/home/hadoop/data/hdfs4,/home/hadoop/data/hdfs5</value> </property>
          <property> <name>dfs.replication</name> <value>3</value> </property>
          <property> <name>dfs.datanode.max.xcievers</name> <value>4096</value> </property>
          <property> <name>dfs.http.address</name> <value>0.0.0.0:20070</value> <description>50070 The address and the base port where the dfs namenode web ui will listen on. If the port is 0 then the server will start on a free port.</description> </property>
          <property> <name>dfs.datanode.http.address</name> <value>0.0.0.0:20075</value> <description>50075 The datanode http server address and port. If the port is 0 then the server will start on a free port.</description> </property>
          <property> <name>dfs.secondary.http.address</name> <value>0.0.0.0:20090</value> <description>50090 The secondary namenode http server address and port. If the port is 0 then the server will start on a free port.</description> </property>
          <property> <name>dfs.datanode.address</name> <value>0.0.0.0:20010</value> <description>50010 The address where the datanode server will listen to. If the port is 0 then the server will start on a free port.</description>
          <property> <name>dfs.datanode.ipc.address</name> <value>0.0.0.0:20020</value> <description>50020 The datanode ipc server address and port. If the port is 0 then the server will start on a free port.</description> </property>
          <property> <name>dfs.datanode.https.address</name> <value>0.0.0.0:20475</value> </property>
          <property> <name>dfs.https.address</name> <value>0.0.0.0:20470</value> </property>
        </configuration>

    mapred-site.xml:

        <configuration>
          <property> <name>mapred.job.tracker</name> <value>masternode:29001</value> </property>
          <property> <name>mapred.system.dir</name> <value>/home/hadoop/data/mapreduce/system</value> </property>
          <property> <name>mapred.local.dir</name> <value>/home/hadoop/data/mapreduce/local</value> </property>
          <property> <name>mapred.map.tasks</name> <value>32</value> <description> default number of map tasks per job.</description> </property>
          <property> <name>mapred.tasktracker.map.tasks.maximum</name> <value>4</value> </property>
          <property> <name>mapred.reduce.tasks</name> <value>8</value> <description> default number of reduce tasks per job.</description> </property>
          <property> <name>mapred.map.child.java.opts</name> <value>-Xmx2048M</value> </property>
          <property> <name>io.sort.mb</name> <value>500</value> </property>
          <property> <name>mapred.task.timeout</name> <value>1800000</value> <!-- 30 minutes --> </property>
          <property> <name>mapred.job.tracker.http.address</name> <value>0.0.0.0:20030</value> <description> 50030 The job tracker http server address and port the server will listen on. If the port is 0 then the server will start on a free port. </description> </property>
          <property> <name>mapred.task.tracker.http.address</name> <value>0.0.0.0:20060</value> <description> 50060 </property>
        </configuration>
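
    Since the failures look like slow or refused connections between datanodes, it is worth confirming that the raised limits actually took effect for the user the datanode processes run as, on every slave: a limit set in an interactive shell profile is easy to miss for daemonized processes. A small sketch, assuming passwordless ssh to the slaves and that the datanodes run as user 'hadoop' (hostnames are placeholders):

        #!/bin/sh
        for host in slave1 slave2 slave3 slave4 slave5 slave6 slave7 slave8; do
            echo "== $host =="
            # open-file limit as seen by the hadoop user on that node
            ssh hadoop@"$host" 'ulimit -n'
            # number of connections currently held on the datanode data port (20010)
            ssh "$host" 'netstat -an | grep 20010 | wc -l'
        done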

    Read the article

  • Oracle to offer its SPARC servers with Oracle Enterprise Linux, and no longer only with Solaris, to compete even more with IBM

    Oracle is going to offer its SPARC servers with Oracle Enterprise Linux, and no longer simply with Solaris, in order to compete even more with IBM. Oracle will port its own distribution to the coming generations of its SPARC processor. Until now, Solaris has been the OS of choice for SPARC servers; this could change, as Oracle has decided to put forward its Linux distribution, Oracle Enterprise Linux. "We think SPARC is clearly going to become the best technology for running Oracle solutions," declared Larry Ellison, Oracle's CEO, at the launch of the new SPARC systems. "We would be fools not to port Oracle Enterprise...

    Read the article

  • Result of the "How long do you wait before Solaris 11 gets on your prod systems?"

    - by nospam(at)example.com (Joerg Moellenkamp)
    I just removed the poll at 10:52, so this is the final result. My conclusions from it: while the removal of UltraSPARC I to IV+ support in Solaris 11 may hit some of the people voting in the categories "Wait?" to "6 months", most users keep Solaris 10 running on their existing systems anyway, or migrate so late that even the newest of those systems have reached their end of service life or are near it, so a migration doesn't sound that feasible. So I assume Product Management was right with their decision to remove the support in order to make possible the feature I can't talk about, as I don't think that many of the early migrators are still using the systems in question; most have reached EOSL. I didn't think there would be people waiting three years and more...

    Read the article

  • Sending USR2 to mongrel_rails sometimes results in an “Address already in use” on the restart

    - by Ben
    We have a rolling-restart mode for our mongrel cluster that sends a USR2 signal to each running process. This works great most of the time, but very occasionally the mongrel process will shut down and then fail to restart, with the following error:

        /usr/local/lib/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/tcphack.rb:12:in `initialize_without_backlog': Address already in use - bind(2) (Errno::EADDRINUSE)
            from /usr/local/lib/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/tcphack.rb:12:in `initialize'
            from /usr/local/lib/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:93:in `new'
            from /usr/local/lib/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:93:in `initialize'
            from /usr/local/lib/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/configurator.rb:139:in `new'
            from /usr/local/lib/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/configurator.rb:139:in `listener'

    Looking through the mongrel source, the USR2 handler calls a synchronous stop on the running server, so it ought to block until the socket has been released. Has anyone seen this error? Does anyone have any ideas what might cause it? (I asked this question over on Stack Overflow initially, but thought it might be more appropriate here.)
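
    One workaround, if the race is simply the old listener lingering (for example in TIME_WAIT) for a moment after the stop, is to have the rolling-restart script verify that each mongrel's port is actually listening again before moving to the next process, so a failed rebind is caught and can be retried. A sketch under that assumption (the port and pid-file path are placeholders for your cluster layout):

        #!/bin/sh
        # restart one mongrel and wait until its port is listening again
        PORT=8000
        PIDFILE=/var/run/mongrel.$PORT.pid

        kill -USR2 `cat "$PIDFILE"`

        # give the process up to 30 seconds to rebind; bail out if it never comes back
        i=0
        until netstat -an | grep "[.:]$PORT .*LISTEN" > /dev/null; do
            i=`expr $i + 1`
            [ "$i" -ge 30 ] && { echo "port $PORT did not come back" >&2; exit 1; }
            sleep 1
        done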

    Read the article

  • PBS batch jobs - the qalter command

    - by Ryan Budney
    I've got a giant computation running on a Scientific Linux cluster. At present I have over 600 jobs parked in the queue, waiting for processor time, while a few are running. I'm trying to use the qalter command on some of the idle but scheduled jobs. I'd like to schedule them for a later time, so that other users can jump part of the queue, sort of as an act of politeness. Is this doable? For example, JOBNAME 292399 is currently idle, scheduled to be run whenever a spot in the queue opens up. But if I run qalter -a 10051000 292398 followed by qrerun 292398 I get qrerun: Request invalid for state of job 292398.euler. From the qalter documentation, I thought 10051000 refers to tomorrow (oct 5th, 10am) but perhaps I'm misunderstanding something? If I'm going about this the wrong way, please let me know. The main thing I'm looking for is a command that's easily scriptable, so that I can modify when my queued tasks get run. qalter seems good for those purposes if I can get it working. I'd rather avoid running qdel and re qsubbing the computations, as there's a bookkeeping issue on which tasks to restart (vs which ones not to). I want to avoid that kind of bookkeeping. From googling around I notice some qalter commands have rather different date formats, but the above appears to be correct, as far as I can tell from the man docs. Any help would be appreciated.
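
    For jobs that are still queued (not running), qrerun is not needed at all: qalter by itself changes the attribute, and qrerun is only valid for jobs that have already started, which is what the "Request invalid for state of job" message is complaining about. A hedged sketch of two scriptable ways to let other users jump ahead, using standard PBS/Torque commands:

        # 1) defer execution: -a takes [[[[CC]YY]MM]DD]hhmm[.SS],
        #    so 10051000 means Oct 5, 10:00 (no qrerun needed for an idle job)
        qalter -a 10051000 292399

        # 2) or simply hold the queued jobs now and release them later
        qhold 292399
        # ... later ...
        qrls 292399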

    Read the article

  • Pitfalls to using Gluster as a home/profile directory server?

    - by Bart Silverstrim
    I was asking recently about options for divvying up access to file servers, as we have a NAS solution that gets fairly bogged down when our users (with giant profiles, especially) all log in nearly simultaneously. I ran across Gluster and it looks like it can cluster different physical storage media into a single virtual volume and share it out like a virtual NAS from the client perspective and it support CIFS. My question is whether something like this would be feasible to use for home and profile directories in an active directory environment. I was worried about ACL's, primarily, as I didn't think CIFS was fine-grained enough to support NTFS permissions and it didn't look like Gluster exports those permission levels, just the base permissions for basic file sharing. I got the impression that using Gluster would allow for data to be redundant across multiple servers and would speed up access to the files under heavy load, while allowing us to dynamically boost storage capacity by just adding another server and telling Gluster's master node to add that server. Maybe I'm wrong with my understanding of it though. Anyone else use it or care to share how feasible this is?
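
    For reference, creating a replicated Gluster volume across two storage servers is a one-liner of roughly this shape (hostnames and brick paths are placeholders, and this assumes the 3.1+ gluster CLI); whether the CIFS/Samba layer on top preserves the NTFS-style ACLs you need is exactly the open question:

        # a 2-way replicated volume; each brick is a local directory on its server
        gluster volume create homes replica 2 transport tcp \
            store1:/export/brick1 store2:/export/brick1
        gluster volume start homes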

    Read the article

  • Built local glibc, broke system, how do I ssh without parsing the .bashrc?

    - by Mikhail
    The cluster I am on had really old build tools and I needed to use CUDA 5. I'm a pretty clever dude and I planned on building the necessary tools. So, I built a local copy of gcc, binutils, and glibc: everything CUDA 5 could want. All the builds finished without error, and I tested gcc and binutils. Everything was wonderful and I built and ran a few of the programs. I set up the LD_LIBRARY_PATH in my .bashrc and logged back in, expecting a productive night ahead. To my horror I realized that everything is dynamically linked. Now I can't run simple commands like ls:

        [ex@uid377 ~]$ ls
        ls: error while loading shared libraries: __vdso_time: invalid mode for dlopen(): Invalid argument

    and I can't run commands to fix the problem, like rm or vim! Is there a way for me to ssh in but ignore the .bashrc file? Any suggestions are much appreciated. This machine is obviously under-maintained and I don't know when I could get administrator support.
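
    Assuming the .bashrc only exports LD_LIBRARY_PATH (as described) rather than running anything, the shell you already get from ssh is fine; it is only the programs it launches afterwards that pick up the broken library path. That means the damage can be undone from inside the session with shell builtins, a rough sketch:

        # in the broken ssh session: clear the variable, external commands load again
        unset LD_LIBRARY_PATH
        ls

        # or override it for a single command without touching the session
        LD_LIBRARY_PATH= /bin/ls

        # then remove or guard the export in ~/.bashrc so future logins are clean
        vi ~/.bashrc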

    Read the article

  • Cannot increase Datastore

    - by k4w4zz
    Hello, we have an ESX 4.0 cluster with 2 hosts and EMC CLARiiON SAN storage with 10 LUNs. We have added 2 new 400 GB LUNs, and all the LUNs are visible from both hosts. I have extended an existing 500 GB datastore with one of these 400 GB LUNs; the new datastore size is now 900 GB. I'd like to do the same operation with the second 400 GB LUN to extend another existing datastore, but I'm not able to: the LUN is available to create a brand-new datastore but is not visible for extending an existing one. I don't understand why everything was fine with the first one and why I can't do the same exact operation with this LUN. The result is the same on both hosts. The SAN admin has erased and re-created this LUN several times, and I have rescanned the HBAs each time. Attached you can find the output of the esxcfg-mpath -l and fdisk -l commands on both servers. Does anybody have an idea, please?

    Read the article

  • Managing an application across multiple servers, or PXE vs cfEngine/Chef/Puppet

    - by matt
    We have an application that is running on a few boxes (5 or so, and growing). The hardware is identical in all the machines, and ideally the software would be as well. I have been managing them by hand up until now, and don't want to anymore (static IP addresses, disabling all unnecessary services, installing required packages...). Can anyone balance the pros and cons of the following options, or suggest something more intelligent?

    1: Individually install CentOS on all the boxes and manage the configs with Chef/cfengine/Puppet. This would be good, as I have wanted an excuse to learn to use one of these applications, but I don't know if this is actually the best solution.

    2: Make one box perfect and image it. Serve the image over PXE and, whenever I want to make modifications, just reboot the boxes from a new image. How do cluster guys normally handle things like having MAC addresses in the /etc/sysconfig/network-scripts/ifcfg* files? We use InfiniBand as well, and it also refuses to start if the hwaddr is wrong. Can these be correctly generated at boot (see the sketch below)?

    I'm leaning towards the PXE solution, but I think monitoring with munin or nagios will be a little more complicated with it. Anyone have experience with this type of problem? All the servers have SSDs in them and are fast and powerful. Thanks, matt.
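
    On the PXE/image route, the MAC-address problem is usually handled by not baking HWADDR into the image at all and instead regenerating the ifcfg files at boot from what the kernel reports. A hedged sketch of an rc script that could run once per boot before the network service starts (interface name, addressing mode and paths are placeholders assuming a stock CentOS layout):

        #!/bin/sh
        # rewrite ifcfg-eth0 with this machine's real MAC before networking starts
        IF=eth0
        HWADDR=`ip link show "$IF" | awk '/link\/ether/ {print toupper($2)}'`

        cat > /etc/sysconfig/network-scripts/ifcfg-$IF <<EOF
        DEVICE=$IF
        HWADDR=$HWADDR
        BOOTPROTO=dhcp
        ONBOOT=yes
        EOF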

    Read the article

  • Determining physical location of data on a disc

    - by Synetech
    Does anybody know of a way to find out where, physically, on a CD or DVD a given piece of data would be located? I am trying to watch a DVD at the moment, and am about half-way through, but it keeps dying at a specific spot in the film, presumably because of a scratch. I have a repair kit, but I don't know where to focus my repair because there are several scuffs and scratches on the disc and I have no way of knowing which one is causing the issue. Obviously, cleaning all of them is inadvisable: not only does it waste the consumable materials in the kit, but not all of them are a problem, and by working them, some may become unreadable. Moreover, just because I am half-way through the movie does not mean that the spot is half-way from the hub to the edge, for several reasons:

    - Discs hold more data towards the outer edge than the inner edge (circles are more mathematically complicated than rectangles)
    - The disc is not completely filled up (and even if it were, the movie itself would not be using it all; there are extras and such)
    - Because in this particular case it is a commercial DVD, it is also dual-layer, which further complicates manual determination

    As such, I am trying to find a program that can let me identify a file (or part thereof), cluster, etc. and show me a picture of where on the CD/DVD it would be located. That way, I can look at the disc and fix any scratches that correspond to that distance from the hub. For example, an image might indicate where on a disc a couple of files or a range of clusters would be located, so by looking for anomalies in those areas (rotating as necessary), the correct one can be identified. I'm sure it can be done, since at least one form of copy protection (DPM) uses it, and DVD-lab Pro includes a "DVD Topology" feature to do this.
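
    For a rough manual estimate while hunting for such a tool: data on a single layer is written at roughly constant areal density, spiralling outward from the inner edge of the program area, so the radius at which a fraction f of that layer's data sits is approximately (the radii below are the nominal single-layer DVD program area, and note that the second layer of a dual-layer disc is often written back from the outside in):

        r(f) \approx \sqrt{\, r_{\mathrm{in}}^{2} + f \left( r_{\mathrm{out}}^{2} - r_{\mathrm{in}}^{2} \right) },
        \qquad r_{\mathrm{in}} \approx 24\ \mathrm{mm}, \quad r_{\mathrm{out}} \approx 58\ \mathrm{mm}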

    Read the article

  • Thoughts on shared storage (NFS, Lustre) [closed]

    - by user134880
    Possible Duplicate: Can you help me with my capacity planning?

    I now have a small cluster with a total of 8 nodes: 6 of them are computing nodes (Apache and VMware) and 2 are for storage. The 2 storage nodes are identical; each storage server is a Linux box with 8 x 1 TB WD RE4 drives in soft RAID 10. The 1st box is the master and the 2nd is the slave, and data is mirrored with DRBD. We export NFSv4 shares to Apache (for the document root) and iSCSI to VMware. Right now everything is working well and stably, but it will soon be time to upgrade the system. I have been thinking about Lustre. Does anyone have real experience with Lustre or with medium-sized NFS clusters? Would it be a good idea just to upgrade the servers and change the HDDs to 3 TB ones? With NFS we will always have only 2 servers to maintain (one primary and one slave). Thanks.

    QUESTIONS:

    1) Has anyone used Lustre? In production? I have seen a lot of information about how hard Lustre is to set up because you need to compile your own kernel and patches, but those are answers from newbies. Is there someone who has used Lustre for some period of time?

    2) About the disk upgrade: this is only a question of strategy. I'm not asking whether 3 TB is enough; I'm asking whether it is right to simply replace the HDDs instead of adding new servers (as with Lustre). Thanks again.

    Read the article

  • What applications can be used in a Red Hat/CentOS cluster?

    - by Sandra
    Hi, when I look at the Red Hat cluster manuals [1][2], they only explain how to install it, not what applications can use it. I am new to clusters, so I don't know these things =) Let's say I want a 3-node high-performance cluster; what applications would work with it? Also, how does an application talk to the cluster? Does the application need to have been written to support clusters? Sandra

    Read the article

  • Setting up an Active-Active IIS Cluster with ARR - is it possible?

    - by Ahmed Zubair
    I would like to know if we can set up an active-active IIS cluster using Windows Cluster services that shares common storage for web content, WITHOUT the use of Windows NLB. I'm aware that this may not be a best practice or a recommended setup; however, it is to be configured as below: two web servers running IIS 7.5 (needing common storage for web content) for HA, and another set of two servers in an active-passive SQL cluster for HA. Also, is it possible to enable ARR on a 2-node active-active IIS cluster for load-balancing HTTP requests? I'd appreciate a reply covering both the pros and cons of this setup.

    Read the article

  • How can I format an SD card with a more robust Linux-usable filesystem with a specific cluster size for better write performance?

    - by Harvey
    Goal: a microSD card formatted...

    - for best write performance
    - for use only with embedded Linux
    - for better reliability (random power failures may occur)
    - using a 64 kB cluster size

    I'm using an 8 GB microSD card for data storage inside an embedded Linux/ARM device. The SD card is not removable. I've been using ext3 instead of the pre-installed FAT32 because it seems to better handle random power failures during writes. However, I kept noticing that my write performance is always best with the FAT32 pre-installed by Kingston. If I reformat the card with FAT32, the performance still suffers. After browsing Wikipedia, I stumbled upon the following passage saying that some cards are optimized for specific cluster sizes. In my case, the Kingston comes pre-formatted with a 64 kB cluster size.

        Risks of reformatting
        Reformatting an SD card with a different file system, or even with the same one,
        may make the card slower, or shorten its lifespan. Some cards use wear leveling,
        in which frequently modified blocks are mapped to different portions of memory
        at different times, and some wear-leveling algorithms are designed for the access
        patterns typical of the file allocation table on a FAT16 or FAT32 device.[60] In
        addition, the preformatted file system may use a cluster size that matches the
        erase region of the physical memory on the card; reformatting may change the
        cluster size and make writes less efficient.
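
    For reference, both file systems can be told to use 64 kB allocation units explicitly. A hedged sketch, assuming 512-byte sectors and a placeholder device name; the ext allocation hints use 4 kB blocks x 16 = 64 kB, and some e2fsprogs versions spell the option stripe_width rather than stripe-width:

        DEV=/dev/mmcblk0p1

        # FAT32 with 64 kB clusters: 128 sectors x 512 bytes per cluster
        mkfs.vfat -F 32 -s 128 $DEV

        # ext3 with 4 kB blocks and allocation hints aligned to 64 kB
        mkfs.ext3 -b 4096 -E stride=16,stripe-width=16 $DEV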

    Read the article

  • Need to reformat SQL Cluster Disk. How do I recover my SQL installation?

    - by I.T. Support
    We need to reformat the SQL cluster disk in our SQL cluster. The drive contains the shared installation files for SQL as well as the databases. My concern is how SQL and the cluster will react after we wipe the disk resource. Questions:

    1. Is there a defined procedure for this?
    2. How should we back up and restore the disk?
    3. After the reformat, how do we get the clustered SQL Server back online?

    Thanks

    Read the article
