Search Results

Search found 5942 results on 238 pages for 'total starnger'.

Page 233/238 | < Previous Page | 229 230 231 232 233 234 235 236 237 238 | Next Page >

SQL: Using a CASE Statement to update 1000 rows at once

- by SoLoGHoST

Ok, I would like to use a CASE STATEMENT for this, but I am lost with this. Basically, I need to update a ton of rows, but just on the "position" column. I need to update all "position" values from 0 - count(position) for each id_layout_position column per id_layout column. OK, here is a pic of what the table looks like: Now let's say I delete the circled row, this will remove position = 2 and give me: 0, 1, 3, 5, 6, 7, and 4. But I want to add something at the end now and make sure that it has the last possible position, but the positions are already messed up, so I need to reorder them like so before I insert the new row: 0, 1, 2, 3, 4, 5, 6. But it must be ordered by lowest first. So 0 stays at 0, 1 stays at 1, 3 gets changed to 2, the 4 at the end gets changed to a 3, 5 gets changed to 4, 6 gets changed to 5, and 7 gets changed to 6. Hopefully you guys get the picture now. I'm completely lost here. Also, note, this table is tiny compared to how fast it can grow in size, so it needs to be able to do this FAST, thus I was thinking on the CASE STATEMENT for an UPDATE QUERY. Here's what I got for a regular update, but I don't wanna throw this into a foreach loop, as it would take forever to do it. I'm using SMF (Simple Machines Forums), so it might look a little different, but the idea is the same, and CASE statements are supported... $smcFunc['db_query']('', ' UPDATE {db_prefix}dp_positions SET position = {int:position} WHERE id_layout_position = {int:id_layout_position} AND id_layout = {int:id_layout}', array( 'position' => $position++, 'id_layout_position' => (int) $id_layout_position, 'id_layout' => (int) $id_layout, ) ); Anyways, I need to apply some sort of CASE on this so that I can auto-increment by 1 all values that it finds and update to the next possible value. I know I'm doing this wrong, even in this QUERY. But I'm totally lost when it comes to CASES. Here's an example of a CASE being used within SMF, so you can see this and hopefully relate: $conditions = ''; foreach ($postgroups as $id => $min_posts) { $conditions .= ' WHEN posts >= ' . $min_posts . (!empty($lastMin) ? ' AND posts <= ' . $lastMin : '') . ' THEN ' . $id; $lastMin = $min_posts; } // A big fat CASE WHEN... END is faster than a zillion UPDATE's ;). $smcFunc['db_query']('', ' UPDATE {db_prefix}members SET id_post_group = CASE ' . $conditions . ' ELSE 0 END' . ($parameter1 != null ? ' WHERE ' . (is_array($parameter1) ? 'id_member IN ({array_int:members})' : 'id_member = {int:members}') : ''), array( 'members' => $parameter1, ) ); Before I do the update, I actually have a SELECT which throws everything I need into arrays like so: $disabled_sections = array(); $positions = array(); while ($row = $smcFunc['db_fetch_assoc']($request)) { if (!isset($disabled_sections[$row['id_group']][$row['id_layout']])) $disabled_sections[$row['id_group']][$row['id_layout']] = array( 'info' => $module_info[$name], 'id_layout_position' => $row['id_layout_position'] ); // Increment the positions... if (!is_null($row['position'])) { if (!isset($positions[$row['id_layout']][$row['id_layout_position']])) $positions[$row['id_layout']][$row['id_layout_position']] = 1; else $positions[$row['id_layout']][$row['id_layout_position']]++; } else $positions[$row['id_layout']][$row['id_layout_position']] = 0; } Thanks, I know if anyone can help me here it's definitely you guys and gals... Anyways, here is my question: How do I use a CASE statement in the first code example, so that I can update all of the rows in the position column from 0 - total # of rows found, that have that id_layout value and that id_layout_position value, and continue this for all different id_layout values in that table? Can I use the arrays above somehow? I'm sure I'll have to use the id_layout and id_layout_position values for this right? But how can I do this? Ok, guy, I get an error, saying "Hacking Attempt" with the following code: // Updating all positions in here. $smcFunc['db_query']('', ' SET @pos = 0; UPDATE {db_prefix}dp_positions SET position=@pos:=@pos+1 ORDER BY id_layout_position, position', array( ) ); Am I doing something wrong? Perhaps SMF has safeguards against this approach?? Perhaps I need to use a CASE STATEMENT instead?

Read the article
list images from directory by function

- by osc2nuke

i'm using this function: function getmyimages($qid){ $imgdir = 'modules/Projects/uploaded_project_images/'. $qid .''; // the directory, where your images are stored $allowed_types = array('png','jpg','jpeg','gif'); // list of filetypes you want to show $dimg = opendir($imgdir); while($imgfile = readdir($dimg)) { if(in_array(strtolower(substr($imgfile,-3)),$allowed_types)) { $a_img[] = $imgfile; sort($a_img); reset ($a_img); } } $totimg = count($a_img); // total image number for($x=0; $x < $totimg; $x++) { $size = getimagesize($imgdir.'/'.$a_img[$x]); // do whatever $halfwidth = ceil($size[0]/2); $halfheight = ceil($size[1]/2); $mytest = 'name: '.$a_img[$x].' width: '.$size[0].' height: '.$size[1].' <a href="'. $imgdir .'/'.$a_img[$x].'">'. $a_img[$x]. '</a>'; } return $mytest; } And i call this function between a while row as: $sql_select = $db->sql_query('SELECT * from '.$prefix.'_projects WHERE topic=\''.$cid.'\''); OpenTable(); while ($row2 = $db->sql_fetchrow($sql_select)){ $qid = $row2['qid']; $project_query = $db->sql_query('SELECT p.uid, p.uname, p.subject, p.story, p.storyext, p.date, p.topic, p.pdate, p.materials, p.bidoptions, p.projectduration, pd.id_duration, pm.material_id, pbo.bidid, pc.cid FROM ' . $prefix . '_projects p, ' . $prefix . '_projects_duration pd, ' . $prefix . '_project_materials pm, ' . $prefix . '_project_bid_options pbo, ' . $prefix . '_project_categories pc WHERE p.topic=\''.$cid.'\' and p.qid=\''.$qid.'\' and p.bidoptions=pbo.bidid and p.materials=pm.material_id and p.projectduration=pd.id_duration'); while ($project_row = $db->sql_fetchrow($project_query)) { //$qid = $project_row['qid']; $uid = $project_row['uid']; $uname = $project_row['uname']; $subject = $project_row['subject']; $story = $project_row['story']; $storyext = $project_row['storyext']; $date = $project_row['date']; $topic = $project_row['topic']; $pdate = $project_row['pdate']; $materials = $project_row['materials']; $bidoptions = $project_row['bidoptions']; $projectduration = $project_row['projectduration']; //Get the topic name $topic_query = $db->sql_query('SELECT cid,title from '.$prefix.'_project_categories WHERE cid =\''.$cid.'\''); while ($topic_row = $db->sql_fetchrow($topic_query)) { $topic_id = $topic_row['cid']; $topic_title = $topic_row['title']; } //Get the material text $material_query = $db->sql_query('SELECT material_id,material_name from '.$prefix.'_project_materials WHERE material_id =\''.$materials.'\''); while ($material_row = $db->sql_fetchrow($material_query)) { $material_id = $material_row['material_id']; $material_name = $material_row['material_name']; } //Get the bid methode $bid_query = $db->sql_query('SELECT bidid,bidname from '.$prefix.'_project_bid_options WHERE bidid =\''.$bidoptions.'\''); while ($bid_row = $db->sql_fetchrow($bid_query)) { $bidid = $bid_row['bidid']; $bidname = $bid_row['bidname']; } //Get the project duration $duration_query = $db->sql_query('SELECT id_duration,duration_value,duration_alias from '.$prefix.'_projects_duration WHERE id_duration =\''.$projectduration.'\''); while ($duration_row = $db->sql_fetchrow($duration_query)) { $id_duration = $duration_row['id_duration']; $duration_value = $duration_row['duration_value']; $duration_alias = $duration_row['duration_alias']; } } echo ' id--->' .$qid. ' uid--->' .$uid. ' username--->' .$uname. ' subject--->'.$subject. ' story1--->'.$story. ' story2--->'.$storyext. ' postdate--->'.$date. ' categorie--->'.$topic_title . ' project start--->'.$pdate. ' materials--->'.$material_name. ' bid methode--->'.$bidname. ' project duration--->'.$duration_alias.' image url--->'.getmyimages($qid).' '; } CloseTable(); the result outputs only the "last" file from the directories. if i do a echo instead of a return $mytest; it read the whole directory but ruïns the output.

Read the article
Perl - Reading .txt files line-by-line and using compare function (printing non-matches only once)

- by Kurt W

I am really struggling and have spent about two full days on this banging my head against receiving the same result every time I run this perl script. I have a Perl script that connects to a vendor tool and stores data for ~26 different elements within @data. There is a foreach loop for @data that breaks the 26 elements into $e-{'element1'), $e-{'element2'), $e-{'element3'), $e-{'element4'), etc. etc. etc. I am also reading from the .txt files within a directory (line-by-line) and comparing the server names that exist within the text files with what exists in $e-{'element4'}. The Problem: Matches are working perfectly and only printing one line for each of the 26 elements when there is a match, however non-matches are producing one line for every entry within the .txt files (37 in all). So if there are 100 entries (each entry having 26 elements) stored within @data, then there are 100 x 37 entries being printed. So for every non-match in the: if ($e-{'element4'} eq '6' && $_ =~ /$e-{element7}/i) statement below, I am receiving a print out saying that there is not a match. 37 entries for the same identical 26 elements (because there are 37 total entries in all of the .txt files). The Goal: I need to print out only 1 line for each unique entry (a unique entry being $e-{element1} thru $e-{element26}). It is already printing one 1 line for matches, but it is printing out 37 entries when there is not a match. I need to treat matches and non-matches differently. Code: foreach my $e (@data) { # Open the .txt files stored within $basePath and use for comparison: opendir(DIRC, $basePath . "/") || die ("cannot open directory"); my @files=(readdir(DIRC)); my @MPG_assets = grep(/(.*?).txt/, @files); # Loop through each system name found and compare it with the data in SC for a match: foreach(@MPG_assets) { $filename = $_; open (MPGFILES, $basePath . "/" . $filename) || die "canot open the file"; while(<MPGFILES>) { if ($e->{'element4'} eq '6' && $_ =~ /$e->{'element7'}/i) { ## THIS SECTION WORKS PERFECTLY AND ONLY PRINTS MATCHES WHERE $_ ## (which contains the servernames (1 per line) in the .txt files) ## EQUALS $e->{'element7'}. print $e->{'element1'} . "\n"; print $e->{'element2'} . "\n"; print $e->{'element3'} . "\n"; print $e->{'element4'} . "\n"; print $e->{'element5'} . "\n"; # ... print $e->{'element26'} . "\n"; } else { ## **THIS SECTION DOES NOT WORK**. FOR EVERY NON-MATCH, THERE IS A ## LINE PRINTED WITH 26 IDENTICAL ELEMENTS BECAUSE ITS LOOPING THRU ## THE 37 LINES IN THE *.TXT FILES. print $e->{'element1'} . "\n"; print $e->{'element2'} . "\n"; print $e->{'element3'} . "\n"; print $e->{'element4'} . "\n"; print $e->{'element5'} . "\n"; # ... print $e->{'element26'} . "\n"; } # End of 'if ($e->{'element4'} eq..' statement } # End of while loop } # End of 'foreach(@MPG_assets)' } # End of 'foreach my $e (@data)' I think I need something to identical unique elements and define what fields make up a unique element but honestly I have tried everything I know. If you would be so kind to provide actual code fixes, that would be wonderful because I am headed to production with this script quite soon. Also. I am looking for code (ideally) that is very human-readable because I will need to document it so others can understand. Please let me know if you need additional information.

Read the article
ActiveMQ - "Cannot send, channel has already failed" every 2 seconds?

- by quanta

ActiveMQ 5.7.0 In the activemq.log, I'm seeing this exception every 2 seconds: 2013-11-05 13:00:52,374 | DEBUG | Transport Connection to: tcp://127.0.0.1:37501 failed: org.apache.activemq.transport.InactivityIOException: Cannot send, channel has already failed: tcp://127.0.0.1:37501 | org.apache.activemq.broker.TransportConnection.Transport | Async Exception Handler org.apache.activemq.transport.InactivityIOException: Cannot send, channel has already failed: tcp://127.0.0.1:37501 at org.apache.activemq.transport.AbstractInactivityMonitor.doOnewaySend(AbstractInactivityMonitor.java:282) at org.apache.activemq.transport.AbstractInactivityMonitor.oneway(AbstractInactivityMonitor.java:271) at org.apache.activemq.transport.TransportFilter.oneway(TransportFilter.java:85) at org.apache.activemq.transport.WireFormatNegotiator.oneway(WireFormatNegotiator.java:104) at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68) at org.apache.activemq.broker.TransportConnection.dispatch(TransportConnection.java:1312) at org.apache.activemq.broker.TransportConnection.processDispatch(TransportConnection.java:838) at org.apache.activemq.broker.TransportConnection.iterate(TransportConnection.java:873) at org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:129) at org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:47) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Due to this keyword InactivityIOException, the first thing comes to my mind is InactivityMonitor, but the strange thing is MaxInactivityDuration=30000: 2013-11-05 13:11:02,672 | DEBUG | Sending: WireFormatInfo { version=9, properties={MaxFrameSize=9223372036854775807, CacheSize=1024, CacheEnabled=true, SizePrefixDisabled=false, MaxInactivityDurationInitalDelay=10000, TcpNoDelayEnabled=true, MaxInactivityDuration=30000, TightEncodingEnabled=true, StackTraceEnabled=true}, magic=[A,c,t,i,v,e,M,Q]} | org.apache.activemq.transport.WireFormatNegotiator | ActiveMQ BrokerService[localhost] Task-2 Moreover, I also didn't see something like this: No message received since last read check for ... or: Channel was inactive for too (30000) long Do a netstat, I see these connections in TIME_WAIT state: tcp 0 0 127.0.0.1:38545 127.0.0.1:61616 TIME_WAIT - tcp 0 0 127.0.0.1:38544 127.0.0.1:61616 TIME_WAIT - tcp 0 0 127.0.0.1:38522 127.0.0.1:61616 TIME_WAIT - Here're the output when running tcpdump: Internet Protocol Version 4, Src: 127.0.0.1 (127.0.0.1), Dst: 127.0.0.1 (127.0.0.1) Version: 4 Header length: 20 bytes Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00: Not-ECT (Not ECN-Capable Transport)) 0000 00.. = Differentiated Services Codepoint: Default (0x00) .... ..00 = Explicit Congestion Notification: Not-ECT (Not ECN-Capable Transport) (0x00) Total Length: 296 Identification: 0x7b6a (31594) Flags: 0x02 (Don't Fragment) 0... .... = Reserved bit: Not set .1.. .... = Don't fragment: Set ..0. .... = More fragments: Not set Fragment offset: 0 Time to live: 64 Protocol: TCP (6) Header checksum: 0xc063 [correct] [Good: True] [Bad: False] Source: 127.0.0.1 (127.0.0.1) Destination: 127.0.0.1 (127.0.0.1) Transmission Control Protocol, Src Port: 61616 (61616), Dst Port: 54669 (54669), Seq: 1, Ack: 2, Len: 244 Source port: 61616 (61616) Destination port: 54669 (54669) [Stream index: 11] Sequence number: 1 (relative sequence number) [Next sequence number: 245 (relative sequence number)] Acknowledgement number: 2 (relative ack number) Header length: 32 bytes Flags: 0x018 (PSH, ACK) 000. .... .... = Reserved: Not set ...0 .... .... = Nonce: Not set .... 0... .... = Congestion Window Reduced (CWR): Not set .... .0.. .... = ECN-Echo: Not set .... ..0. .... = Urgent: Not set .... ...1 .... = Acknowledgement: Set .... .... 1... = Push: Set .... .... .0.. = Reset: Not set .... .... ..0. = Syn: Not set .... .... ...0 = Fin: Not set Window size value: 256 [Calculated window size: 32768] [Window size scaling factor: 128] Checksum: 0xff1c [validation disabled] [Good Checksum: False] [Bad Checksum: False] Options: (12 bytes) No-Operation (NOP) No-Operation (NOP) Timestamps: TSval 2304161892, TSecr 2304161891 Kind: Timestamp (8) Length: 10 Timestamp value: 2304161892 Timestamp echo reply: 2304161891 [SEQ/ACK analysis] [Bytes in flight: 244] Constrained Application Protocol, TID: 240, Length: 244 00.. .... = Version: 0 ..00 .... = Type: Confirmable (0) .... 0000 = Option Count: 0 Code: Unknown (0) Transaction ID: 240 Payload Content-Type: text/plain (default), Length: 240, offset: 4 Line-based text data: text/plain [truncated] \001ActiveMQ\000\000\000\t\001\000\000\000<DE>\000\000\000\t\000\fMaxFrameSize\006\177<FF><FF><FF><FF> <FF><FF><FF>\000\tCacheSize\005\000\000\004\000\000\fCacheEnabled\001\001\000\022SizePrefixDisabled\001\000\000 MaxInactivityDurationInitalDelay\006\ It is very likely a tcp port check. This is what I see when trying telnet from another host: 2013-11-05 16:12:41,071 | DEBUG | Transport Connection to: tcp://10.8.20.9:46775 failed: java.io.EOFException | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ Transport: tcp:///10.8.20.9:46775@61616 java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:275) at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:229) at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:221) at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:204) at java.lang.Thread.run(Thread.java:662) 2013-11-05 16:12:41,071 | DEBUG | Transport Connection to: tcp://10.8.20.9:46775 failed: org.apache.activemq.transport.InactivityIOException: Cannot send, channel has already failed: tcp://10.8.20.9:46775 | org.apache.activemq.broker.TransportConnection.Transport | Async Exception Handler org.apache.activemq.transport.InactivityIOException: Cannot send, channel has already failed: tcp://10.8.20.9:46775 at org.apache.activemq.transport.AbstractInactivityMonitor.doOnewaySend(AbstractInactivityMonitor.java:282) at org.apache.activemq.transport.AbstractInactivityMonitor.oneway(AbstractInactivityMonitor.java:271) at org.apache.activemq.transport.TransportFilter.oneway(TransportFilter.java:85) at org.apache.activemq.transport.WireFormatNegotiator.oneway(WireFormatNegotiator.java:104) at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68) at org.apache.activemq.broker.TransportConnection.dispatch(TransportConnection.java:1312) at org.apache.activemq.broker.TransportConnection.processDispatch(TransportConnection.java:838) at org.apache.activemq.broker.TransportConnection.iterate(TransportConnection.java:873) at org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:129) at org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:47) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) 2013-11-05 16:12:41,071 | DEBUG | Unregistering MBean org.apache.activemq:BrokerName=localhost,Type=Connection,ConnectorName=ope nwire,ViewType=address,Name=tcp_//10.8.20.9_46775 | org.apache.activemq.broker.jmx.ManagementContext | ActiveMQ Transport: tcp:/ //10.8.20.9:46775@61616 2013-11-05 16:12:41,073 | DEBUG | Stopping connection: tcp://10.8.20.9:46775 | org.apache.activemq.broker.TransportConnection | ActiveMQ BrokerService[localhost] Task-5 2013-11-05 16:12:41,073 | DEBUG | Stopping transport tcp:///10.8.20.9:46775@61616 | org.apache.activemq.transport.tcp.TcpTranspo rt | ActiveMQ BrokerService[localhost] Task-5 2013-11-05 16:12:41,073 | DEBUG | Initialized TaskRunnerFactory[ActiveMQ Task] using ExecutorService: java.util.concurrent.Threa dPoolExecutor@23cc2a28 | org.apache.activemq.thread.TaskRunnerFactory | ActiveMQ BrokerService[localhost] Task-5 2013-11-05 16:12:41,074 | DEBUG | Closed socket Socket[addr=/10.8.20.9,port=46775,localport=61616] | org.apache.activemq.transpo rt.tcp.TcpTransport | ActiveMQ Task-1 2013-11-05 16:12:41,074 | DEBUG | Forcing shutdown of ExecutorService: java.util.concurrent.ThreadPoolExecutor@23cc2a28 | org.apache.activemq.util.ThreadPoolUtils | ActiveMQ BrokerService[localhost] Task-5 2013-11-05 16:12:41,074 | DEBUG | Stopped transport: tcp://10.8.20.9:46775 | org.apache.activemq.broker.TransportConnection | ActiveMQ BrokerService[localhost] Task-5 2013-11-05 16:12:41,074 | DEBUG | Connection Stopped: tcp://10.8.20.9:46775 | org.apache.activemq.broker.TransportConnection | ActiveMQ BrokerService[localhost] Task-5 2013-11-05 16:12:41,902 | DEBUG | Sending: WireFormatInfo { version=9, properties={MaxFrameSize=9223372036854775807, CacheSize=1024, CacheEnabled=true, SizePrefixDisabled=false, MaxInactivityDurationInitalDelay=10000, TcpNoDelayEnabled=true, MaxInactivityDuration=30000, TightEncodingEnabled=true, StackTraceEnabled=true}, magic=[A,c,t,i,v,e,M,Q]} | org.apache.activemq.transport.WireFormatNegotiator | ActiveMQ BrokerService[localhost] Task-5 So the question is: how can I find out the process that is trying to connect to my ActiveMQ (from localhost) every 2 seconds?

Read the article
mysql: Bind on unix socket: Permission denied

- by Alex

Can't start mysql with: sudo /usr/bin/mysqld_safe --datadir=/srv/mysql/myDB --log-error=/srv/mysql/logs/mysqld-myDB.log --pid-file=/srv/mysql/pids/mysqld-myDB.pid --user=mysql --socket=/srv/mysql/sockets/mysql-myDB.sock --port=3700 120222 13:40:48 mysqld_safe Starting mysqld daemon with databases from /srv/mysql/myDB 120222 13:40:54 mysqld_safe mysqld from pid file /srv/mysql/pids/mysqld-myDB.pid ended /srv/mysql/logs/mysqld-myDB.log: 120222 13:43:53 mysqld_safe Starting mysqld daemon with databases from /srv/mysql/myDB 120222 13:43:53 [Note] Plugin 'FEDERATED' is disabled. /usr/sbin/mysqld: Table 'plugin' is read only 120222 13:43:53 [ERROR] Can't open the mysql.plugin table. Please run mysql_upgrade to create it. 120222 13:43:53 InnoDB: Completed initialization of buffer pool 120222 13:43:53 InnoDB: Started; log sequence number 32 4232720908 120222 13:43:53 [ERROR] Can't start server : Bind on unix socket: Permission denied 120222 13:43:53 [ERROR] Do you already have another mysqld server running on socket: /srv/mysql/sockets/mysql-myDB.sock ? 120222 13:43:53 [ERROR] Aborting 120222 13:43:53 InnoDB: Starting shutdown... One instance mysqld is running: $ ps aux | grep mysql mysql 1093 0.0 0.2 169972 18700 ? Ssl 11:50 0:02 /usr/sbin/mysqld $ Port 3700 is available: $ netstat -a | grep 3700 $ Directory with sockets is empty: $ ls /srv/mysql/sockets/ $ There are all permissions: $ ls -l /srv/mysql/ total 20 drwxrwxrwx 2 mysql mysql 4096 2012-02-22 13:28 logs drwxrwxrwx 13 mysql mysql 4096 2012-02-22 13:44 myDB drwxrwxrwx 2 mysql mysql 4096 2012-02-22 12:55 pids drwxrwxrwx 2 mysql mysql 4096 2012-02-22 12:55 sockets drwxrwxrwx 2 mysql mysql 4096 2012-02-22 13:25 version Apparmor config: $cat /etc/apparmor.d/usr.sbin.mysqld # vim:syntax=apparmor # Last Modified: Tue Jun 19 17:37:30 2007 #include <tunables/global> /usr/sbin/mysqld flags=(complain) { #include <abstractions/base> #include <abstractions/nameservice> #include <abstractions/user-tmp> #include <abstractions/mysql> #include <abstractions/winbind> capability dac_override, capability sys_resource, capability setgid, capability setuid, network tcp, /etc/hosts.allow r, /etc/hosts.deny r, /etc/mysql/*.pem r, /etc/mysql/conf.d/ r, /etc/mysql/conf.d/* r, /etc/mysql/*.cnf r, /usr/lib/mysql/plugin/ r, /usr/lib/mysql/plugin/*.so* mr, /usr/sbin/mysqld mr, /usr/share/mysql/** r, /var/log/mysql.log rw, /var/log/mysql.err rw, /var/lib/mysql/ r, /var/lib/mysql/** rwk, /var/log/mysql/ r, /var/log/mysql/* rw, /{,var/}run/mysqld/mysqld.pid w, /{,var/}run/mysqld/mysqld.sock w, /srv/mysql/ r, /srv/mysql/** rwk, /sys/devices/system/cpu/ r, # Site-specific additions and overrides. See local/README for details. #include <local/usr.sbin.mysqld> } Any suggestions? UPD1: $ touch /srv/mysql/sockets/mysql-myDB.sock $ sudo chown mysql:mysql /srv/mysql/sockets/mysql-myDB.sock $ ls -l /srv/mysql/sockets/mysql-myDB.sock -rw-rw-r-- 1 mysql mysql 0 2012-02-22 14:29 /srv/mysql/sockets/mysql-myDB.sock $ sudo /usr/bin/mysqld_safe --datadir=/srv/mysql/myDB --log-error=/srv/mysql/logs/mysqld-myDB.log --pid-file=/srv/mysql/pids/mysqld-myDB.pid --user=mysql --socket=/srv/mysql/sockets/mysql-myDB.sock --port=3700 120222 14:30:18 mysqld_safe Can't log to error log and syslog at the same time. Remove all --log-error configuration options for --syslog to take effect. 120222 14:30:18 mysqld_safe Logging to '/srv/mysql/logs/mysqld-myDB.log'. 120222 14:30:18 mysqld_safe Starting mysqld daemon with databases from /srv/mysqlmyDB 120222 14:30:24 mysqld_safe mysqld from pid file /srv/mysql/pids/mysqld-myDB.pid ended $ ls -l /srv/mysql/sockets/mysql-myDB.sock ls: cannot access /srv/mysql/sockets/mysql-myDB.sock: No such file or directory $ UPD2: $ sudo netstat -lnp | grep mysql tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN 1093/mysqld unix 2 [ ACC ] STREAM LISTENING 5912 1093/mysqld /var/run/mysqld/mysqld.sock $ sudo lsof | grep /srv/mysql/sockets/mysql-myDB.sock lsof: WARNING: can't stat() fuse.gvfs-fuse-daemon file system /home/sears/.gvfs Output information may be incomplete. UPD3: $ cat /etc/mysql/my.cnf # # The MySQL database server configuration file. # # You can copy this to one of: # - "/etc/mysql/my.cnf" to set global options, # - "~/.my.cnf" to set user-specific options. # # One can use all long options that the program supports. # Run program with --help to get a list of available options and with # --print-defaults to see which it would actually understand and use. # # For explanations see # http://dev.mysql.com/doc/mysql/en/server-system-variables.html # This will be passed to all mysql clients # It has been reported that passwords should be enclosed with ticks/quotes # escpecially if they contain "#" chars... # Remember to edit /etc/mysql/debian.cnf when changing the socket location. [client] port = 3306 socket = /var/run/mysqld/mysqld.sock # Here is entries for some specific programs # The following values assume you have at least 32M ram # This was formally known as [safe_mysqld]. Both versions are currently parsed. [mysqld_safe] socket = /var/run/mysqld/mysqld.sock nice = 0 [mysqld] # # * Basic Settings # # # * IMPORTANT # If you make changes to these settings and your system uses apparmor, you may # also need to also adjust /etc/apparmor.d/usr.sbin.mysqld. # user = mysql socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /var/lib/mysql tmpdir = /tmp skip-external-locking # # Instead of skip-networking the default is now to listen only on # localhost which is more compatible and is not less secure. #bind-address = 127.0.0.1 # # * Fine Tuning # key_buffer = 16M max_allowed_packet = 16M thread_stack = 192K thread_cache_size = 8 # This replaces the startup script and checks MyISAM tables if needed # the first time they are touched myisam-recover = BACKUP #max_connections = 100 #table_cache = 64 #thread_concurrency = 10 # # * Query Cache Configuration # query_cache_limit = 1M query_cache_size = 16M # # * Logging and Replication # # Both location gets rotated by the cronjob. # Be aware that this log type is a performance killer. # As of 5.1 you can enable the log at runtime! #general_log_file = /var/log/mysql/mysql.log #general_log = 1 log_error = /var/log/mysql/error.log # Here you can see queries with especially long duration #log_slow_queries = /var/log/mysql/mysql-slow.log #long_query_time = 2 #log-queries-not-using-indexes # # The following can be used as easy to replay backup logs or for replication. # note: if you are setting up a replication slave, see README.Debian about # other settings you may need to change. #server-id = 1 #log_bin = /var/log/mysql/mysql-bin.log expire_logs_days = 10 max_binlog_size = 100M #binlog_do_db = include_database_name #binlog_ignore_db = include_database_name # # * InnoDB # # InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/. # Read the manual for more InnoDB related options. There are many! # # * Security Features # # Read the manual, too, if you want chroot! # chroot = /var/lib/mysql/ # # For generating SSL certificates I recommend the OpenSSL GUI "tinyca". # # ssl-ca=/etc/mysql/cacert.pem # ssl-cert=/etc/mysql/server-cert.pem # ssl-key=/etc/mysql/server-key.pem [mysqldump] quick quote-names max_allowed_packet = 16M [mysql] #no-auto-rehash # faster start of mysql but no tab completition [isamchk] key_buffer = 16M # # * IMPORTANT: Additional settings that can override those from this file! # The files must end with '.cnf', otherwise they'll be ignored. # !includedir /etc/mysql/conf.d/

Read the article
How to find and fix issue with Pound and HAProxy

- by javano

Pound sits in front of HAProxy (on the same box) to perform SSL off-load. Requests are passed to 127.0.0.1:80 where HAProxy then balances the requests across backend servers for a hosted ASP .NET web app. A user is getting HTTP error 500 (Internal Server Error) returned to their browser this morning and I can see it is comming from Pound. They see no log entry in their web app (IIS) server logs, so its not hitting the back end servers. I think the problem is possibly with HAProxy. Lets review the logs: Initialy the users (1.2.3.4) hits Pound on the load balancer: Nov 12 10:02:24 lb1 pound: a-website.com 1.2.3.4 - - [12/Nov/2012:10:02:23 +0000] "POST /eventmanagement/EditEvent.aspx?eventOid=623fc423-2329-4cab-8be5-72a97709570d HTTP/1.1" 200 155721 "https://a-website.com/eventmanagement/EditEvent.aspx?eventOid=623fc423-2329-4cab-8be5-72a97709570d" "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.96 Safari/537.4" Nov 12 10:02:24 lb1 pound: a-website.com 1.2.3.4 - - [12/Nov/2012:10:02:24 +0000] "GET /Controls/ReferringOrganisationLogoImageHandler.ashx HTTP/1.1" 200 142 "https://a-website.com/eventmanagement/EditEvent.aspx?eventOid=623fc423-2329-4cab-8be5-72a97709570d" "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.96 Safari/537.4" Nov 12 10:02:24 lb1 pound: a-website.com 1.2.3.4 - - [12/Nov/2012:10:02:24 +0000] "GET /eventmanagement/WebCoreModule.ashx?__ac=1&__ac_wcmid=RAWCIL&__ac_lib=Radactive.WebControls.ILoad&__ac_key=RAWVCO_11&__ac_sid=fnoz2hmvirfivb2btbubbw45&__ac_cn=&__ac_cp=BVDXDWFLDWFMHDFJBOEGBDFLFOD5EEFD&__ac_fr=634883113445054092&__ac_ssid= HTTP/1.1" 200 11206 "https://a-website.com/eventmanagement/EditEvent.aspx?eventOid=623fc423-2329-4cab-8be5-72a97709570d" "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.96 Safari/537.4" Nov 12 10:02:24 lb1 pound: a-website.com 1.2.3.4 - - [12/Nov/2012:10:02:24 +0000] "GET /eventmanagement/WebCoreModule.ashx?__ac=1&__ac_wcmid=RAWCIL&__ac_lib=Radactive.WebControls.ILoad&__ac_key=RAWCCIL_11&__ac_sid=fnoz2hmvirfivb2btbubbw45&__ac_cn=&__ac_cp=BVDXDWFLDWFMHDFJBOEGBDFLFOD5EEFD&__ac_fr=634883113445054092 HTTP/1.1" 200 43496 "https://a-website.com/eventmanagement/EditEvent.aspx?eventOid=623fc423-2329-4cab-8be5-72a97709570d" "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.96 Safari/537.4" Nov 12 10:02:42 lb1 pound: (7f819fff8700) e500 for 1.2.3.4 response error read from 127.0.0.1:80/POST /eventmanagement/EditEvent.aspx?eventOid=623fc423-2329-4cab-8be5-72a97709570d HTTP/1.1: Connection timed out (15.121 secs) Above we can see the request comming in from the user at IP address 1.2.3.4, eventually Pound returns error 500 with the message "Connection timed out (15.121 secs)". Running HAProxy in debug mode, we can see the request come in; user@box:/var/log$ sudo /etc/init.d/haproxy restart Restarting haproxy: haproxy[WARNING] 316/100042 (19218) : <debug> mode incompatible with <quiet> and <daemon>. Keeping <debug> only. Available polling systems : sepoll : pref=400, test result OK epoll : pref=300, test result OK poll : pref=200, test result OK select : pref=150, test result OK Total: 4 (4 usable), will use sepoll. Using sepoll() as the polling mechanism. ....... 00000008:iis-servers.srvrep[0008:0009]: HTTP/1.1 200 OK 00000008:iis-servers.srvhdr[0008:0009]: Cache-Control: private 00000008:iis-servers.srvhdr[0008:0009]: Pragma: no-cache 00000008:iis-servers.srvhdr[0008:0009]: Content-Length: 22211 00000008:iis-servers.srvhdr[0008:0009]: Content-Type: text/plain; charset=utf-8 00000008:iis-servers.srvhdr[0008:0009]: Server: Microsoft-IIS/7.0 00000008:iis-servers.srvhdr[0008:0009]: X-AspNet-Version: 2.0.50727 00000008:iis-servers.srvhdr[0008:0009]: X-Powered-By: ASP.NET 00000008:iis-servers.srvhdr[0008:0009]: Date: Mon, 12 Nov 2012 10:01:25 GMT 00000009:iis-servers.accept(0004)=000a from [127.0.0.1:53556] 00000009:iis-servers.clireq[000a:ffff]: GET /Logoff.aspx HTTP/1.1 00000009:iis-servers.clihdr[000a:ffff]: Host: a-website.com 00000009:iis-servers.clihdr[000a:ffff]: Connection: keep-alive 00000009:iis-servers.clihdr[000a:ffff]: User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.96 Safari/537.4 00000009:iis-servers.clihdr[000a:ffff]: Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 00000009:iis-servers.clihdr[000a:ffff]: Referer: https://a-website.com/eventmanagement/eventmanagement.aspx 00000009:iis-servers.clihdr[000a:ffff]: Accept-Encoding: gzip,deflate,sdch 00000009:iis-servers.clihdr[000a:ffff]: Accept-Language: en-GB,en;q=0.8,it;q=0.6 00000009:iis-servers.clihdr[000a:ffff]: Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3 00000009:iis-servers.clihdr[000a:ffff]: Cookie: ASP.NET_SessionId=fnoz2hmvirfivb2btbubbw45; apps=apps2; AuthHint=true; __utma=190546871.552451749.1340295610.1352454675.1352711624.159; __utmb=190546871.2.10.1352711624; __utmc=190546871; __utmz=190546871.1349966519.143.3.utmcsr=en.wikipedia.org|utmccn=(referral)|utmcmd=referral|utmcct=/wiki/Single_transferable_vote; Sequence=162; SessionId=80e603f9-7e73-474b-8b7c-e198b2f11218; SecureSessionId=00000000-0000-0000-0000-000000000000; __utma=58336506.1016936529.1332752550.1352454680.1352711626.456; __utmb=58336506.28.10.1352711626; __utmc=58336506; __utmz=58336506.1352711626.456.155.utmcsr=a-website.com|utmccn=(referral)|utmcmd=referral|utmcct=/ 00000009:iis-servers.clihdr[000a:ffff]: X-SSL-cipher: RC4-SHA SSLv3 Kx=RSA Au=RSA Enc=RC4(128) Mac=SHA1 00000009:iis-servers.clihdr[000a:ffff]: X-Forwarded-For: 1.2.3.4 00000008:iis-servers.srvcls[0008:0009] 00000008:iis-servers.clicls[0008:0009] 00000008:iis-servers.closed[0008:0009] ....... 0000000e:iis-servers.srvrep[0008:0009]: HTTP/1.1 200 OK 0000000e:iis-servers.srvhdr[0008:0009]: Cache-Control: no-cache 0000000e:iis-servers.srvhdr[0008:0009]: Pragma: no-cache 0000000e:iis-servers.srvhdr[0008:0009]: Content-Length: 12805 0000000e:iis-servers.srvhdr[0008:0009]: Content-Type: text/html; charset=utf-8 0000000e:iis-servers.srvhdr[0008:0009]: Server: Microsoft-IIS/7.0 0000000e:iis-servers.srvhdr[0008:0009]: X-AspNet-Version: 2.0.50727 0000000e:iis-servers.srvhdr[0008:0009]: X-Powered-By: ASP.NET 0000000e:iis-servers.srvhdr[0008:0009]: Date: Mon, 12 Nov 2012 10:02:22 GMT 0000000f:iis-servers.accept(0004)=000c from [127.0.0.1:53609] 0000000f:iis-servers.clireq[000c:ffff]: GET /Controls/ReferringOrganisationLogoImageHandler.ashx HTTP/1.1 0000000f:iis-servers.clihdr[000c:ffff]: Host: a-website.com 0000000f:iis-servers.clihdr[000c:ffff]: Connection: keep-alive 0000000f:iis-servers.clihdr[000c:ffff]: User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.96 Safari/537.4 0000000f:iis-servers.clihdr[000c:ffff]: Accept: */* 0000000f:iis-servers.clihdr[000c:ffff]: Referer: https://a-website.com/eventmanagement/EditEvent.aspx?eventOid=623fc423-2329-4cab-8be5-72a97709570d 0000000f:iis-servers.clihdr[000c:ffff]: Accept-Encoding: gzip,deflate,sdch 0000000f:iis-servers.clihdr[000c:ffff]: Accept-Language: en-GB,en;q=0.8,it;q=0.6 0000000f:iis-servers.clihdr[000c:ffff]: Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3 0000000f:iis-servers.clihdr[000c:ffff]: Cookie: ASP.NET_SessionId=fnoz2hmvirfivb2btbubbw45; apps=apps2; __utma=190546871.552451749.1340295610.1352454675.1352711624.159; __utmb=190546871.2.10.1352711624; __utmc=190546871; __utmz=190546871.1349966519.143.3.utmcsr=en.wikipedia.org|utmccn=(referral)|utmcmd=referral|utmcct=/wiki/Single_transferable_vote; AuthHint=true; __utma=58336506.1016936529.1332752550.1352454680.1352711626.456; __utmb=58336506.33.10.1352711626; __utmc=58336506; __utmz=58336506.1352711626.456.155.utmcsr=a-website.com|utmccn=(referral)|utmcmd=referral|utmcct=/; SessionId=69cd415c-2f4e-4ace-b8f7-926d054f87c2; SecureSessionId=00000000-0000-0000-0000-000000000000; Sequence=170 0000000f:iis-servers.clihdr[000c:ffff]: X-SSL-cipher: RC4-SHA SSLv3 Kx=RSA Au=RSA Enc=RC4(128) Mac=SHA1 0000000f:iis-servers.clihdr[000c:ffff]: X-Forwarded-For: 1.2.3.4 0000000f:iis-servers.srvrep[000c:000d]: HTTP/1.1 200 OK 0000000f:iis-servers.srvhdr[000c:000d]: Cache-Control: private 0000000f:iis-servers.srvhdr[000c:000d]: Content-Length: 142 0000000f:iis-servers.srvhdr[000c:000d]: Content-Type: image/png 0000000f:iis-servers.srvhdr[000c:000d]: Server: Microsoft-IIS/7.0 0000000f:iis-servers.srvhdr[000c:000d]: X-AspNet-Version: 2.0.50727 0000000f:iis-servers.srvhdr[000c:000d]: Set-Cookie: SessionId=69cd415c-2f4e-4ace-b8f7-926d054f87c2; path=/ 0000000f:iis-servers.srvhdr[000c:000d]: Set-Cookie: SecureSessionId=00000000-0000-0000-0000-000000000000; path=/; secure 0000000f:iis-servers.srvhdr[000c:000d]: X-Powered-By: ASP.NET 0000000f:iis-servers.srvhdr[000c:000d]: Date: Mon, 12 Nov 2012 10:02:25 GMT 0000000e:iis-servers.srvcls[0008:0009] 0000000e:iis-servers.clicls[0008:0009] 0000000e:iis-servers.closed[0008:0009] 0000000f:iis-servers.srvcls[000c:000d] 0000000f:iis-servers.clicls[000c:000d] 0000000f:iis-servers.closed[000c:000d] 00000009:iis-servers.srvcls[000a:000b] 00000009:iis-servers.clicls[000a:000b] 00000009:iis-servers.closed[000a:000b] Where in the chain is the issue here?

Read the article
Mysql server crashes Innodb

- by martin

Today we got some DB crash. The DB is InnoDB. At firstin log: 120404 10:57:40 InnoDB: ERROR: the age of the last checkpoint is 9433732, InnoDB: which exceeds the log group capacity 9433498. InnoDB: If you are using big BLOB or TEXT rows, you must set the InnoDB: combined size of log files at least 10 times bigger than the InnoDB: largest such row. 120404 10:58:48 InnoDB: ERROR: the age of the last checkpoint is 9825579, InnoDB: which exceeds the log group capacity 9433498. InnoDB: If you are using big BLOB or TEXT rows, you must set the InnoDB: combined size of log files at least 10 times bigger than the InnoDB: largest such row. 120404 10:59:04 InnoDB: ERROR: the age of the last checkpoint is 13992586, InnoDB: which exceeds the log group capacity 9433498. InnoDB: If you are using big BLOB or TEXT rows, you must set the InnoDB: combined size of log files at least 10 times bigger than the InnoDB: largest such row. 120404 10:59:20 InnoDB: ERROR: the age of the last checkpoint is 18059881, InnoDB: which exceeds the log group capacity 9433498. InnoDB: If you are using big BLOB or TEXT rows, you must set the InnoDB: combined size of log files at least 10 times bigger than the InnoDB: largest such row. after manual service stop and normal PC restart : 120404 11:12:35 InnoDB: Error: page 3473451 log sequence number 105 802365904 InnoDB: is in the future! Current system log sequence number 105 796344770. InnoDB: Your database may be corrupt or you may have copied the InnoDB InnoDB: tablespace but not the InnoDB log files. See InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html InnoDB: for more information. InnoDB: 1 transaction(s) which must be rolled back or cleaned up InnoDB: in total 1 row operations to undo InnoDB: Trx id counter is 0 1103869440 120404 11:12:37 InnoDB: Error: page 0 log sequence number 105 834817616 InnoDB: is in the future! Current system log sequence number 105 796344770. InnoDB: Your database may be corrupt or you may have copied the InnoDB InnoDB: tablespace but not the InnoDB log files. See InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html InnoDB: for more information. InnoDB: Last MySQL binlog file position 0 3710603, file name .\mysql-bin.000336 InnoDB: Starting in background the rollback of uncommitted transactions 120404 11:12:38 InnoDB: Rolling back trx with id 0 1103866646, 1 rows to undo 120404 11:12:38 InnoDB: Started; log sequence number 105 796344770 120404 11:12:38 InnoDB: Error: page 2097163 log sequence number 105 803249754 InnoDB: is in the future! Current system log sequence number 105 796344770. InnoDB: Your database may be corrupt or you may have copied the InnoDB InnoDB: tablespace but not the InnoDB log files. See InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html InnoDB: for more information. InnoDB: Rolling back of trx id 0 1103866646 completed 120404 11:12:39 InnoDB: Rollback of non-prepared transactions completed 120404 11:12:39 [Note] Event Scheduler: Loaded 0 events 120404 11:12:39 [Note] wampmysqld: ready for connections. Version: '5.1.53-community' socket: '' port: 3306 MySQL Community Server (GPL) 120404 11:12:40 InnoDB: Error: page 2097162 log sequence number 105 803215859 InnoDB: is in the future! Current system log sequence number 105 796345097. InnoDB: Your database may be corrupt or you may have copied the InnoDB InnoDB: tablespace but not the InnoDB log files. See InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html InnoDB: for more information. 120404 11:12:40 InnoDB: Error: page 2097156 log sequence number 105 803181181 InnoDB: is in the future! Current system log sequence number 105 796345097. InnoDB: Your database may be corrupt or you may have copied the InnoDB InnoDB: tablespace but not the InnoDB log files. See InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html InnoDB: for more information. 120404 11:12:40 InnoDB: Error: page 2097157 log sequence number 105 803193066 InnoDB: is in the future! Current system log sequence number 105 796345097. InnoDB: Your database may be corrupt or you may have copied the InnoDB InnoDB: tablespace but not the InnoDB log files. See InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html InnoDB: for more information. when tried to recover data get : key_buffer_size=16777216 read_buffer_size=262144 max_used_connections=0 max_threads=151 threads_connected=0 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 133725 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... 0000000140262AFC mysqld.exe!?check_next_symbol@Gis_read_stream@@QEAA_ND@Z() 00000001402AAFA1 mysqld.exe!?check_next_symbol@Gis_read_stream@@QEAA_ND@Z() 00000001402AB33A mysqld.exe!?check_next_symbol@Gis_read_stream@@QEAA_ND@Z() 0000000140268219 mysqld.exe!?check_next_symbol@Gis_read_stream@@QEAA_ND@Z() 000000014027DB13 mysqld.exe!?check_next_symbol@Gis_read_stream@@QEAA_ND@Z() 00000001402A909F mysqld.exe!?check_next_symbol@Gis_read_stream@@QEAA_ND@Z() 00000001402A91B6 mysqld.exe!?check_next_symbol@Gis_read_stream@@QEAA_ND@Z() 000000014025B9B0 mysqld.exe!?check_next_symbol@Gis_read_stream@@QEAA_ND@Z() 000000014022F9C6 mysqld.exe!?check_next_symbol@Gis_read_stream@@QEAA_ND@Z() 0000000140219979 mysqld.exe!?check_next_symbol@Gis_read_stream@@QEAA_ND@Z() 000000014009ABCF mysqld.exe!?ha_initialize_handlerton@@YAHPEAUst_plugin_int@@@Z() 000000014003308C mysqld.exe!?plugin_lock_by_name@@YAPEAUst_plugin_int@@PEAVTHD@@PEBUst_mysql_lex_string@@H@Z() 00000001400375A9 mysqld.exe!?plugin_init@@YAHPEAHPEAPEADH@Z() 000000014001DACE mysqld.exe!handle_shutdown() 000000014001E285 mysqld.exe!?win_main@@YAHHPEAPEAD@Z() 000000014001E632 mysqld.exe!?mysql_service@@YAHPEAX@Z() 00000001402EA477 mysqld.exe!?check_next_symbol@Gis_read_stream@@QEAA_ND@Z() 00000001402EA545 mysqld.exe!?check_next_symbol@Gis_read_stream@@QEAA_ND@Z() 000000007712652D kernel32.dll!BaseThreadInitThunk() 000000007725C521 ntdll.dll!RtlUserThreadStart() The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. 120404 14:17:49 [Note] Plugin 'FEDERATED' is disabled. 120404 14:17:49 [Warning] option 'innodb-force-recovery': signed value 8 adjusted to 6 InnoDB: The user has set SRV_FORCE_NO_LOG_REDO on InnoDB: Skipping log redo InnoDB: Error: trying to access page number 4290979199 in space 0, InnoDB: space name .\ibdata1, InnoDB: which is outside the tablespace bounds. InnoDB: Byte offset 0, len 16384, i/o type 10. InnoDB: If you get this error at mysqld startup, please check that InnoDB: your my.cnf matches the ibdata files that you have in the InnoDB: MySQL server. 120404 14:17:52 InnoDB: Assertion failure in thread 3928 in file .\fil\fil0fil.c lin23 InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to http://bugs.mysql.com. InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html InnoDB: about forcing recovery. 120404 14:17:52 - mysqld got exception 0xc0000005 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=16777216 read_buffer_size=262144 max_used_connections=0 max_threads=151 threads_connected=0 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 133725 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... 0000000140262AFC mysqld.exe!?check_next_symbol@Gis_read_stream@@QEAA_ND@Z() 00000001402AAFA1 mysqld.exe!?check_next_symbol@Gis_read_stream@@QEAA_ND@Z() 00000001402AB33A mysqld.exe!?check_next_symbol@Gis_read_stream@@QEAA_ND@Z() 0000000140268219 mysqld.exe!?check_next_symbol@Gis_read_stream@@QEAA_ND@Z() 000000014027DB13 mysqld.exe!?check_next_symbol@Gis_read_stream@@QEAA_ND@Z() 00000001402A909F mysqld.exe!?check_next_symbol@Gis_read_stream@@QEAA_ND@Z() 00000001402A91B6 mysqld.exe!?check_next_symbol@Gis_read_stream@@QEAA_ND@Z() 000000014025B9B0 mysqld.exe!?check_next_symbol@Gis_read_stream@@QEAA_ND@Z() 000000014022F9C6 mysqld.exe!?check_next_symbol@Gis_read_stream@@QEAA_ND@Z() 0000000140219979 mysqld.exe!?check_next_symbol@Gis_read_stream@@QEAA_ND@Z() 000000014009ABCF mysqld.exe!?ha_initialize_handlerton@@YAHPEAUst_plugin_int@@@Z() 000000014003308C mysqld.exe!?plugin_lock_by_name@@YAPEAUst_plugin_int@@PEAVTHD@@PEBUst_mysql_lex_string@@H@Z() 00000001400375A9 mysqld.exe!?plugin_init@@YAHPEAHPEAPEADH@Z() 000000014001DACE mysqld.exe!handle_shutdown() 000000014001E285 mysqld.exe!?win_main@@YAHHPEAPEAD@Z() 000000014001E632 mysqld.exe!?mysql_service@@YAHPEAX@Z() 00000001402EA477 mysqld.exe!?check_next_symbol@Gis_read_stream@@QEAA_ND@Z() 00000001402EA545 mysqld.exe!?check_next_symbol@Gis_read_stream@@QEAA_ND@Z() 000000007712652D kernel32.dll!BaseThreadInitThunk() 000000007725C521 ntdll.dll!RtlUserThreadStart() The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. Any suggestion how to get DB working ????

Read the article
Failing Sata HDD

- by DaveCol

I think my HDD is fried... Could someone confirm or help me restore it? I was using Hardware RAID 1 Configuration [2 x 160GB SATA HDD] on a CentOS 4 Installation. All of a sudden I started seeing bad sectors on the second HDD which stopped being mirrored. I have removed the RAID array and have tested with SMART which showed the following error: 187 Unknown_Attribute 0x003a 001 001 051 Old_age Always FAILING_NOW 4645 I have no clue what this means, or if I can recover from it. Could someone give me some ideas on how to fix this, or what HDD to get to replace this? Complete SMART report: Smartctl version 5.33 [i686-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen Home page is http://smartmontools.sourceforge.net/ === START OF INFORMATION SECTION === Device Model: GB0160CAABV Serial Number: 6RX58NAA Firmware Version: HPG1 User Capacity: 160,041,885,696 bytes Device is: Not in smartctl database [for details use: -P showall] ATA Version is: 7 ATA Standard is: ATA/ATAPI-7 T13 1532D revision 4a Local Time is: Tue Oct 19 13:42:42 2010 COT SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED See vendor-specific Attribute list for marginal Attributes. General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 433) seconds. Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 2) minutes. Extended self-test routine recommended polling time: ( 54) minutes. SMART Attributes Data Structure revision number: 10 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000f 100 253 006 Pre-fail Always - 0 3 Spin_Up_Time 0x0002 097 097 000 Old_age Always - 0 4 Start_Stop_Count 0x0033 100 100 020 Pre-fail Always - 152 5 Reallocated_Sector_Ct 0x0033 095 095 036 Pre-fail Always - 214 7 Seek_Error_Rate 0x000f 078 060 030 Pre-fail Always - 73109713 9 Power_On_Hours 0x0032 083 083 000 Old_age Always - 15133 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0 12 Power_Cycle_Count 0x0033 100 100 020 Pre-fail Always - 154 184 Unknown_Attribute 0x0032 038 038 000 Old_age Always - 62 187 Unknown_Attribute 0x003a 001 001 051 Old_age Always FAILING_NOW 4645 189 Unknown_Attribute 0x0022 100 100 000 Old_age Always - 0 190 Unknown_Attribute 0x001a 061 055 000 Old_age Always - 656408615 194 Temperature_Celsius 0x0000 039 045 000 Old_age Offline - 39 (Lifetime Min/Max 0/22) 195 Hardware_ECC_Recovered 0x0032 070 059 000 Old_age Always - 12605265 197 Current_Pending_Sector 0x0000 100 100 000 Old_age Offline - 1 198 Offline_Uncorrectable 0x0000 100 100 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x0000 200 200 000 Old_age Offline - 62 SMART Error Log Version: 1 ATA Error Count: 4645 (device log contains only the most recent five errors) CR = Command Register [HEX] FR = Features Register [HEX] SC = Sector Count Register [HEX] SN = Sector Number Register [HEX] CL = Cylinder Low Register [HEX] CH = Cylinder High Register [HEX] DH = Device/Head Register [HEX] DC = Device Command Register [HEX] ER = Error register [HEX] ST = Status register [HEX] Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days. Error 4645 occurred at disk power-on lifetime: 15132 hours (630 days + 12 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 7b 86 b1 ea Error: UNC at LBA = 0x0ab1867b = 179406459 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 02 7b 86 b1 ea 00 00:38:52.796 READ DMA ec 03 45 00 00 00 a0 00 00:38:52.796 IDENTIFY DEVICE ef 03 45 00 00 00 a0 00 00:38:52.794 SET FEATURES [Set transfer mode] ec 00 00 7b 86 b1 a0 00 00:38:49.991 IDENTIFY DEVICE c8 00 04 79 86 b1 ea 00 00:38:49.935 READ DMA Error 4644 occurred at disk power-on lifetime: 15132 hours (630 days + 12 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 7b 86 b1 ea Error: UNC at LBA = 0x0ab1867b = 179406459 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 04 79 86 b1 ea 00 00:38:41.517 READ DMA ec 03 45 00 00 00 a0 00 00:38:41.515 IDENTIFY DEVICE ef 03 45 00 00 00 a0 00 00:38:41.515 SET FEATURES [Set transfer mode] ec 00 00 7b 86 b1 a0 00 00:38:49.991 IDENTIFY DEVICE c8 00 06 77 86 b1 ea 00 00:38:49.935 READ DMA Error 4643 occurred at disk power-on lifetime: 15132 hours (630 days + 12 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 7b 86 b1 ea Error: UNC at LBA = 0x0ab1867b = 179406459 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 06 77 86 b1 ea 00 00:38:41.517 READ DMA ec 03 45 00 00 00 a0 00 00:38:41.515 IDENTIFY DEVICE ef 03 45 00 00 00 a0 00 00:38:41.515 SET FEATURES [Set transfer mode] ec 00 00 7b 86 b1 a0 00 00:38:41.513 IDENTIFY DEVICE c8 00 06 77 86 b1 ea 00 00:38:38.706 READ DMA Error 4642 occurred at disk power-on lifetime: 15132 hours (630 days + 12 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 7b 86 b1 ea Error: UNC at LBA = 0x0ab1867b = 179406459 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 06 77 86 b1 ea 00 00:38:41.517 READ DMA ec 03 45 00 00 00 a0 00 00:38:41.515 IDENTIFY DEVICE ef 03 45 00 00 00 a0 00 00:38:41.515 SET FEATURES [Set transfer mode] ec 00 00 7b 86 b1 a0 00 00:38:41.513 IDENTIFY DEVICE c8 00 06 77 86 b1 ea 00 00:38:38.706 READ DMA Error 4641 occurred at disk power-on lifetime: 15132 hours (630 days + 12 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 7b 86 b1 ea Error: UNC at LBA = 0x0ab1867b = 179406459 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 06 77 86 b1 ea 00 00:38:41.517 READ DMA ec 03 45 00 00 00 a0 00 00:38:41.515 IDENTIFY DEVICE ef 03 45 00 00 00 a0 00 00:38:41.515 SET FEATURES [Set transfer mode] ec 00 00 7b 86 b1 a0 00 00:38:41.513 IDENTIFY DEVICE c8 00 06 77 86 b1 ea 00 00:38:38.706 READ DMA SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Short offline Completed without error 00% 15131 - # 2 Short offline Completed without error 00% 15131 - SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay.

Read the article
2-Bay External HDD Enclosure in JBOD mode fails to detect both drives (Linux & Windows)

- by mgc8888

I recently purchased a couple of USB 3.0 External HDD Enclosures to use for storage and backup; the idea was to have one act as backup to the other, with 4 x 3TB drives in total. However, the second drive in each is not accessible in either Linux nor Windows, and I could not determine the reason. 1. Situation The two enclosures are slightly different (couldn't find them in stock at the same time) yet from many little details appear to be the same Chinese base design with a tweaked outer shell. The models are: Sharkoon 2-Bay RAID Box Fantec MR-35DU3 The drives are Seagate 3TB Barracuda ST33000651AS, firmware CC44, all identical. From reading manuals and online sources, I determined that JBOD would be the optimal setup for my needs -- addressing the two drives separately in each enclosure would be important, making it easy to swap drives and mix&match them if needed; all the other modes implied the controller doing a combination of the drives. The software used was Debian GNU/Linux - testing/wheezy - kernel 2.6.39-2 and Windows 7 Ultimate. 2. Description of the problem Now, here comes the problem: every time I connect either of the enclosures to a PC using the supplied cable (tried a different one as well), only the HDD in the top bay is readable, the one below is detected yet errors out in various ways. According to the manuals, it should not happen: in JBOD, the system should be able to "see" two separate drives upon connection. This happens with both enclosures and any combination of HDDs (i.e. if I swap them, the same thing happens), so the HDDs are good and I think so are the enclosures (two different companies making similar products that failed in an identical fashion would be very unlikely). The top HDD can be used fine every time, I actually tried a speed test from Linux and got about 150MiB/s reads, so all is working as it should; the one below refuses to work every time. So the failure is consistent. To make sure this was not some obscure Linux bug, I tried the same under Windows 7, and the system also only created one drive letter for a drive of 3TB size (so it was only seeing one instead of both). Placing an older, known good, 2TB drive in the top bay made that the one recognised, so we have the same issue under Windows as well. Log entries under Linux (tested here with a 3TB and a 2TB drive so I could differentiate them; either one works in the top enclosure, in the test setup the 3TB one is on top). You can see them being detected, the top one is ok, but for the bottom one only errors: Jul 19 23:28:15 media kernel: [260150.582436] usb 6-1: New USB device found, idVendor=1ca1, idProduct=18ae Jul 19 23:28:15 media kernel: [260150.582440] usb 6-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3 Jul 19 23:28:15 media kernel: [260150.582442] usb 6-1: Product: Usb Sata Bridge Jul 19 23:28:15 media kernel: [260150.582444] usb 6-1: Manufacturer: SYMWAVE Jul 19 23:28:15 media kernel: [260150.582446] usb 6-1: SerialNumber: 39584B304C4E3441 Jul 19 23:28:15 media kernel: [260150.870412] scsi11 : usb-storage 6-1:1.0 Jul 19 23:28:16 media kernel: [260151.882087] scsi 11:0:0:0: Direct-Access SYMWAVE ST33000651AS CC44 PQ: 0 ANSI: 4 Jul 19 23:28:16 media kernel: [260151.882242] scsi 11:0:0:1: Direct-Access SYMWAVE ST32000641AS CC12 PQ: 0 ANSI: 4 Jul 19 23:28:16 media kernel: [260151.882677] sd 11:0:0:0: Attached scsi generic sg2 type 0 Jul 19 23:28:16 media kernel: [260151.882774] sd 11:0:0:0: [sdb] Very big device. Trying to use READ CAPACITY(16). Jul 19 23:28:16 media kernel: [260151.882857] sd 11:0:0:1: Attached scsi generic sg3 type 0 Jul 19 23:28:16 media kernel: [260151.882893] sd 11:0:0:0: [sdb] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB) Jul 19 23:28:16 media kernel: [260151.883085] xhci_hcd 0000:03:00.0: WARN: Stalled endpoint Jul 19 23:28:16 media kernel: [260151.883582] sd 11:0:0:0: [sdb] Write Protect is off Jul 19 23:28:16 media kernel: [260151.883961] sd 11:0:0:1: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB) Jul 19 23:28:16 media kernel: [260151.884145] xhci_hcd 0000:03:00.0: WARN: Stalled endpoint Jul 19 23:28:16 media kernel: [260151.884570] sd 11:0:0:1: [sdc] Write Protect is off Jul 19 23:28:16 media kernel: [260151.884855] sd 11:0:0:0: [sdb] Very big device. Trying to use READ CAPACITY(16). Jul 19 23:28:16 media kernel: [260151.885286] xhci_hcd 0000:03:00.0: WARN: Stalled endpoint Jul 19 23:28:16 media kernel: [260151.885807] xhci_hcd 0000:03:00.0: WARN: Stalled endpoint Jul 19 23:28:16 media kernel: [260151.909595] xhci_hcd 0000:03:00.0: WARN: Stalled endpoint Jul 19 23:28:16 media kernel: [260151.910159] sd 11:0:0:1: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Jul 19 23:28:16 media kernel: [260151.910163] sd 11:0:0:1: [sdc] Sense Key : Illegal Request [current] Jul 19 23:28:16 media kernel: [260151.910167] Info fld=0x0 Jul 19 23:28:16 media kernel: [260151.910169] sd 11:0:0:1: [sdc] Add. Sense: Invalid field in cdb Jul 19 23:28:16 media kernel: [260151.910172] sd 11:0:0:1: [sdc] CDB: Read(10): 28 20 00 00 00 00 00 00 08 00 Jul 19 23:28:16 media kernel: [260151.910182] quiet_error: 2 callbacks suppressed Jul 19 23:28:16 media kernel: [260151.910570] xhci_hcd 0000:03:00.0: WARN: Stalled endpoint Jul 19 23:28:16 media kernel: [260151.911153] sd 11:0:0:1: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Jul 19 23:28:16 media kernel: [260151.911156] sd 11:0:0:1: [sdc] Sense Key : Illegal Request [current] Jul 19 23:28:16 media kernel: [260151.911159] Info fld=0x0 Jul 19 23:28:16 media kernel: [260151.911161] sd 11:0:0:1: [sdc] Add. Sense: Invalid field in cdb Jul 19 23:28:16 media kernel: [260151.911164] sd 11:0:0:1: [sdc] CDB: Read(10): 28 20 00 00 00 00 00 00 08 00 Jul 19 23:28:16 media kernel: [260151.911385] xhci_hcd 0000:03:00.0: WARN: Stalled endpoint Jul 19 23:28:16 media kernel: [260151.911902] sd 11:0:0:1: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Jul 19 23:28:16 media kernel: [260151.911905] sd 11:0:0:1: [sdc] Sense Key : Illegal Request [current] Jul 19 23:28:16 media kernel: [260151.911908] Info fld=0x0 Jul 19 23:28:16 media kernel: [260151.911910] sd 11:0:0:1: [sdc] Add. Sense: Invalid field in cdb Jul 19 23:28:16 media kernel: [260151.911913] sd 11:0:0:1: [sdc] CDB: Read(10): 28 20 00 00 00 00 00 00 08 00 Jul 19 23:28:16 media kernel: [260151.912128] xhci_hcd 0000:03:00.0: WARN: Stalled endpoint Jul 19 23:28:16 media kernel: [260151.912650] sd 11:0:0:1: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Jul 19 23:28:16 media kernel: [260151.912653] sd 11:0:0:1: [sdc] Sense Key : Illegal Request [current] Jul 19 23:28:16 media kernel: [260151.912656] Info fld=0x0 Jul 19 23:28:16 media kernel: [260151.912657] sd 11:0:0:1: [sdc] Add. Sense: Invalid field in cdb Jul 19 23:28:16 media kernel: [260151.912660] sd 11:0:0:1: [sdc] CDB: Read(10): 28 20 00 00 00 00 00 00 08 00 Jul 19 23:28:16 media kernel: [260151.912876] xhci_hcd 0000:03:00.0: WARN: Stalled endpoint Jul 19 23:28:16 media kernel: [260151.913439] sd 11:0:0:1: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Jul 19 23:28:16 media kernel: [260151.913442] sd 11:0:0:1: [sdc] Sense Key : Illegal Request [current] Jul 19 23:28:16 media kernel: [260151.913445] Info fld=0x0 Jul 19 23:28:16 media kernel: [260151.913446] sd 11:0:0:1: [sdc] Add. Sense: Invalid field in cdb Jul 19 23:28:16 media kernel: [260151.913449] sd 11:0:0:1: [sdc] CDB: Read(10): 28 20 00 00 00 00 00 00 08 00 Jul 19 23:28:16 media kernel: [260151.945227] xhci_hcd 0000:03:00.0: WARN: Stalled endpoint Jul 19 23:28:16 media kernel: [260151.945863] sd 11:0:0:1: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Jul 19 23:28:16 media kernel: [260151.945866] sd 11:0:0:1: [sdc] Sense Key : Illegal Request [current] Jul 19 23:28:16 media kernel: [260151.945870] Info fld=0x0 Jul 19 23:28:16 media kernel: [260151.945871] sd 11:0:0:1: [sdc] Add. Sense: Invalid field in cdb Jul 19 23:28:16 media kernel: [260151.945875] sd 11:0:0:1: [sdc] CDB: Read(10): 28 20 00 00 00 00 00 00 08 00 (...) and so on for like 10 seconds until it gives up (...) 3. Question So, my question would be: what is causing this? Am I missing something, should I configure things differently, is this a known limitation? Searching online for more information did not yield any useful results... Thank you in advance for any help!

Read the article
Backing up my data causes my server to crash using Symantec Backup Exec 12, or How I Came to Loathe

- by Kyle Noland

I have a Dell PowerEdge 2850 running Windows Server 2003. It is the primary file server for one of my clients. I have another server also running Windows Server 2003 that acts as the core media server for Symantec Backup Exec 12. I recently upgraded from Backup Exec 11d to 12. This upgrade was necessary because we also just upgraded from Exchange 2003 to Exchange 2007. After the upgrade I had to push-install the new version 12 Backup Exec Remote Agents to each of the servers I am backing up (about 6 total). 5 of my servers are doing just fine, faithfully completing backups every night. My file server routinely crashes. Observations: When the server crashes, it does not blue screen, it just locks up completely. Even the mouse is unresponsive. If you leave the server locked up long enough, it will eventually reboot itself and hang on the Windows splash screen. There is absolutely zero useful Event Viewer evidence of a problem. The logs go from routine logging to an Unexplained Shutdown Event the next morning when I have to hard reset the server to get it to boot. 90% of the time the server does not boot cleanly, it hangs on the Windows splash screen. I don't have any light to shed here. When the server hangs all I can do is hard reset it and try again. Even after a successful boot and chkdsk /r operation, if you reboot the machine, you have a 90% chance it won't back up again cleanly. The back story: This server started crashing during nightly backups about a month ago. I tried everything I could think of to troubleshoot the problem and eventually had to give up because I could not keep coming to the office at 4 AM to try to get the server back online. One Friday I got lucky and the server stayed up for its entire full backup. I took this opportunity to restore the full backup to a temporary server I set up and switched all my users to the temporary. Then I reloaded the ailing file server. I kept all my users on the temporary file server for about 3 weeks. I installed the same Backup Exec Remote Agent and Trend Micro A/V client on the temporary server that I was using on the regular file server. During this time, I had absolutely no problems backing up the temporary server. I tested the reloaded file server extensively. I rebooted the server once an hour every day for 3 weeks trying to make it fail. It never did. I felt confident that the reload was the answer to my problems. I moved all of the data from the temporary server back to the regular server. I got 3 nightly backups out of it before it locked up again and started the familiar failure to boot cleanly behavior. This weekend I decided to monitor the file server through the entire backup job. I RDPd into the file server and also into the server running Backup Exec. On the file server I opened the Task Manager so I could view the processes and watch CPU and memory usage. Everything was running smoothly for about 60GB worth of backup. Then I noticed that the byte count of the backup job in Backup Exec had stopped progressing. I looked back over at my RDP session into the file server, and I was getting real time updates about CPU and memory usage still - both nearly 0%, which is unusual. Backups usually hover around 40% usage for the duration of the backup job. Let me reiterate this point: The screen was refreshing and I was getting real time Task Manager updates - until I clicked on the Start menu. The screen went black and the server locked up. In truth, I think the server had already locked up, the video card just hadn't figured it out yet. I went back into my bag of trick: driving to the office and hard reseting the server over and over again when it hangs up at the Windows splash screen. I did this for 2 hours without getting a successful boot. I started panicking because I did not have a decent backup to use to get everything back onto the working temporary file server. Once I exhausted everything I knew to do, I took a deep breath, booted to the Windows Server 2003 CD and performed a repair installation of Windows. The server came back up fine, with all of my data intact. I can now reboot the server at will and it will come back up cleanly. The problem is that I'm afraid as soon as I try to back that data up again I will back at square one. So let me sum things up: Here is what I've done so far to troubleshoot this server: Deleted and recreated the RAID 5 sets. Initialized the drives. Reloaded the server with a fresh Server 2003 install. Confirmed with Dell that I have installed the latest, Dell approved BIOS and NIC drivers. Uninstalled / reinstalled the Backup Exec Remote Agent. Uninstalled the Trend Micro A/V client. Configured the server not to reboot itself after a blue screen so I can see any stop error. I used to think the server was blue screening, but since I enabled this setting I now know that the server just completely locks up. Run chkdsk /r from the Windows Recovery Console. Several errors were found and corrected, but did not help my problem. Help confirm or deny the following assumptions: There are two problems at work here. Why the server is locking up in the first place, and why the server won't boot cleanly after a lockup. This is ultimately a software problem. The server works fine and can be rebooted cleanly all day long - until the first lockup - following a fresh OS load or even a Repair installation. This is not a problem with Backup Exec in general. All of my other servers back up just fine. For the record, all of the other servers run Server 2003, and some of them house more data than the file server in question here. Any help is appreciated. The irony is almost too much to bear. Backing up my data is what is jeopardizing it.

Read the article
Hard Disk DRDY error: is it a crash

- by pranjal

I am using IBM Thinkpad, 1.7GHz, 512 RAM with Linux Mint 9 installed. I have two partitions in addition to root. One of the partitions became read-only yesterday, after which I rebooted my system. It is extremely slow along with DRDY Error : Is my Hard disk crashed ? Error Log while booting. Differences between boot sector and its backup. failed command : READ DMA BMDMA : stat 0X25 ata 1.00 : status : { DRDY ERR } ata 1.00 : status :{ UNC } Buffer I/O error on logical device, logical block 65467 smartctl output for the partition: mint mint # smartctl -a /dev/sda1 smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen Home page is http://smartmontools.sourceforge.net/ === START OF INFORMATION SECTION === Device Model: TOSHIBA MK4026GAX RoHS Serial Number: X5LY1623T Firmware Version: PA107E User Capacity: 40,007,761,920 bytes Device is: Not in smartctl database [for details use: -P showall] ATA Version is: 6 ATA Standard is: Exact ATA specification draft version not indicated Local Time is: Thu Feb 17 06:48:25 2011 UTC SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x84) Offline data collection activity was suspended by an interrupting command from host. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 153) seconds. Offline data collection capabilities: (0x1b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. No Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. No General Purpose Logging support. Short self-test routine recommended polling time: ( 2) minutes. Extended self-test routine recommended polling time: ( 30) minutes. SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000b 100 100 050 Pre-fail Always - 0 2 Throughput_Performance 0x0005 100 100 050 Pre-fail Offline - 0 3 Spin_Up_Time 0x0027 100 100 001 Pre-fail Always - 310 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 3968 5 Reallocated_Sector_Ct 0x0033 100 100 050 Pre-fail Always - 40 7 Seek_Error_Rate 0x000b 100 100 050 Pre-fail Always - 0 8 Seek_Time_Performance 0x0005 100 100 050 Pre-fail Offline - 0 9 Power_On_Hours 0x0032 082 082 000 Old_age Always - 7257 10 Spin_Retry_Count 0x0033 179 100 030 Pre-fail Always - 0 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 3484 192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 489 193 Load_Cycle_Count 0x0032 064 064 000 Old_age Always - 367150 194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 36 (Lifetime Min/Max 14/57) 196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 33 197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 82 198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 1 199 UDMA_CRC_Error_Count 0x0032 200 253 000 Old_age Always - 0 220 Disk_Shift 0x0002 100 100 000 Old_age Always - 101 222 Loaded_Hours 0x0032 085 085 000 Old_age Always - 6146 223 Load_Retry_Count 0x0032 100 100 000 Old_age Always - 0 224 Load_Friction 0x0022 100 100 000 Old_age Always - 0 226 Load-in_Time 0x0026 100 100 000 Old_age Always - 227 240 Head_Flying_Hours 0x0001 100 100 001 Pre-fail Offline - 0 SMART Error Log Version: 1 ATA Error Count: 2371 (device log contains only the most recent five errors) CR = Command Register [HEX] FR = Features Register [HEX] SC = Sector Count Register [HEX] SN = Sector Number Register [HEX] CL = Cylinder Low Register [HEX] CH = Cylinder High Register [HEX] DH = Device/Head Register [HEX] DC = Device Command Register [HEX] ER = Error register [HEX] ST = Status register [HEX] Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days. Error 2371 occurred at disk power-on lifetime: 7256 hours (302 days + 8 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 05 1a 1b 00 e0 Error: UNC 5 sectors at LBA = 0x00001b1a = 6938 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 05 1a 1b 00 e0 00 00:03:10.061 READ DMA f8 00 00 00 00 00 e0 00 00:03:10.061 READ NATIVE MAX ADDRESS ec 00 00 00 00 00 a0 02 00:03:10.053 IDENTIFY DEVICE ef 03 45 00 00 00 a0 02 00:03:10.053 SET FEATURES [Set transfer mode] f8 00 00 00 00 00 e0 00 00:03:10.053 READ NATIVE MAX ADDRESS Error 2370 occurred at disk power-on lifetime: 7256 hours (302 days + 8 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 05 1a 1b 00 e0 Error: UNC 5 sectors at LBA = 0x00001b1a = 6938 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 05 1a 1b 00 e0 00 00:03:03.328 READ DMA f8 00 00 00 00 00 e0 00 00:03:03.327 READ NATIVE MAX ADDRESS ec 00 00 00 00 00 a0 02 00:03:03.320 IDENTIFY DEVICE ef 03 45 00 00 00 a0 02 00:03:03.319 SET FEATURES [Set transfer mode] f8 00 00 00 00 00 e0 00 00:03:03.319 READ NATIVE MAX ADDRESS Error 2369 occurred at disk power-on lifetime: 7256 hours (302 days + 8 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 05 1a 1b 00 e0 Error: UNC 5 sectors at LBA = 0x00001b1a = 6938 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 05 1a 1b 00 e0 00 00:02:56.582 READ DMA f8 00 00 00 00 00 e0 00 00:02:56.582 READ NATIVE MAX ADDRESS ec 00 00 00 00 00 a0 02 00:02:56.574 IDENTIFY DEVICE ef 03 45 00 00 00 a0 02 00:02:56.574 SET FEATURES [Set transfer mode] f8 00 00 00 00 00 e0 00 00:02:56.574 READ NATIVE MAX ADDRESS Error 2368 occurred at disk power-on lifetime: 7256 hours (302 days + 8 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 05 1a 1b 00 e0 Error: UNC 5 sectors at LBA = 0x00001b1a = 6938 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 05 1a 1b 00 e0 00 00:02:49.809 READ DMA f8 00 00 00 00 00 e0 00 00:02:49.809 READ NATIVE MAX ADDRESS ec 00 00 00 00 00 a0 02 00:02:49.801 IDENTIFY DEVICE ef 03 45 00 00 00 a0 02 00:02:49.801 SET FEATURES [Set transfer mode] f8 00 00 00 00 00 e0 00 00:02:49.801 READ NATIVE MAX ADDRESS Error 2367 occurred at disk power-on lifetime: 7256 hours (302 days + 8 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 05 1a 1b 00 e0 Error: UNC 5 sectors at LBA = 0x00001b1a = 6938 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 05 1a 1b 00 e0 00 00:02:43.056 READ DMA f8 00 00 00 00 00 e0 00 00:02:43.056 READ NATIVE MAX ADDRESS ec 00 00 00 00 00 a0 02 00:02:43.048 IDENTIFY DEVICE ef 03 45 00 00 00 a0 02 00:02:43.048 SET FEATURES [Set transfer mode] f8 00 00 00 00 00 e0 00 00:02:43.047 READ NATIVE MAX ADDRESS SMART Self-test log structure revision number 1 No self-tests have been logged. [To run self-tests, use: smartctl -t] Device does not support Selective Self Tests/Logging Do I need to get a new Hard Disk my PC ?

Read the article
Does anyone really understand how HFSC scheduling in Linux/BSD works?

- by Mecki

I read the original SIGCOMM '97 PostScript paper about HFSC, it is very technically, but I understand the basic concept. Instead of giving a linear service curve (as with pretty much every other scheduling algorithm), you can specify a convex or concave service curve and thus it is possible to decouple bandwidth and delay. However, even though this paper mentions to kind of scheduling algorithms being used (real-time and link-share), it always only mentions ONE curve per scheduling class (the decoupling is done by specifying this curve, only one curve is needed for that). Now HFSC has been implemented for BSD (OpenBSD, FreeBSD, etc.) using the ALTQ scheduling framework and it has been implemented Linux using the TC scheduling framework (part of iproute2). Both implementations added two additional service curves, that were NOT in the original paper! A real-time service curve and an upper-limit service curve. Again, please note that the original paper mentions two scheduling algorithms (real-time and link-share), but in that paper both work with one single service curve. There never have been two independent service curves for either one as you currently find in BSD and Linux. Even worse, some version of ALTQ seems to add an additional queue priority to HSFC (there is no such thing as priority in the original paper either). I found several BSD HowTo's mentioning this priority setting (even though the man page of the latest ALTQ release knows no such parameter for HSFC, so officially it does not even exist). This all makes the HFSC scheduling even more complex than the algorithm described in the original paper and there are tons of tutorials on the Internet that often contradict each other, one claiming the opposite of the other one. This is probably the main reason why nobody really seems to understand how HFSC scheduling really works. Before I can ask my questions, we need a sample setup of some kind. I'll use a very simple one as seen in the image below: Here are some questions I cannot answer because the tutorials contradict each other: What for do I need a real-time curve at all? Assuming A1, A2, B1, B2 are all 128 kbit/s link-share (no real-time curve for either one), then each of those will get 128 kbit/s if the root has 512 kbit/s to distribute (and A and B are both 256 kbit/s of course), right? Why would I additionally give A1 and B1 a real-time curve with 128 kbit/s? What would this be good for? To give those two a higher priority? According to original paper I can give them a higher priority by using a curve, that's what HFSC is all about after all. By giving both classes a curve of [256kbit/s 20ms 128kbit/s] both have twice the priority than A2 and B2 automatically (still only getting 128 kbit/s on average) Does the real-time bandwidth count towards the link-share bandwidth? E.g. if A1 and B1 both only have 64kbit/s real-time and 64kbit/s link-share bandwidth, does that mean once they are served 64kbit/s via real-time, their link-share requirement is satisfied as well (they might get excess bandwidth, but lets ignore that for a second) or does that mean they get another 64 kbit/s via link-share? So does each class has a bandwidth "requirement" of real-time plus link-share? Or does a class only have a higher requirement than the real-time curve if the link-share curve is higher than the real-time curve (current link-share requirement equals specified link-share requirement minus real-time bandwidth already provided to this class)? Is upper limit curve applied to real-time as well, only to link-share, or maybe to both? Some tutorials say one way, some say the other way. Some even claim upper-limit is the maximum for real-time bandwidth + link-share bandwidth? What is the truth? Assuming A2 and B2 are both 128 kbit/s, does it make any difference if A1 and B1 are 128 kbit/s link-share only, or 64 kbit/s real-time and 128 kbit/s link-share, and if so, what difference? If I use the seperate real-time curve to increase priorities of classes, why would I need "curves" at all? Why is not real-time a flat value and link-share also a flat value? Why are both curves? The need for curves is clear in the original paper, because there is only one attribute of that kind per class. But now, having three attributes (real-time, link-share, and upper-limit) what for do I still need curves on each one? Why would I want the curves shape (not average bandwidth, but their slopes) to be different for real-time and link-share traffic? According to the little documentation available, real-time curve values are totally ignored for inner classes (class A and B), they are only applied to leaf classes (A1, A2, B1, B2). If that is true, why does the ALTQ HFSC sample configuration (search for 3.3 Sample configuration) set real-time curves on inner classes and claims that those set the guaranteed rate of those inner classes? Isn't that completely pointless? (note: pshare sets the link-share curve in ALTQ and grate the real-time curve; you can see this in the paragraph above the sample configuration). Some tutorials say the sum of all real-time curves may not be higher than 80% of the line speed, others say it must not be higher than 70% of the line speed. Which one is right or are they maybe both wrong? One tutorial said you shall forget all the theory. No matter how things really work (schedulers and bandwidth distribution), imagine the three curves according to the following "simplified mind model": real-time is the guaranteed bandwidth that this class will always get. link-share is the bandwidth that this class wants to become fully satisfied, but satisfaction cannot be guaranteed. In case there is excess bandwidth, the class might even get offered more bandwidth than necessary to become satisfied, but it may never use more than upper-limit says. For all this to work, the sum of all real-time bandwidths may not be above xx% of the line speed (see question above, the percentage varies). Question: Is this more or less accurate or a total misunderstanding of HSFC? And if assumption above is really accurate, where is prioritization in that model? E.g. every class might have a real-time bandwidth (guaranteed), a link-share bandwidth (not guaranteed) and an maybe an upper-limit, but still some classes have higher priority needs than other classes. In that case I must still prioritize somehow, even among real-time traffic of those classes. Would I prioritize by the slope of the curves? And if so, which curve? The real-time curve? The link-share curve? The upper-limit curve? All of them? Would I give all of them the same slope or each a different one and how to find out the right slope? I still haven't lost hope that there exists at least a hand full of people in this world that really understood HFSC and are able to answer all these questions accurately. And doing so without contradicting each other in the answers would be really nice ;-)

Read the article
Forensic Analysis of the OOM-Killer

- by Oddthinking

Ubuntu's Out-Of-Memory Killer wreaked havoc on my server, quietly assassinating my applications, sendmail, apache and others. I've managed to learn what the OOM Killer is, and about its "badness" rules. While my machine is small, my applications are even smaller, and typically only half of my physical memory is in use, let alone swap-space, so I was surprised. I am trying to work out the culprit, but I don't know how to read the OOM-Killer logs. Can anyone please point me to a tutorial on how to read the data in the logs (what are ve, free and gen?), or help me parse these logs? Apr 20 20:03:27 EL135 kernel: kill_signal(13516.0): selecting to kill, queued 0, seq 1, exc 2326 0 goal 2326 0... Apr 20 20:03:27 EL135 kernel: kill_signal(13516.0): task ebb0c6f0, thg d33a1b00, sig 1 Apr 20 20:03:27 EL135 kernel: kill_signal(13516.0): selected 1, signalled 1, queued 1, seq 1, exc 2326 0 red 61795 745 Apr 20 20:03:27 EL135 kernel: kill_signal(13516.0): selecting to kill, queued 0, seq 2, exc 122 0 goal 383 0... Apr 20 20:03:27 EL135 kernel: kill_signal(13516.0): task ebb0c6f0, thg d33a1b00, sig 1 Apr 20 20:03:27 EL135 kernel: kill_signal(13516.0): selected 1, signalled 1, queued 1, seq 2, exc 383 0 red 61795 745 Apr 20 20:03:27 EL135 kernel: kill_signal(13516.0): task ebb0c6f0, thg d33a1b00, sig 2 Apr 20 20:03:27 EL135 kernel: OOM killed process watchdog (pid=14490, ve=13516) exited, free=43104 gen=24501. Apr 20 20:03:27 EL135 kernel: OOM killed process tail (pid=4457, ve=13516) exited, free=43104 gen=24502. Apr 20 20:03:27 EL135 kernel: OOM killed process ntpd (pid=10816, ve=13516) exited, free=43104 gen=24503. Apr 20 20:03:27 EL135 kernel: OOM killed process tail (pid=27401, ve=13516) exited, free=43104 gen=24504. Apr 20 20:03:27 EL135 kernel: OOM killed process tail (pid=29009, ve=13516) exited, free=43104 gen=24505. Apr 20 20:03:27 EL135 kernel: OOM killed process apache2 (pid=10557, ve=13516) exited, free=49552 gen=24506. Apr 20 20:03:27 EL135 kernel: OOM killed process apache2 (pid=24983, ve=13516) exited, free=53117 gen=24507. Apr 20 20:03:27 EL135 kernel: OOM killed process apache2 (pid=29129, ve=13516) exited, free=68493 gen=24508. Apr 20 20:03:27 EL135 kernel: OOM killed process sendmail-mta (pid=941, ve=13516) exited, free=68803 gen=24509. Apr 20 20:03:27 EL135 kernel: OOM killed process tail (pid=12418, ve=13516) exited, free=69330 gen=24510. Apr 20 20:03:27 EL135 kernel: OOM killed process python (pid=22953, ve=13516) exited, free=72275 gen=24511. Apr 20 20:03:27 EL135 kernel: OOM killed process apache2 (pid=6624, ve=13516) exited, free=76398 gen=24512. Apr 20 20:03:27 EL135 kernel: OOM killed process python (pid=23317, ve=13516) exited, free=94285 gen=24513. Apr 20 20:03:27 EL135 kernel: OOM killed process tail (pid=29030, ve=13516) exited, free=95339 gen=24514. Apr 20 20:03:28 EL135 kernel: OOM killed process apache2 (pid=20583, ve=13516) exited, free=101663 gen=24515. Apr 20 20:03:28 EL135 kernel: OOM killed process logger (pid=12894, ve=13516) exited, free=101694 gen=24516. Apr 20 20:03:28 EL135 kernel: OOM killed process bash (pid=21119, ve=13516) exited, free=101849 gen=24517. Apr 20 20:03:28 EL135 kernel: OOM killed process atd (pid=991, ve=13516) exited, free=101880 gen=24518. Apr 20 20:03:28 EL135 kernel: OOM killed process apache2 (pid=14649, ve=13516) exited, free=102748 gen=24519. Apr 20 20:03:28 EL135 kernel: OOM killed process grep (pid=21375, ve=13516) exited, free=132167 gen=24520. Apr 20 20:03:57 EL135 kernel: kill_signal(13516.0): selecting to kill, queued 0, seq 4, exc 4215 0 goal 4826 0... Apr 20 20:03:57 EL135 kernel: kill_signal(13516.0): task ede29370, thg df98b880, sig 1 Apr 20 20:03:57 EL135 kernel: kill_signal(13516.0): selected 1, signalled 1, queued 1, seq 4, exc 4826 0 red 189481 331 Apr 20 20:03:57 EL135 kernel: kill_signal(13516.0): task ede29370, thg df98b880, sig 2 Apr 20 20:04:53 EL135 kernel: kill_signal(13516.0): selecting to kill, queued 0, seq 5, exc 3564 0 goal 3564 0... Apr 20 20:04:53 EL135 kernel: kill_signal(13516.0): task c6c90110, thg cdb1a100, sig 1 Apr 20 20:04:53 EL135 kernel: kill_signal(13516.0): selected 1, signalled 1, queued 1, seq 5, exc 3564 0 red 189481 331 Apr 20 20:04:53 EL135 kernel: kill_signal(13516.0): task c6c90110, thg cdb1a100, sig 2 Apr 20 20:07:14 EL135 kernel: kill_signal(13516.0): selecting to kill, queued 0, seq 6, exc 8071 0 goal 8071 0... Apr 20 20:07:14 EL135 kernel: kill_signal(13516.0): task d7294050, thg c03f42c0, sig 1 Apr 20 20:07:14 EL135 kernel: kill_signal(13516.0): selected 1, signalled 1, queued 1, seq 6, exc 8071 0 red 189481 331 Apr 20 20:07:14 EL135 kernel: kill_signal(13516.0): task d7294050, thg c03f42c0, sig 2 Watchdog is a watchdog task, that was idle; nothing in the logs to suggest it had done anything for days. Its job is to restart one of the applications if it dies, so a bit ironic that it is the first to get killed. Tail was monitoring a few logs files. Unlikely to be consuming memory madly. The apache web-server only serves pages to a little old lady who only uses it to get to church on Sundays a couple of developers who were in bed asleep, and hadn't visited a page on the site for a few weeks. The only traffic it might have had is from the port-scanners; all the content is password-protected and not linked from anywhere, so no spiders are interested. Python is running two separate custom applications. Nothing in the logs to suggest they weren't humming along as normal. One of them was a relatively recent implementation, which makes suspect #1. It doesn't have any data-structures of any significance, and normally uses only about 8% of the total physical RAW. It hasn't misbehaved since. The grep is suspect #2, and the one I want to be guilty, because it was a once-off command. The command (which piped the output of a grep -r to another grep) had been started at least 30 minutes earlier, and the fact it was still running is suspicious. However, I wouldn't have thought grep would ever use a significant amount of memory. It took a while for the OOM killer to get to it, which suggests it wasn't going mad, but the OOM killer stopped once it was killed, suggesting it may have been a memory-hog that finally satisfied the OOM killer's blood-lust.

Read the article
Installing Oracle 11gR2 on RHEL 6.2

- by Chris

Hello all I'm having some difficulty installing Oracle 11gR2 on RHEL 6.2 I have compiled a giant list of every single step I have taken so far I installed RHEL 6.2 on VMWARE it did it's easy install automatically I Selected 4gb of memory Selected max size of 80Gb Selected 2 processors Sorry for the bad styling copy paste isn't working correctly The version of oracle i downloaded is Linux x86-64 11.2.0.1 I am installing this on a local machine NOT a remote machine I followed the following documentation http://docs.oracle.com/cd/E11882_01/install.112/e24326/toc.htm I bolded the steps which I was least sure about from my research Easy installed with RHEL 6.2 for VMWARE Registered with red hat so I can get updates Reinstalled vmware-tools by pressing enter at every choice Sudo yum update at the end something about GPG key selected y then y Checked Memory Requirements grep MemTotal /proc/meminfo MemTotal: 3921368 kb uname -m x86_64 grep SwapTotal /proc/meminfo SwapTotal: 6160376 kb free total used free shared buffers cached Mem: 3921368 2032012 1889356 0 76216 1533268 -/+ buffers/cache: 422528 3498840 Swap: 6160376 0 6160376 df -h /dev/shm Filesystem Size Used Avail Use% Mounted on tmpfs 1.9G 276K 1.9G 1% /dev/shm df -h /tmp Filesystem Size Used Avail Use% Mounted on /dev/sda2 73G 2.7G 67G 4% / df -h Filesystem Size Used Avail Use% Mounted on /dev/sda2 73G 2.7G 67G 4% / tmpfs 1.9G 276K 1.9G 1% /dev/shm /dev/sda1 291M 58M 219M 21% /boot All looked fine to me except maybe for swap? Software Requirements cat /proc/version Linux version 2.6.32-220.el6.x86_64 ([email protected]) (gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) (GCC) ) #1 SMP Wed Nov 9 08:03:13 EST 2011 uname -r 2.6.32-220.el6.x86_64 (same as above but whatever) According to the tutorial should be On Red Hat Enterprise Linux 6 2.6.32-71.el6.x86_64 or later These are the versions of software I have installed binutils-2.20.51.0.2-5.28.el6.x86_64 compat-libcap1-1.10-1.x86_64 compat-libstdc++-33-3.2.3-69.el6.x86_64 compat-libstdc++-33.i686 0:3.2.3-69.el6 gcc-4.4.6-3.el6.x86_64 gcc-c++.x86_64 0:4.4.6-3.el6 glibc-2.12-1.47.el6_2.12.x86_64 glibc-2.12-1.47.el6_2.12.i686 glibc-devel-2.12-1.47.el6_2.12.x86_64 glibc-devel.i686 0:2.12-1.47.el6_2.12 ksh.x86_64 0:20100621-12.el6_2.1 libgcc-4.4.6-3.el6.x86_64 libgcc-4.4.6-3.el6.i686 libstdc++-4.4.6-3.el6.x86_64 libstdc++.i686 0:4.4.6-3.el6 libstdc++-devel.i686 0:4.4.6-3.el6 libstdc++-devel-4.4.6-3.el6.x86_64 libaio-0.3.107-10.el6.x86_64 libaio-0.3.107-10.el6.i686 libaio-devel-0.3.107-10.el6.x86_64 libaio-devel-0.3.107-10.el6.i686 make-3.81-19.el6.x86_64 sysstat-9.0.4-18.el6.x86_64 unixODBC-2.2.14-11.el6.x86_64 unixODBC-devel-2.2.14-11.el6.x86_64 unixODBC-devel-2.2.14-11.el6.i686 unixODBC-2.2.14-11.el6.i686 8. Probably screwed up here or step 9 /usr/sbin/groupadd oinstall /usr/sbin/groupadd dba(not sure why this isn't in the tutorial) /usr/sbin/useradd -g oinstall -G dba oracle passwd oracle /sbin/sysctl -a | grep sem Xkernel.sem = 250 32000 32 128 /sbin/sysctl -a | grep shm kernel.shmmax = 68719476736 kernel.shmall = 4294967296 kernel.shmmni = 4096 vm.hugetlb_shm_group = 0 /sbin/sysctl -a | grep file-max Xfs.file-max = 384629 /sbin/sysctl -a | grep ip_local_port_range Xnet.ipv4.ip_local_port_range = 32768 61000 /sbin/sysctl -a | grep rmem_default Xnet.core.rmem_default = 124928 /sbin/sysctl -a | grep rmem_max Xnet.core.rmem_max = 131071 /sbin/sysctl -a | grep wmem_max Xnet.core.wmem_max = 131071 /sbin/sysctl -a | grep wmem_default Xnet.core.wmem_default = 124928 Here is my sysctl.conf file I only added the items that were bigger: Kernel sysctl configuration file for Red Hat Linux # For binary values, 0 is disabled, 1 is enabled. See sysctl(8) and sysctl.conf(5) for more details. Controls IP packet forwarding net.ipv4.ip_forward = 0 Controls source route verification net.ipv4.conf.default.rp_filter = 1 Do not accept source routing net.ipv4.conf.default.accept_source_route = 0 Controls the System Request debugging functionality of the kernel kernel.sysrq = 0 Controls whether core dumps will append the PID to the core filename. Useful for debugging multi-threaded applications. kernel.core_uses_pid = 1 Controls the use of TCP syncookies net.ipv4.tcp_syncookies = 1 Disable netfilter on bridges. net.bridge.bridge-nf-call-ip6tables = 0 net.bridge.bridge-nf-call-iptables = 0 net.bridge.bridge-nf-call-arptables = 0 Controls the maximum size of a message, in bytes kernel.msgmnb = 65536 Controls the default maxmimum size of a mesage queue kernel.msgmax = 65536 Controls the maximum shared segment size, in bytes kernel.shmmax = 68719476736 Controls the maximum number of shared memory segments, in pages kernel.shmall = 4294967296 fs.aio-max-nr = 1048576 fs.file-max = 6815744 kernel.sem = 250 32000 100 128 net.ipv4.ip_local_port_range = 9000 65500 net.core.rmem_default = 262144 net.core.rmem_max = 4194304 net.core.wmem_default = 262144 net.core.wmem_max = 1048576 /sbin/sysctl -p net.ipv4.ip_forward = 0 net.ipv4.conf.default.rp_filter = 1 net.ipv4.conf.default.accept_source_route = 0 kernel.sysrq = 0 kernel.core_uses_pid = 1 net.ipv4.tcp_syncookies = 1 error: "net.bridge.bridge-nf-call-ip6tables" is an unknown key error: "net.bridge.bridge-nf-call-iptables" is an unknown key error: "net.bridge.bridge-nf-call-arptables" is an unknown key kernel.msgmnb = 65536 kernel.msgmax = 65536 kernel.shmmax = 68719476736 kernel.shmall = 4294967296 fs.aio-max-nr = 1048576 fs.file-max = 6815744 kernel.sem = 250 32000 100 128 net.ipv4.ip_local_port_range = 9000 65500 net.core.rmem_default = 262144 net.core.rmem_max = 4194304 net.core.wmem_default = 262144 net.core.wmem_max = 1048576 su - oracle ulimit -Sn 1024 ulimit -Hn 1024 ulimit -Su 1024 ulimit -Hu 30482 ulimit -Su 1024 ulimit -Ss 10240 ulimit -Hs unlimited su - nano /etc/security/limits.conf *added to the end of the file * oracle soft nproc 2047 oracle hard nproc 16384 oracle soft nofile 1024 oracle hard nofile 65536 oracle soft stack 10240 exit exit su - mkdir -p /app/ chown -R oracle:oinstall /app/ chmod -R 775 /app/ 9. THIS IS PROBABLY WHERE I MESSED UP I then exited out of the root account so now I'm back in my account chris then I su - oracle echo $SHELL /bin/bash umask 0022 (so it should be set already to what is neccesary) Also from what I have read I do not need to set the DISPLAY variable because I'm installing this on the localhost I then opened the .bash_profile of the oracle and changed it to the following .bash_profile Get the aliases and functions if [ -f ~/.bashrc ]; then . ~/.bashrc fi User specific environment and startup programs PATH=$PATH:$HOME/bin; export PATH ORACLE_BASE=/app/oracle ORACLE_SID=orcl export ORACLE_BASE ORACLE_SID I then shutdown the virtual machine shared my desktop folder from my windows 7 then turned back on the virtual machine logged in as chris opened up a terminal then: su - for some reason the shared folder didn't appear so I reinstalled vmware tools again and restarted then same as before su - cp -R linux_oracle/database /db; chown -R oracle:oinstall /db; chmod -R 775 /db; ll /db drwxrwxr-x. 8 oracle oinstall 4096 Jun 5 06:20 database exit su - oracle cd /db/database ./runInstaller AND FINALLY THE INFAMOUS JAVA:132 ERROR MESSAGE Starting Oracle Universal Installer... Checking Temp space: must be greater than 80 MB. Actual 65646 MB Passed Checking swap space: must be greater than 150 MB. Actual 6015 MB Passed Checking monitor: must be configured to display at least 256 colors. Actual 16777216 Passed Preparing to launch Oracle Universal Installer from /tmp/OraInstall2012-06-05_06-47-12AM. Please wait ...[oracle@localhost database]$ Exception in thread "main" java.lang.UnsatisfiedLinkError: /tmp/OraInstall2012-06-05_06-47-12AM/jdk/jre/lib/i386/xawt/libmawt.so: libXext.so.6: cannot open shared object file: No such file or directory at java.lang.ClassLoader$NativeLibrary.load(Native Method) at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1751) at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1647) at java.lang.Runtime.load0(Runtime.java:769) at java.lang.System.load(System.java:968) at java.lang.ClassLoader$NativeLibrary.load(Native Method) at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1751) at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1668) at java.lang.Runtime.loadLibrary0(Runtime.java:822) at java.lang.System.loadLibrary(System.java:993) at sun.security.action.LoadLibraryAction.run(LoadLibraryAction.java:50) at java.security.AccessController.doPrivileged(Native Method) at java.awt.Toolkit.loadLibraries(Toolkit.java:1509) at java.awt.Toolkit.(Toolkit.java:1530) at com.jgoodies.looks.LookUtils.isLowResolution(Unknown Source) at com.jgoodies.looks.LookUtils.(Unknown Source) at com.jgoodies.looks.plastic.PlasticLookAndFeel.(PlasticLookAndFeel.java:122) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:242) at javax.swing.SwingUtilities.loadSystemClass(SwingUtilities.java:1783) at javax.swing.UIManager.setLookAndFeel(UIManager.java:480) at oracle.install.commons.util.Application.startup(Application.java:758) at oracle.install.commons.flow.FlowApplication.startup(FlowApplication.java:164) at oracle.install.commons.flow.FlowApplication.startup(FlowApplication.java:181) at oracle.install.commons.base.driver.common.Installer.startup(Installer.java:265) at oracle.install.ivw.db.driver.DBInstaller.startup(DBInstaller.java:114) at oracle.install.ivw.db.driver.DBInstaller.main(DBInstaller.java:132)

Read the article
ESX3.5 Cluster & MD3000i -- Both servers see iSCSI Targets, Only one server can use partition.

- by GruffTech

Alright. First and foremost, Warning. This is a bigger-then-normal question. I like to be thorough and try to eliminate all possible "easymode" answers, as well as give everyone a feel of what i've tried. I've included several images of our setup and the problem it is having.. TLDR Version: So I've followed the guides located here: ESX Deployment Guide V1 this is the guide Dell has sent me to setup two ESX3.5 servers mounting a Dell MD3000i. It doesn't work. Both servers can't use the same storage partition on the MD3000. Both servers see it, but only one server can actually use it. (that server being whatever server created the partition on the target.) Both ESX servers are members of the Host Group. Full Version I have 2 ESX3.5 Servers (10.0.7.102, also called EPI2, and 10.0.7.103, also called EPI3.) connected to a iSCSI SAN Device (Dell MD3000i). Both ESX servers can "scan" the SAN and see the LUNS. Part One: MD3000i Storage On the MD3000i, Both servers are in my host group. I have two partitions, VM1 and VM2, both 1.6TB (vmware doesn't like anything past 2tb.) And you can even see that the ESX servers are targetting the MD3000 just fine. Part Two: The ESX Servers Figure 1. So as you can see above, Both ESX Servers (10.0.7.102 and 10.0.7.103) are able to see and scan the MD3000i SAN. Figure 2. Above is the storage both servers see. I created the storage partition on EPI2 (102). I then Extended the partition to include the second LUN for a grand total of 3.27 TB of storage. When i "rescan" on 103 (the server not mounting the partition), I get the below log in log/messages. Mar 11 10:41:18 epi3 kernel: scsi1: remove-single-device 0 0 0 failed, device busy(4). being the only line that grabs my attentions. (EPI3 is the server name) Mar 11 10:41:04 epi3 vmkiscsid[5436]: Connected to Discovery Address 192.168.130.101 Mar 11 10:41:04 epi3 vmkiscsid[5437]: Connected to Discovery Address 192.168.130.102 Mar 11 10:41:04 epi3 vmkiscsid[5438]: Connected to Discovery Address 192.168.131.101 Mar 11 10:41:04 epi3 vmkiscsid[5439]: Connected to Discovery Address 192.168.131.102 Mar 11 10:41:17 epi3 kernel: scsi singledevice 2 0 0 0 Mar 11 10:41:17 epi3 kernel: Vendor: DELL Model: MD3000i Rev: 0735 Mar 11 10:41:17 epi3 kernel: Type: Direct-Access ANSI SCSI revision: 05 Mar 11 10:41:17 epi3 kernel: VMWARE SCSI Id: Supported VPD pages for sdb : 0x0 0x80 0x83 0x85 0x86 0x87 0xc0 0xc1 0xc2 0xc3 0xc4 0xc8 0xc9 0xca 0xd0 Mar 11 10:41:17 epi3 kernel: VMWARE SCSI Id: Device id info for sdb: 0x1 0x3 0x0 0x10 0x60 0x1 0xe4 0xf0 0x0 0x1a 0x1a 0xa2 0x0 0x0 0x15 0xe2 0x4d 0x75 0xf6 0x99 0x53 0x98 0x0 0x54 0x69 0x71 0x6e 0x2e 0x31 0x39 0x38 0x34 0x2d 0x30 0x35 0x2e 0x63 0x6f 0x6d 0x2e 0x64 0x65 0x6c 0x6c 0x3a 0x70 0x6f 0x77 0x65 0x72 0x76 0x61 0x75 0x6c 0x74 0x2e 0x36 0x30 0x30 0x31 0x65 0x34 0x66 0x30 0x30 0x30 0x31 0x61 0x31 0x61 0x61 0x32 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x34 0x37 0x39 0x30 0x36 0x32 0x32 0x65 0x2c 0x74 0x2c 0x30 0x78 0x30 0x30 0x30 0x31 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x32 0x0 0x0 0x0 0x51 0x94 0x0 0x4 0x0 0x0 0x80 0x1 0x53 0xa8 0x0 0x44 0x69 0x71 0x6e 0x2e 0x31 0x39 0x38 0x34 0x2d 0x30 0x35 0x2e 0x63 0x6f 0x6d 0x2e 0x64 0x65 0x6c 0x6c 0x3a 0x70 0x6f 0x77 0x65 0x72 0x76 0x61 0x75 0x6c 0x74 0x2e 0x36 0x30 0x30 0x31 0x65 0x34 0x66 0x30 0x30 0x30 0x31 0x61 0x31 0x61 0x61 0x32 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x34 0x37 0x39 0x30 0x36 0x32 0x32 0x65 0x0 0x0 0x0 0x0 Mar 11 10:41:17 epi3 kernel: VMWARE SCSI Id: Id for sdb 0x60 0x01 0xe4 0xf0 0x00 0x1a 0x1a 0xa2 0x00 0x00 0x15 0xe2 0x4d 0x75 0xf6 0x99 0x4d 0x44 0x33 0x30 0x30 0x30 Mar 11 10:41:17 epi3 kernel: VMWARE: Unique Device attached as scsi disk sdb at scsi2, channel 0, id 0, lun 0 Mar 11 10:41:17 epi3 kernel: Attached scsi disk sdb at scsi2, channel 0, id 0, lun 0 Mar 11 10:41:17 epi3 kernel: scan_scsis starting finish Mar 11 10:41:17 epi3 kernel: SCSI device sdb: 3509329920 512-byte hdwr sectors (1797751 MB) Mar 11 10:41:17 epi3 kernel: sdb: sdb1 Mar 11 10:41:17 epi3 kernel: scan_scsis done with finish Mar 11 10:41:17 epi3 kernel: scsi singledevice 2 0 0 1 Mar 11 10:41:17 epi3 kernel: Vendor: DELL Model: MD3000i Rev: 0735 Mar 11 10:41:17 epi3 kernel: Type: Direct-Access ANSI SCSI revision: 05 Mar 11 10:41:18 epi3 kernel: VMWARE SCSI Id: Supported VPD pages for sdc : 0x0 0x80 0x83 0x85 0x86 0x87 0xc0 0xc1 0xc2 0xc3 0xc4 0xc8 0xc9 0xca 0xd0 Mar 11 10:41:18 epi3 kernel: VMWARE SCSI Id: Device id info for sdc: 0x1 0x3 0x0 0x10 0x60 0x1 0xe4 0xf0 0x0 0x1a 0x1a 0x86 0x0 0x0 0xd 0xb7 0x4d 0x75 0xf2 0x77 0x53 0x98 0x0 0x54 0x69 0x71 0x6e 0x2e 0x31 0x39 0x38 0x34 0x2d 0x30 0x35 0x2e 0x63 0x6f 0x6d 0x2e 0x64 0x65 0x6c 0x6c 0x3a 0x70 0x6f 0x77 0x65 0x72 0x76 0x61 0x75 0x6c 0x74 0x2e 0x36 0x30 0x30 0x31 0x65 0x34 0x66 0x30 0x30 0x30 0x31 0x61 0x31 0x61 0x61 0x32 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x34 0x37 0x39 0x30 0x36 0x32 0x32 0x65 0x2c 0x74 0x2c 0x30 0x78 0x30 0x30 0x30 0x31 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x32 0x0 0x0 0x0 0x51 0x94 0x0 0x4 0x0 0x0 0x80 0x1 0x53 0xa8 0x0 0x44 0x69 0x71 0x6e 0x2e 0x31 0x39 0x38 0x34 0x2d 0x30 0x35 0x2e 0x63 0x6f 0x6d 0x2e 0x64 0x65 0x6c 0x6c 0x3a 0x70 0x6f 0x77 0x65 0x72 0x76 0x61 0x75 0x6c 0x74 0x2e 0x36 0x30 0x30 0x31 0x65 0x34 0x66 0x30 0x30 0x30 0x31 0x61 0x31 0x61 0x61 0x32 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x34 0x37 0x39 0x30 0x36 0x32 0x32 0x65 0x0 0x0 0x0 0x0 Mar 11 10:41:18 epi3 kernel: VMWARE SCSI Id: Id for sdc 0x60 0x01 0xe4 0xf0 0x00 0x1a 0x1a 0x86 0x00 0x00 0x0d 0xb7 0x4d 0x75 0xf2 0x77 0x4d 0x44 0x33 0x30 0x30 0x30 Mar 11 10:41:18 epi3 kernel: VMWARE: Unique Device attached as scsi disk sdc at scsi2, channel 0, id 0, lun 1 Mar 11 10:41:18 epi3 kernel: Attached scsi disk sdc at scsi2, channel 0, id 0, lun 1 Mar 11 10:41:18 epi3 kernel: scan_scsis starting finish Mar 11 10:41:18 epi3 kernel: SCSI device sdc: 3509329920 512-byte hdwr sectors (1797751 MB) Mar 11 10:41:18 epi3 kernel: sdc: sdc1 Mar 11 10:41:18 epi3 kernel: scan_scsis done with finish Mar 11 10:41:18 epi3 kernel: scsi1: remove-single-device 0 0 0 failed, device busy(4). Mar 11 10:41:18 epi3 kernel: scsi singledevice 1 0 0 0 Things I've Tried: Removing iSCSI targets from only 103, disabling iSCSI, rebooting, enabled iSCSI, re-adding targets, rescan. Same result. Removing partition on 102, Formatted partition on 103 instead. Same result, except flipped. 103 can use storage, 102 can not. Starting Over. Removing all iSCSI Targets on both ESX Boxes, disabling iSCSI, turning off the firewall for iSCSI, rebooting ESX. Then on the MD3000, Removed the Host Group, Removed the Host-to-Virtual Mappings, Restarted the SAN. Followed the Documentation again, same result. Both servers see the storage, but only one server can use it. Disabling and Re-enabling VMware DRS and HA. Same result. Flat-out turning off VMware DRS and HA, and doing the "start over" step to see if maybe that borked it. Same Result. I'm kinda loosing my mind here, Everything i read online says "just partition it and if the ESX boxes can see the targets, it just works".... well crap. Any ideas, any other things to try? Can anyone atleast point me in the right direction? I'm really tired of working from 1am til 4am (our maintenance hours)

Read the article
Does anyone really understand how HFSC scheduling in Linux/BSD works?

- by Mecki

I read the original SIGCOMM '97 PostScript paper about HFSC, it is very technically, but I understand the basic concept. Instead of giving a linear service curve (as with pretty much every other scheduling algorithm), you can specify a convex or concave service curve and thus it is possible to decouple bandwidth and delay. However, even though this paper mentions to kind of scheduling algorithms being used (real-time and link-share), it always only mentions ONE curve per scheduling class (the decoupling is done by specifying this curve, only one curve is needed for that). Now HFSC has been implemented for BSD (OpenBSD, FreeBSD, etc.) using the ALTQ scheduling framework and it has been implemented Linux using the TC scheduling framework (part of iproute2). Both implementations added two additional service curves, that were NOT in the original paper! A real-time service curve and an upper-limit service curve. Again, please note that the original paper mentions two scheduling algorithms (real-time and link-share), but in that paper both work with one single service curve. There never have been two independent service curves for either one as you currently find in BSD and Linux. Even worse, some version of ALTQ seems to add an additional queue priority to HSFC (there is no such thing as priority in the original paper either). I found several BSD HowTo's mentioning this priority setting (even though the man page of the latest ALTQ release knows no such parameter for HSFC, so officially it does not even exist). This all makes the HFSC scheduling even more complex than the algorithm described in the original paper and there are tons of tutorials on the Internet that often contradict each other, one claiming the opposite of the other one. This is probably the main reason why nobody really seems to understand how HFSC scheduling really works. Before I can ask my questions, we need a sample setup of some kind. I'll use a very simple one as seen in the image below: Here are some questions I cannot answer because the tutorials contradict each other: What for do I need a real-time curve at all? Assuming A1, A2, B1, B2 are all 128 kbit/s link-share (no real-time curve for either one), then each of those will get 128 kbit/s if the root has 512 kbit/s to distribute (and A and B are both 256 kbit/s of course), right? Why would I additionally give A1 and B1 a real-time curve with 128 kbit/s? What would this be good for? To give those two a higher priority? According to original paper I can give them a higher priority by using a curve, that's what HFSC is all about after all. By giving both classes a curve of [256kbit/s 20ms 128kbit/s] both have twice the priority than A2 and B2 automatically (still only getting 128 kbit/s on average) Does the real-time bandwidth count towards the link-share bandwidth? E.g. if A1 and B1 both only have 64kbit/s real-time and 64kbit/s link-share bandwidth, does that mean once they are served 64kbit/s via real-time, their link-share requirement is satisfied as well (they might get excess bandwidth, but lets ignore that for a second) or does that mean they get another 64 kbit/s via link-share? So does each class has a bandwidth "requirement" of real-time plus link-share? Or does a class only have a higher requirement than the real-time curve if the link-share curve is higher than the real-time curve (current link-share requirement equals specified link-share requirement minus real-time bandwidth already provided to this class)? Is upper limit curve applied to real-time as well, only to link-share, or maybe to both? Some tutorials say one way, some say the other way. Some even claim upper-limit is the maximum for real-time bandwidth + link-share bandwidth? What is the truth? Assuming A2 and B2 are both 128 kbit/s, does it make any difference if A1 and B1 are 128 kbit/s link-share only, or 64 kbit/s real-time and 128 kbit/s link-share, and if so, what difference? If I use the seperate real-time curve to increase priorities of classes, why would I need "curves" at all? Why is not real-time a flat value and link-share also a flat value? Why are both curves? The need for curves is clear in the original paper, because there is only one attribute of that kind per class. But now, having three attributes (real-time, link-share, and upper-limit) what for do I still need curves on each one? Why would I want the curves shape (not average bandwidth, but their slopes) to be different for real-time and link-share traffic? According to the little documentation available, real-time curve values are totally ignored for inner classes (class A and B), they are only applied to leaf classes (A1, A2, B1, B2). If that is true, why does the ALTQ HFSC sample configuration (search for 3.3 Sample configuration) set real-time curves on inner classes and claims that those set the guaranteed rate of those inner classes? Isn't that completely pointless? (note: pshare sets the link-share curve in ALTQ and grate the real-time curve; you can see this in the paragraph above the sample configuration). Some tutorials say the sum of all real-time curves may not be higher than 80% of the line speed, others say it must not be higher than 70% of the line speed. Which one is right or are they maybe both wrong? One tutorial said you shall forget all the theory. No matter how things really work (schedulers and bandwidth distribution), imagine the three curves according to the following "simplified mind model": real-time is the guaranteed bandwidth that this class will always get. link-share is the bandwidth that this class wants to become fully satisfied, but satisfaction cannot be guaranteed. In case there is excess bandwidth, the class might even get offered more bandwidth than necessary to become satisfied, but it may never use more than upper-limit says. For all this to work, the sum of all real-time bandwidths may not be above xx% of the line speed (see question above, the percentage varies). Question: Is this more or less accurate or a total misunderstanding of HSFC? And if assumption above is really accurate, where is prioritization in that model? E.g. every class might have a real-time bandwidth (guaranteed), a link-share bandwidth (not guaranteed) and an maybe an upper-limit, but still some classes have higher priority needs than other classes. In that case I must still prioritize somehow, even among real-time traffic of those classes. Would I prioritize by the slope of the curves? And if so, which curve? The real-time curve? The link-share curve? The upper-limit curve? All of them? Would I give all of them the same slope or each a different one and how to find out the right slope? I still haven't lost hope that there exists at least a hand full of people in this world that really understood HFSC and are able to answer all these questions accurately. And doing so without contradicting each other in the answers would be really nice ;-)

Read the article
Why did my flash drive become "read only" and (how) can I fix it?

- by Bob

I have a brand new flash drive (one week old) that has become marked as read only, by Windows, Kubuntu and a bootable partitioner. Why did this happen? Is it fixable? If it is, how can I fix this? The problem Firstly, this drive is new. It's certainly not been used enough to die from normal wear and tear, though I would not discount defective components. The drive itself has somehow become locked in a read only state. Windows' Disk management: Diskpart: Generic Flash Disk USB Device Disk ID: 33FA33FA Type : USB Status : Online Path : 0 Target : 0 LUN ID : 0 Location Path : UNAVAILABLE Current Read-only State : Yes Read-only : No Boot Disk : No Pagefile Disk : No Hibernation File Disk : No Crashdump Disk : No Clustered Disk : No What really confuses me is Current Read-only State : Yes and Read-only : No. Attempted solutions So far, I've tried: Formatting it in Windows (in Disk management, the format options are greyed out when right clicking). DiskPart Clean (CLEAN - Clear the configuration information, or all information, off the disk.): DISKPART> clean DiskPart has encountered an error: The media is write protected. See the System Event Log for more information. There was nothing in the event log. Windows command line format >format G: Insert new disk for drive G: and press ENTER when ready... The type of the file system is FAT32. Verifying 7740M Cannot format. This volume is write protected. Windows chkdsk: see below for details Kubuntu fsck (through VirtualBox USB passthrough): see below for details Acronis True Image to format, to convert to GPT, to destroy and rebuild MBR, basically anything: failed (could not write to MBR) Details (and a nice story) Background This was a brand new, generic, 8GB flash drive I wanted to create a multiboot flash drive with. It came formatted as FAT32, though oddly a little larger than most 8 GIGAbyte flash drives I've come across. Approximately 127MB was listed as "used" by Windows. I never discovered why. The end usable space was about what I normally expect from a 8GB drive (approx 7.4 GIBIbytes). I had thrown quite a few Linux distros on, along with a copy of Hiren's. They would all boot perfectly. They were put on with YUMI. When I tried to put the Knoppix DVD on, YUMI added an odd video option to its boot comman which caused Knoppix to boot with a black screen on X. ttys 1 through 6 still worked as text only interfaces. A few days later, I took some time to take that odd video option off, making the boot command match the one that comes with Knoppix. On the attempt to boot, Knoppix reported some form of LZMA corruption. Leading up to the current issue I was thinking the Knoppix files may have been corrupted somehow, so I tried reloading it. The drive was nearly full (45MB free), so I deleted a generic ISO that also was not booting. That went fine. I then went through YUMI to 'uninstall' Knoppix, i.e. delete files and remove from the menus. The files went first, then the menus were cleared successfully. However, the free space was stuck at about 700MB, same as it was before removing Knoppix. In the old Knoppix folder, there was a 0 byte file named KNOPPIX that could not be deleted. I tried reinserting the drive to delete this file - without safely removing, if that made a difference (hey, first time for everything). Running the standard Windows chkdsk scan without /r or /f reported errors found. Running with /r just got it stuck. I decided to give fsck a shot, so I loaded up my Kubuntu VM and attached the drive to it with VirtualBox's USB 2.0 passthrough. I umounted it (/dev/sda1) and ran a fsck. There are differences between boot sector and its backup. I chose No action. It told me FATs differ and asked me to select either the first or second FAT. Whichever I selected, I got a notice of Free cluster summary wrong. If I chose Correct, it gave a list of incorrect file names. To try to fix something, at least, I ran it with the -p option. Halfway through fixing the files, the VM froze - I ended its process about ten minutes later. Cause? My next attempt was to use YUMI, again, to rebuild the whole drive. I used YUMI's built in reformat (to FAT32) option and installed a Kubuntu ISO (700MB). The format was successful, however, the extract and copy of Kubuntu (which YUMI uses a 7zip binary for) froze at about 60% done. After waiting for about fifteen minutes (longer than the 3.5GB Knoppix ISO took last time), I pulled the drive out. The drive at this point was already formatted, SYSLINUX already installed, just waiting on the unpacking of an ISO and the modifying of the boot menus. Plugging it back in, it came up as normal - however, any write action would fail. Disk management reported it as read only. On reconnect, it would come up as normal but a write operation would cause it to go read only again. After a few attempts, it started coming up as read only on insertion. Attempts to fix This is when I ran through the attempts listed above, to try and reformat it in case of a faulty format. However the inability to do so even on a bootable disk indicated something more serious is wrong. chkdsk now reports nothing is wrong, and fsck still reports MBR inconsistencies, but now always chooses first FAT automatically after telling me FATs differ. It still does the same Free cluster summary wrong afterwards. I cannot run with -p anymore because it is now marked as read only. It also managed to corrupt my VM's disk somehow on the first attempt (yes, I'm sure I chose sda, which is mapped to a 7.4GB drive - I triple checked). Thank god for snapshots? I'm just about out of ideas. To my inexperienced mind it looks like something in the drive's firmware set it to read only "permanently" somehow - is there any way to reset this? I don't particularly care about keeping data, considering I've reformatted it twice. Also, fixes that keep me in Windows are better; it reduces the risk of me accidentally nuking my main hard drive. Update 1: I pulled apart the drive out of curiosity. As you can see, there are no obvious write protect switches. There is an IC on the other side, ALCOR branded labelled AU6989HL, if that matters. If there appears to be no way to fix this, I'll probably pull out the (glued down) card and put it in a card reader to check if it's the card or the controller that died. Update 2: I've pulled the card off, Windows detects the drive as a card reader now. The contacts on the card don't appear to be used, and there are several rows of holes on the card itself. Putting it into the card reader only detects about 30MB total, RAW. It's probably either the reader incorrectly reporting the card as faulty (as if a real SD card's write protect was switched on) or a bad contact somewhere. If nothing else, I have a spare 8GB Micro SD card now... as soon as I figure out how to format it as 8GB.

Read the article
ftp connection problem, vsftp server, active mode

- by Mark Szente

I have a server that runs vsftpd to handle ftp connections. One of my users have a notebook with Total Commander and WinSCP installed. Both ftp clients fail right after the connection is established to the server and it tries to download the directory listing without any particular error message. The weird thing is: the notebook works perfectly ok with other ftp servers. My ftp server also works well with other clients. In fact, this user also has a pc running on the same LAN as the notebook and the pc works well with the ftp server. We use active ftp connection mode. Passive mode works well but is not an option at this point. I would post more technical details but I don't even know what this problem is related to. Anyway, below is the server side tcpdump for the failed connection attempt. There's no further communication between the client and the server after the last line of log. Thank you very much for any hint! 23:39:24.514852 IP 62.201.xx.xx.2241 > 195.70.xx.xx.21: S 1314489715:1314489715(0) win 65535 <mss 1460,nop,wscale 3,nop,nop,sackOK> 23:39:24.514896 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: S 2633658883:2633658883(0) ack 1314489716 win 5840 <mss 1460,nop,nop,sackOK,nop,wscale 2> 23:39:24.520842 IP 62.201.xx.xx.2241 > 195.70.xx.xx.21: . ack 1 win 62500 23:39:24.523803 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: P 1:21(20) ack 1 win 1460 23:39:24.546858 IP 62.201.xx.xx.2241 > 195.70.xx.xx.21: P 1:15(14) ack 21 win 62497 23:39:24.546902 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: . ack 15 win 1460 23:39:24.547247 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: P 21:55(34) ack 15 win 1460 23:39:24.762806 IP 62.201.xx.xx.2241 > 195.70.xx.xx.21: . ack 55 win 62493 23:39:30.415011 IP 62.201.xx.xx.2241 > 195.70.xx.xx.21: P 15:28(13) ack 55 win 62493 23:39:30.454116 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: . ack 28 win 1460 23:39:31.036283 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: P 55:78(23) ack 28 win 1460 23:39:31.053018 IP 62.201.xx.xx.2241 > 195.70.xx.xx.21: P 28:34(6) ack 78 win 62490 23:39:31.053042 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: . ack 34 win 1460 23:39:31.053268 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: P 78:97(19) ack 34 win 1460 23:39:31.068969 IP 62.201.xx.xx.2241 > 195.70.xx.xx.21: P 34:40(6) ack 97 win 62488 23:39:31.069148 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: P 97:112(15) ack 40 win 1460 23:39:31.069179 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: P 112:119(7) ack 40 win 1460 23:39:31.076981 IP 62.201.xx.xx.2241 > 195.70.xx.xx.21: . ack 119 win 62485 23:39:31.077010 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: P 119:177(58) ack 40 win 1460 23:39:31.114979 IP 62.201.xx.xx.2241 > 195.70.xx.xx.21: P 40:45(5) ack 177 win 62478 23:39:31.115164 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: P 177:186(9) ack 45 win 1460 23:39:31.180966 IP 62.201.xx.xx.2241 > 195.70.xx.xx.21: P 45:53(8) ack 186 win 62476 23:39:31.181066 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: P 186:216(30) ack 53 win 1460 23:39:31.213065 IP 62.201.xx.xx.2241 > 195.70.xx.xx.21: P 53:80(27) ack 216 win 62473 23:39:31.213180 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: P 216:267(51) ack 80 win 1460 23:39:31.251086 IP 62.201.xx.xx.2241 > 195.70.xx.xx.21: P 80:86(6) ack 267 win 62466 23:39:31.251498 IP 195.70.xx.xx.20 > 62.201.xx.xx.5001: S 2640780713:2640780713(0) win 5840 <mss 1460,sackOK,timestamp 2054371220 0,nop,wscale 2> 23:39:31.290979 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: . ack 86 win 1460 23:39:34.251489 IP 195.70.xx.xx.20 > 62.201.xx.xx.5001: S 2640780713:2640780713(0) win 5840 <mss 1460,sackOK,timestamp 2054374220 0,nop,wscale 2> 23:39:40.249625 IP 195.70.xx.xx.20 > 62.201.xx.xx.5001: S 2640780713:2640780713(0) win 5840 <mss 1460,sackOK,timestamp 2054380220 0,nop,wscale 2> 23:39:43.695108 IP 195.70.xx.xx.21 > 62.201.xx.xx.1057: P 2280716551:2280716588(37) ack 3838413728 win 5840 23:39:52.248791 IP 195.70.xx.xx.20 > 62.201.xx.xx.5001: S 2640780713:2640780713(0) win 5840 <mss 1460,sackOK,timestamp 2054392220 0,nop,wscale 2> 23:40:16.245159 IP 195.70.xx.xx.20 > 62.201.xx.xx.5001: S 2640780713:2640780713(0) win 5840 <mss 1460,sackOK,timestamp 2054416221 0,nop,wscale 2> 23:40:29.853685 IP 195.70.xx.xx.21 > 62.201.xx.xx.1057: FP 37:51(14) ack 1 win 5840 23:40:31.241951 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: P 267:304(37) ack 86 win 1460 23:40:31.381708 IP 62.201.xx.xx.2241 > 195.70.xx.xx.21: . ack 304 win 62462

Read the article
Slow draw on some apps and dynamic clocks not working properly with ATI/AMD proprietary drivers

- by Rakeka

I've recently purchased a new computer (around July 2010) and I've been having some problems with proprietary video drivers on Linux. The hardware is: Video: ATI/AMD Radeon HD 5870 (XFX HD-587X-ZNFC); Motherboard: Asus P7P55D-E Deluxe; Processor: Intel i5 750; Memory: Kingston Hyperx KHX1600C8D3K2/4GX (2x - 8GB Total); Power Supply: XFX P1-750B-CAG9; There are no overclocks, not even the memories (they are at 1333mhz due processor memory controller limitation). The operational system is a homebrew Linux distribution with the following software: Architecture: x86_64 (multilib) Kernel: 2.6.35.10 Xorg: 7.5 Window Manager: wmii-3.9.2 Video Driver: ATI/AMD Catalyst 10.12 There are no desktop effects programs like compiz fusion or beryl. The problems: With ATI/AMD proprietary driver, some applications are with slow draw/redraw, and, the same applications make the driver to increase the card clocks to maximum (0% gpu activity, only the clocks are increased). I dunno exactly how to describe the slow draw but I'll list some applications and symptoms. xterm Flickers a lot when drawing continuous output; When I'm in a workspace with fullscreen xterm, The gpu load stays at 12% in idle, and, with smaller xterm, smaller GPU load. "aticonfig --odgc" output: Default Adapter - ATI Radeon HD 5800 Series Core (MHz) Memory (MHz) Current Clocks : 157 300 Current Peak : 850 1200 Configurable Peak Range : [600-900] [900-1300] GPU load : 12% "aticonfig --pplib-cmd 'get activity'" output: Current Activity is Core Clock: 157MHZ Memory Clock: 300MHZ VDDC: 950 Activity: 12 percent Performance Level: 0 Bus Speed: 5000 Bus Lanes: 16 Maximum Bus Lanes: 16 More examples: mplayer time info flickers on terminal; "find /" flickers a lot (It takes some time to stop with control-c. But, If I change the workspace or put some window upon it, just after the control-c, it stops instantly); "cat somefile" if the file is big (Xorg.0.log for example) it takes some time to display; vim and less (ex: find / | less) don't have much problems, just a little flicker when scrolling; mplayer (no gui) Slow reproduction and seek with -vo x11; Tearing with -vo xv; Time info flickers on terminal (xterm consequence); gvim A little slow draw when scrolling with page up/page down; Firefox Slow draw/redraw on some pages like www.boadica.com.br and sometimes on www.youtube.com with flash enable (never noticed on many pages); Corruptions when informative yellow boxes are showing and scroll the page (an gray box appears at the same place of the informative box); "Wallpaper" After minimizing a fullscreen window or changing to an empty workspace it takes some time to redraw wallpaper. "Video Card" The core and memory clocks are increased with the events described above and on other situations like change workspace (even without wallpaper), minimize, maximize or move a window; Idle clocks: Core: 157mhz, Memory: 300mhz Full clocks: Core: 850mhz, Memory: 1200mhz xpdf Painful slow scrolling; display (from ImageMagick) Slow menus and sometimes slow image redraw; Programs that I use and are apparently without problems: gimp; pidgin; mplayer (-vo gl, gl2); blender; unigine heaven (better fps than on Windows); doom3; tibia; penumbra overture; amnesia the dark descent (wine); diablo 2 (wine); No problems on Windows (Windows 7 Ultimate 64bit). And special note to this: Full desktop effects from Debian and Ubuntu gnome appearance cpanel don't cause ANY problems, even the core and memory clocks don't increase when change workspace, minimize, maximize or move a window. What I've tested: Unsuccessful tests: Tested all drivers versions since 10.6 (released approximately when I've installed the first slackware in this PC); Tested other video card - ATI/AMD Radeon HD 5570 (XFX HD-557X-ZHF2); Tested some options on xorg.conf and that I've found googling (some of these options are commented on my xorg.conf. I'll send the links at the end of post); Tested some patches like 107_fedora_dont_fill_bg_none.patch and xserver-xorg-backclear.patch from Arch Linux Catalyst page (https://wiki.archlinux.org/index.php/ATI_Catalyst); Tested other distros and software versions: Tested XORG-7.6 on my own distribution; Tested Debian Squeeze (testing - from 2010-12-20); Tested Ubuntu Marverick (10.10); Tested Slackware 13.1; Distros info: Architecture: i386 Debian and Ubuntu with all default software (kernel, gnome, xorg, drivers); Slackware with Catalyst from AMD page and default window managers like: fvwm, xfce, and my own build of wmii; Successful tests: Tested other video card (only on my homebrew distro) - NVIDIA Geforce 7300GS with driver 260.19.29; That didn't shown the slow draw problems, but that card is a bit obsolete, so, dunno if that lacks features like the dynamic clocks. I don't dispose of other video cards like nvidia g/gt/gts/gtx 200~400~500 or Radeon HD 3000/4000/6000 to make more tests. Tested other hardware: Video: ATI/AMD Radeon HD 5570 (XFX HD-557X-ZHF2); Motherboard: Intel DG31PR; Processor: Core 2 Duo E6750; Software for that hardware: Fresh install of same distros (except for the mine) with same program versions; That video card (HD 5570) were full time at the maximum clocks (something like 500/750, don't remember) in all the operational systems (Windows XP and Windows 7 too), but it didn't shown the same problems that I have here. I've googled a lot about common problems with ATI/AMD proprietary drivers for Linux and didn't find similar problems, except by the Firefox corruptions, that the solutions were to disable ATI Direct2DAccel and use XAA. With XAA the problems persists and the other applications like pidgin and rest of Firefox showed the same problems of slow draw/redraw. Open source Drivers: With open source drivers (xf86-video-ati-6.13.2) I hadn't the same slow draw problems, but, had other problems, that, for now, make it no viable solution. I'll not discuss it here because this is another line of problems and will confuse everything. If it happens to be the only solution, I'll make another thread to discuss it. Logs and Configs: kernel .config dmesg xorg package list xorg.conf Xorg.0.log

Read the article
Optimize php-fpm and varnish for a powerfull server

- by Jim

My setup is: Intel® Core™ i7-2600 and RAM 16 GB DDR3 RAM varnish+nginx+php-fpm+apc for a not very heavy WordPress blog with W3 Total Cache and CDN My problem is that after 55 hits per second according to blitz.io varnish starts giving out timeouts. CPU usage at this time is hardly 1%. Free memory at all time remains 10GB+. I tried benchmarking php-fpm directly with result of 150hits/s without any timeouts. But after that the CPU usage goes 100% and it stops responding. Can you help me optimize it to handle more? As i understand nginx has nothing to do over here so i dont put its config. php-fpm config listen = /tmp/php5-fpm.sock listen.allowed_clients = 127.0.0.1 user = nginx group = nginx pm = dynamic pm.max_children = 150 pm.start_servers = 7 pm.min_spare_servers = 2 pm.max_spare_servers = 15 pm.max_requests = 500 slowlog = /var/log/php-fpm/www-slow.log php_admin_value[error_log] = /var/log/php-fpm/www-error.log php_admin_flag[log_errors] = on apc extension = apc.so apc.enabled=1 apc.shm_size=512MB apc.num_files_hint=0 apc.user_entries_hint=0 apc.ttl=7200 apc.use_request_time=1 apc.user_ttl=7200 apc.gc_ttl=3600 apc.cache_by_default=1 apc.filters apc.mmap_file_mask=/tmp/apc.XXXXXX apc.file_update_protection=2 apc.enable_cli=0 apc.max_file_size=1M apc.stat=1 apc.stat_ctime=0 apc.canonicalize=0 apc.write_lock=1 apc.report_autofilter=0 apc.rfc1867=0 apc.rfc1867_prefix =upload_ apc.rfc1867_name=APC_UPLOAD_PROGRESS apc.rfc1867_freq=0 apc.rfc1867_ttl=3600 apc.include_once_override=0 apc.lazy_classes=0 apc.lazy_functions=0 apc.coredump_unmap=0 apc.file_md5=0 apc.preload_path Varnish VCL backend default { .host = "127.0.0.1"; .port = "8080"; .connect_timeout = 6s; .first_byte_timeout = 6s; .between_bytes_timeout = 60s; } acl purgehosts { "localhost"; "127.0.0.1"; } # Called after a document has been successfully retrieved from the backend. sub vcl_fetch { # Uncomment to make the default cache "time to live" is 5 minutes, handy # but it may cache stale pages unless purged. (TODO) # By default Varnish will use the headers sent to it by Apache (the backend server) # to figure out the correct TTL. # WP Super Cache sends a TTL of 3 seconds, set in wp-content/cache/.htaccess set beresp.ttl = 24h; # Strip cookies for static files and set a long cache expiry time. if (req.url ~ "\.(jpg|jpeg|gif|png|ico|css|zip|tgz|gz|rar|bz2|pdf|txt|tar|wav|bmp|rtf|js|flv|swf|html|htm)$") { unset beresp.http.set-cookie; set beresp.ttl = 24h; } # If WordPress cookies found then page is not cacheable if (req.http.Cookie ~"(wp-postpass|wordpress_logged_in|comment_author_)") { # set beresp.cacheable = false;#versions less than 3 #beresp.ttl>0 is cacheable so 0 will not be cached set beresp.ttl = 0s; } else { #set beresp.cacheable = true; set beresp.ttl=24h;#cache for 24hrs } # Varnish determined the object was not cacheable #if ttl is not > 0 seconds then it is cachebale if (!beresp.ttl > 0s) { # set beresp.http.X-Cacheable = "NO:Not Cacheable"; } else if ( req.http.Cookie ~"(wp-postpass|wordpress_logged_in|comment_author_)" ) { # You don't wish to cache content for logged in users set beresp.http.X-Cacheable = "NO:Got Session"; return(hit_for_pass); #previously just pass but changed in v3+ } else if ( beresp.http.Cache-Control ~ "private") { # You are respecting the Cache-Control=private header from the backend set beresp.http.X-Cacheable = "NO:Cache-Control=private"; return(hit_for_pass); } else if ( beresp.ttl < 1s ) { # You are extending the lifetime of the object artificially set beresp.ttl = 300s; set beresp.grace = 300s; set beresp.http.X-Cacheable = "YES:Forced"; } else { # Varnish determined the object was cacheable set beresp.http.X-Cacheable = "YES"; if (beresp.status == 404 || beresp.status >= 500) { set beresp.ttl = 0s; } # Deliver the content return(deliver); } sub vcl_hash { # Each cached page has to be identified by a key that unlocks it. # Add the browser cookie only if a WordPress cookie found. if ( req.http.Cookie ~"(wp-postpass|wordpress_logged_in|comment_author_)" ) { #set req.hash += req.http.Cookie; hash_data(req.http.Cookie); } } # vcl_recv is called whenever a request is received sub vcl_recv { # remove ?ver=xxxxx strings from urls so css and js files are cached. # Watch out when upgrading WordPress, need to restart Varnish or flush cache. set req.url = regsub(req.url, "\?ver=.*$", ""); # Remove "replytocom" from requests to make caching better. set req.url = regsub(req.url, "\?replytocom=.*$", ""); remove req.http.X-Forwarded-For; set req.http.X-Forwarded-For = client.ip; # Exclude this site because it breaks if cached if ( req.http.host == "sr.ituts.gr" ) { return( pass ); } # Serve objects up to 2 minutes past their expiry if the backend is slow to respond. set req.grace = 120s; # Strip cookies for static files: if (req.url ~ "\.(jpg|jpeg|gif|png|ico|css|zip|tgz|gz|rar|bz2|pdf|txt|tar|wav|bmp|rtf|js|flv|swf|html|htm)$") { unset req.http.Cookie; return(lookup); } # Remove has_js and Google Analytics __* cookies. set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+|has_js)=[^;]*", ""); # Remove a ";" prefix, if present. set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", ""); # Remove empty cookies. if (req.http.Cookie ~ "^\s*$") { unset req.http.Cookie; } if (req.request == "PURGE") { if (!client.ip ~ purgehosts) { error 405 "Not allowed."; } #previous version ban() was purge() ban("req.url ~ " + req.url + " && req.http.host == " + req.http.host); error 200 "Purged."; } # Pass anything other than GET and HEAD directly. if (req.request != "GET" && req.request != "HEAD") { return( pass ); } /* We only deal with GET and HEAD by default */ # remove cookies for comments cookie to make caching better. set req.http.cookie = regsub(req.http.cookie, "1231111111111111122222222333333=[^;]+(; )?", ""); # never cache the admin pages, or the server-status page, or your feed? you may want to..i don't if (req.request == "GET" && (req.url ~ "(wp-admin|bb-admin|server-status|feed)")) { return(pipe); } # don't cache authenticated sessions if (req.http.Cookie && req.http.Cookie ~ "(wordpress_|PHPSESSID)") { return(lookup); } # don't cache ajax requests if(req.http.X-Requested-With == "XMLHttpRequest" || req.url ~ "nocache" || req.url ~ "(control.php|wp-comments-post.php|wp-login.php|bb-login.php|bb-reset-password.php|register.php)") { return (pass); } return( lookup ); } Varnish Daemon options DAEMON_OPTS="-a :80 \ -T 127.0.0.1:6082 \ -f /etc/varnish/ituts.vcl \ -u varnish -g varnish \ -S /etc/varnish/secret \ -p thread_pool_add_delay=2 \ -p thread_pools=8 \ -p thread_pool_min=100 \ -p thread_pool_max=1000 \ -p session_linger=50 \ -p session_max=150000 \ -p sess_workspace=262144 \ -s malloc,5G" Im not sure where to start, should i for start optimize php-fpm and then go to varnish or php-fpm is at its max right now so i should start looking for the problem in varnish?

Read the article
Scripts help FIND command via atime output to multiple files

- by sswagner

here is a script I have wrote that I need help with. in the script I do a find for any file that has not been access for over 30 days, 60, 90, 180, 270 & 365 days. This works just fine. however, this takes a few days just to finish the 30 day portion. it is scanning a NAS. (millions and millions of files) as you see, the 30 day information really holds all the data need for the rest of the scripts. the 60, 90, etc. portion of the script are just redoing the same effort as the 30 day portion, except for an extended time frame. it would save in this case weeks worth of re-scanning if some how the 60, 90 180, etc.. portions could just get its data from the 30 day output. this is where I am asking for help. the output is just like an ls -l command. and you can also see from the output below, there are multiple years in this output. the script is attached and printed below. total 24 -rw-r--r-- 1 root bin 60 Apr 12 13:07 config_file -rw-r--r-- 1 root bin 9 Apr 12 13:07 config_file.InProgress -rw-r--r-- 1 root bin 0 Apr 12 13:07 config_file.sids -rw-r--r-- 1 root bin 1284 Apr 19 10:41 rpt_file -rw-r--r-- 1 16074 5003 20083 Apr 26 2002 /nas/quota/slot_2/CR_APP002/eb_ora_bin1/sun8/product/9.2s/oem_webstage/oracle/sysman/qtour/console/dat1_01.gif -rw-r--r-- 1 16074 5003 20088 Apr 26 2002 /nas/quota/slot_2/CR_APP002/eb_ora_bin1/sun8/product/9.2s/oem_webstage/oracle/sysman/qtour/console/set1_04.gif -rw-r--r-- 1 16074 5003 2008 Apr 26 2002 /nas/quota/slot_2/CR_APP002/eb_ora_bin1/sun8/product/9.2s/oem_webstage/oracle/sysman/qtour/oapps/get2_03.htm -rw-r--r-- 1 16074 5003 20083 Apr 26 2002 /nas/quota/slot_2/CR_APP002/eb_ora_bin1/sun8/product/9.2s/oem_webstage/oracle/sysman/qtour/oapps/per1_01.gif any help is appreciated. these are linux distro boxes, so I am sure perl is on there too if needed.. Thanks! !/bin/ksh # search shares for files that have not been accessed for a certain time. NOTE: $IN = input search $OUT = output directory for text file # TESTS Numeric arguments can be specified as # +n for greater than n, -n for less than n, n for exactly n. # -atime n File was last accessed n*24 hours ago. # # IN1=/nas/quota/slot_2/CR* IN2=/nas/quota/slot_3/CR* IN3=/nas/quota/slot_4/CR* IN4=/nas/quota/slot_5/CR* OUT=/nas/quota/slot_3/CR_PRJ144/steve mkdir ${OUT} for dir in ${IN1}; do find $dir -atime +30 -exec ls -l '{}' \; ${OUT}/30days.txt; done for dir in ${IN2}; do find $dir -atime +30 -exec ls -l '{}' \; ${OUT}/30days.txt; done for dir in ${IN3}; do find $dir -atime +30 -exec ls -l '{}' \; ${OUT}/30days.txt; done for dir in ${IN4}; do find $dir -atime +30 -exec ls -l '{}' \; ${OUT}/30days.txt; done for dir in ${IN1}; do find $dir -atime +60 -exec ls -l '{}' \; ${OUT}/60days.txt; done for dir in ${IN2}; do find $dir -atime +60 -exec ls -l '{}' \; ${OUT}/60days.txt; done for dir in ${IN3}; do find $dir -atime +60 -exec ls -l '{}' \; ${OUT}/60days.txt; done for dir in ${IN4}; do find $dir -atime +60 -exec ls -l '{}' \; ${OUT}/60days.txt; done for dir in ${IN1}; do find $dir -atime +90 -exec ls -l '{}' \; ${OUT}/90days.txt; done for dir in ${IN2}; do find $dir -atime +90 -exec ls -l '{}' \; ${OUT}/90days.txt; done for dir in ${IN3}; do find $dir -atime +90 -exec ls -l '{}' \; ${OUT}/90days.txt; done for dir in ${IN4}; do find $dir -atime +90 -exec ls -l '{}' \; ${OUT}/90days.txt; done for dir in ${IN1}; do find $dir -atime +180 -exec ls -l '{}' \; ${OUT}/180days.txt; done for dir in ${IN2}; do find $dir -atime +180 -exec ls -l '{}' \; ${OUT}/180days.txt; done for dir in ${IN3}; do find $dir -atime +180 -exec ls -l '{}' \; ${OUT}/180days.txt; done for dir in ${IN4}; do find $dir -atime +180 -exec ls -l '{}' \; ${OUT}/180days.txt; done for dir in ${IN1}; do find $dir -atime +270 -exec ls -l '{}' \; ${OUT}/270days.txt; done for dir in ${IN2}; do find $dir -atime +270 -exec ls -l '{}' \; ${OUT}/270days.txt; done for dir in ${IN3}; do find $dir -atime +270 -exec ls -l '{}' \; ${OUT}/270days.txt; done for dir in ${IN4}; do find $dir -atime +270 -exec ls -l '{}' \; ${OUT}/270days.txt; done for dir in ${IN1}; do find $dir -atime +365 -exec ls -l '{}' \; ${OUT}/365days.txt; done for dir in ${IN2}; do find $dir -atime +365 -exec ls -l '{}' \; ${OUT}/365days.txt; done for dir in ${IN3}; do find $dir -atime +365 -exec ls -l '{}' \; ${OUT}/365days.txt; done for dir in ${IN4}; do find $dir -atime +365 -exec ls -l '{}' \; ${OUT}/365days.txt; done

Read the article
ftp connection problem, vsftp server, active mode

- by Mark Szente

I have a server that runs vsftpd to handle ftp connections. One of my users have a notebook with Total Commander and WinSCP installed. Both ftp clients fail right after the connection is established to the server and it tries to download the directory listing with the following error message: Timeout detected. Could not retrieve directory listing PORT command successful. Consider using PASV. Error listing directory '/'. The weird thing is: the notebook works perfectly ok with other ftp servers. My ftp server also works well with other clients. In fact, this user also has a pc running on the same LAN as the notebook and the pc works well with the ftp server. We use PORT ftp connection mode. Passive mode works well but is not an option at this point. I would post more technical details but I don't even know what this problem is related to. Anyway, below is the server side tcpdump for the failed connection attempt. There's no further communication between the client and the server after the last line of log. Thank you very much for any hint! 23:39:24.514852 IP 62.201.xx.xx.2241 > 195.70.xx.xx.21: S 1314489715:1314489715(0) win 65535 <mss 1460,nop,wscale 3,nop,nop,sackOK> 23:39:24.514896 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: S 2633658883:2633658883(0) ack 1314489716 win 5840 <mss 1460,nop,nop,sackOK,nop,wscale 2> 23:39:24.520842 IP 62.201.xx.xx.2241 > 195.70.xx.xx.21: . ack 1 win 62500 23:39:24.523803 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: P 1:21(20) ack 1 win 1460 23:39:24.546858 IP 62.201.xx.xx.2241 > 195.70.xx.xx.21: P 1:15(14) ack 21 win 62497 23:39:24.546902 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: . ack 15 win 1460 23:39:24.547247 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: P 21:55(34) ack 15 win 1460 23:39:24.762806 IP 62.201.xx.xx.2241 > 195.70.xx.xx.21: . ack 55 win 62493 23:39:30.415011 IP 62.201.xx.xx.2241 > 195.70.xx.xx.21: P 15:28(13) ack 55 win 62493 23:39:30.454116 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: . ack 28 win 1460 23:39:31.036283 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: P 55:78(23) ack 28 win 1460 23:39:31.053018 IP 62.201.xx.xx.2241 > 195.70.xx.xx.21: P 28:34(6) ack 78 win 62490 23:39:31.053042 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: . ack 34 win 1460 23:39:31.053268 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: P 78:97(19) ack 34 win 1460 23:39:31.068969 IP 62.201.xx.xx.2241 > 195.70.xx.xx.21: P 34:40(6) ack 97 win 62488 23:39:31.069148 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: P 97:112(15) ack 40 win 1460 23:39:31.069179 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: P 112:119(7) ack 40 win 1460 23:39:31.076981 IP 62.201.xx.xx.2241 > 195.70.xx.xx.21: . ack 119 win 62485 23:39:31.077010 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: P 119:177(58) ack 40 win 1460 23:39:31.114979 IP 62.201.xx.xx.2241 > 195.70.xx.xx.21: P 40:45(5) ack 177 win 62478 23:39:31.115164 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: P 177:186(9) ack 45 win 1460 23:39:31.180966 IP 62.201.xx.xx.2241 > 195.70.xx.xx.21: P 45:53(8) ack 186 win 62476 23:39:31.181066 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: P 186:216(30) ack 53 win 1460 23:39:31.213065 IP 62.201.xx.xx.2241 > 195.70.xx.xx.21: P 53:80(27) ack 216 win 62473 23:39:31.213180 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: P 216:267(51) ack 80 win 1460 23:39:31.251086 IP 62.201.xx.xx.2241 > 195.70.xx.xx.21: P 80:86(6) ack 267 win 62466 23:39:31.251498 IP 195.70.xx.xx.20 > 62.201.xx.xx.5001: S 2640780713:2640780713(0) win 5840 <mss 1460,sackOK,timestamp 2054371220 0,nop,wscale 2> 23:39:31.290979 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: . ack 86 win 1460 23:39:34.251489 IP 195.70.xx.xx.20 > 62.201.xx.xx.5001: S 2640780713:2640780713(0) win 5840 <mss 1460,sackOK,timestamp 2054374220 0,nop,wscale 2> 23:39:40.249625 IP 195.70.xx.xx.20 > 62.201.xx.xx.5001: S 2640780713:2640780713(0) win 5840 <mss 1460,sackOK,timestamp 2054380220 0,nop,wscale 2> 23:39:43.695108 IP 195.70.xx.xx.21 > 62.201.xx.xx.1057: P 2280716551:2280716588(37) ack 3838413728 win 5840 23:39:52.248791 IP 195.70.xx.xx.20 > 62.201.xx.xx.5001: S 2640780713:2640780713(0) win 5840 <mss 1460,sackOK,timestamp 2054392220 0,nop,wscale 2> 23:40:16.245159 IP 195.70.xx.xx.20 > 62.201.xx.xx.5001: S 2640780713:2640780713(0) win 5840 <mss 1460,sackOK,timestamp 2054416221 0,nop,wscale 2> 23:40:29.853685 IP 195.70.xx.xx.21 > 62.201.xx.xx.1057: FP 37:51(14) ack 1 win 5840 23:40:31.241951 IP 195.70.xx.xx.21 > 62.201.xx.xx.2241: P 267:304(37) ack 86 win 1460 23:40:31.381708 IP 62.201.xx.xx.2241 > 195.70.xx.xx.21: . ack 304 win 62462

Read the article
mdadm: Win7-install created a boot partition on one of my RAID6 drives. How to rebuild?

- by EXIT_FAILURE

My problem happened when I attempted to install Windows 7 on it's own SSD. The Linux OS I used which has knowledge of the software RAID system is on a SSD that I disconnected prior to the install. This was so that windows (or I) wouldn't inadvertently mess it up. However, and in retrospect, foolishly, I left the RAID disks connected, thinking that windows wouldn't be so ridiculous as to mess with a HDD that it sees as just unallocated space. Boy was I wrong! After copying over the installation files to the SSD (as expected and desired), it also created an ntfs partition on one of the RAID disks. Both unexpected and totally undesired! . I changed out the SSDs again, and booted up in linux. mdadm didn't seem to have any problem assembling the array as before, but if I tried to mount the array, I got the error message: mount: wrong fs type, bad option, bad superblock on /dev/md0, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so dmesg: EXT4-fs (md0): ext4_check_descriptors: Block bitmap for group 0 not in group (block 1318081259)! EXT4-fs (md0): group descriptors corrupted! I then used qparted to delete the newly created ntfs partition on /dev/sdd so that it matched the other three /dev/sd{b,c,e}, and requested a resync of my array with echo repair > /sys/block/md0/md/sync_action This took around 4 hours, and upon completion, dmesg reports: md: md0: requested-resync done. A bit brief after a 4-hour task, though I'm unsure as to where other log files exist (I also seem to have messed up my sendmail configuration). In any case: No change reported according to mdadm, everything checks out. mdadm -D /dev/md0 still reports: Version : 1.2 Creation Time : Wed May 23 22:18:45 2012 Raid Level : raid6 Array Size : 3907026848 (3726.03 GiB 4000.80 GB) Used Dev Size : 1953513424 (1863.02 GiB 2000.40 GB) Raid Devices : 4 Total Devices : 4 Persistence : Superblock is persistent Update Time : Mon May 26 12:41:58 2014 State : clean Active Devices : 4 Working Devices : 4 Failed Devices : 0 Spare Devices : 0 Layout : left-symmetric Chunk Size : 4K Name : okamilinkun:0 UUID : 0c97ebf3:098864d8:126f44e3:e4337102 Events : 423 Number Major Minor RaidDevice State 0 8 16 0 active sync /dev/sdb 1 8 32 1 active sync /dev/sdc 2 8 48 2 active sync /dev/sdd 3 8 64 3 active sync /dev/sde Trying to mount it still reports: mount: wrong fs type, bad option, bad superblock on /dev/md0, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so and dmesg: EXT4-fs (md0): ext4_check_descriptors: Block bitmap for group 0 not in group (block 1318081259)! EXT4-fs (md0): group descriptors corrupted! I'm a bit unsure where to proceed from here, and trying stuff "to see if it works" is a bit too risky for me. This is what I suggest I should attempt to do: Tell mdadm that /dev/sdd (the one that windows wrote into) isn't reliable anymore, pretend it is newly re-introduced to the array, and reconstruct its content based on the other three drives. I also could be totally wrong in my assumptions, that the creation of the ntfs partition on /dev/sdd and subsequent deletion has changed something that cannot be fixed this way. My question: Help, what should I do? If I should do what I suggested , how do I do that? From reading documentation, etc, I would think maybe: mdadm --manage /dev/md0 --set-faulty /dev/sdd mdadm --manage /dev/md0 --remove /dev/sdd mdadm --manage /dev/md0 --re-add /dev/sdd However, the documentation examples suggest /dev/sdd1, which seems strange to me, as there is no partition there as far as linux is concerned, just unallocated space. Maybe these commands won't work without. Maybe it makes sense to mirror the partition table of one of the other raid devices that weren't touched, before --re-add. Something like: sfdisk -d /dev/sdb | sfdisk /dev/sdd Bonus question: Why would the Windows 7 installation do something so st...potentially dangerous? Update I went ahead and marked /dev/sdd as faulty, and removed it (not physically) from the array: # mdadm --manage /dev/md0 --set-faulty /dev/sdd # mdadm --manage /dev/md0 --remove /dev/sdd However, attempting to --re-add was disallowed: # mdadm --manage /dev/md0 --re-add /dev/sdd mdadm: --re-add for /dev/sdd to /dev/md0 is not possible --add, was fine. # mdadm --manage /dev/md0 --add /dev/sdd mdadm -D /dev/md0 now reports the state as clean, degraded, recovering, and /dev/sdd as spare rebuilding. /proc/mdstat shows the recovery progress: md0 : active raid6 sdd[4] sdc[1] sde[3] sdb[0] 3907026848 blocks super 1.2 level 6, 4k chunk, algorithm 2 [4/3] [UU_U] [>....................] recovery = 2.1% (42887780/1953513424) finish=348.7min speed=91297K/sec nmon also shows expected output: ¦sdb 0% 87.3 0.0| > |¦ ¦sdc 71% 109.1 0.0|RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR > |¦ ¦sdd 40% 0.0 87.3|WWWWWWWWWWWWWWWWWWWW > |¦ ¦sde 0% 87.3 0.0|> || It looks good so far. Crossing my fingers for another five+ hours :) Update 2 The recovery of /dev/sdd finished, with dmesg output: [44972.599552] md: md0: recovery done. [44972.682811] RAID conf printout: [44972.682815] --- level:6 rd:4 wd:4 [44972.682817] disk 0, o:1, dev:sdb [44972.682819] disk 1, o:1, dev:sdc [44972.682820] disk 2, o:1, dev:sdd [44972.682821] disk 3, o:1, dev:sde Attempting mount /dev/md0 reports: mount: wrong fs type, bad option, bad superblock on /dev/md0, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so And on dmesg: [44984.159908] EXT4-fs (md0): ext4_check_descriptors: Block bitmap for group 0 not in group (block 1318081259)! [44984.159912] EXT4-fs (md0): group descriptors corrupted! I'm not sure what do do now. Suggestions? Output of dumpe2fs /dev/md0: dumpe2fs 1.42.8 (20-Jun-2013) Filesystem volume name: Atlas Last mounted on: /mnt/atlas Filesystem UUID: e7bfb6a4-c907-4aa0-9b55-9528817bfd70 Filesystem magic number: 0xEF53 Filesystem revision #: 1 (dynamic) Filesystem features: has_journal ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize Filesystem flags: signed_directory_hash Default mount options: user_xattr acl Filesystem state: clean Errors behavior: Continue Filesystem OS type: Linux Inode count: 244195328 Block count: 976756712 Reserved block count: 48837835 Free blocks: 92000180 Free inodes: 243414877 First block: 0 Block size: 4096 Fragment size: 4096 Reserved GDT blocks: 791 Blocks per group: 32768 Fragments per group: 32768 Inodes per group: 8192 Inode blocks per group: 512 RAID stripe width: 2 Flex block group size: 16 Filesystem created: Thu May 24 07:22:41 2012 Last mount time: Sun May 25 23:44:38 2014 Last write time: Sun May 25 23:46:42 2014 Mount count: 341 Maximum mount count: -1 Last checked: Thu May 24 07:22:41 2012 Check interval: 0 (<none>) Lifetime writes: 4357 GB Reserved blocks uid: 0 (user root) Reserved blocks gid: 0 (group root) First inode: 11 Inode size: 256 Required extra isize: 28 Desired extra isize: 28 Journal inode: 8 Default directory hash: half_md4 Directory Hash Seed: e177a374-0b90-4eaa-b78f-d734aae13051 Journal backup: inode blocks dumpe2fs: Corrupt extent header while reading journal super block

Read the article
Backing up my data causes my server to crash using Symantec Backup Exec 12, or How I Came to Loathe Irony

- by Kyle Noland

I have a Dell PowerEdge 2850 running Windows Server 2003. It is the primary file server for one of my clients. I have another server also running Windows Server 2003 that acts as the core media server for Symantec Backup Exec 12. I recently upgraded from Backup Exec 11d to 12. This upgrade was necessary because we also just upgraded from Exchange 2003 to Exchange 2007. After the upgrade I had to push-install the new version 12 Backup Exec Remote Agents to each of the servers I am backing up (about 6 total). 5 of my servers are doing just fine, faithfully completing backups every night. My file server routinely crashes. Observations: When the server crashes, it does not blue screen, it just locks up completely. Even the mouse is unresponsive. If you leave the server locked up long enough, it will eventually reboot itself and hang on the Windows splash screen. There is absolutely zero useful Event Viewer evidence of a problem. The logs go from routine logging to an Unexplained Shutdown Event the next morning when I have to hard reset the server to get it to boot. 90% of the time the server does not boot cleanly, it hangs on the Windows splash screen. I don't have any light to shed here. When the server hangs all I can do is hard reset it and try again. Even after a successful boot and chkdsk /r operation, if you reboot the machine, you have a 90% chance it won't back up again cleanly. The back story: This server started crashing during nightly backups about a month ago. I tried everything I could think of to troubleshoot the problem and eventually had to give up because I could not keep coming to the office at 4 AM to try to get the server back online. One Friday I got lucky and the server stayed up for its entire full backup. I took this opportunity to restore the full backup to a temporary server I set up and switched all my users to the temporary. Then I reloaded the ailing file server. I kept all my users on the temporary file server for about 3 weeks. I installed the same Backup Exec Remote Agent and Trend Micro A/V client on the temporary server that I was using on the regular file server. During this time, I had absolutely no problems backing up the temporary server. I tested the reloaded file server extensively. I rebooted the server once an hour every day for 3 weeks trying to make it fail. It never did. I felt confident that the reload was the answer to my problems. I moved all of the data from the temporary server back to the regular server. I got 3 nightly backups out of it before it locked up again and started the familiar failure to boot cleanly behavior. This weekend I decided to monitor the file server through the entire backup job. I RDPd into the file server and also into the server running Backup Exec. On the file server I opened the Task Manager so I could view the processes and watch CPU and memory usage. Everything was running smoothly for about 60GB worth of backup. Then I noticed that the byte count of the backup job in Backup Exec had stopped progressing. I looked back over at my RDP session into the file server, and I was getting real time updates about CPU and memory usage still - both nearly 0%, which is unusual. Backups usually hover around 40% usage for the duration of the backup job. Let me reiterate this point: The screen was refreshing and I was getting real time Task Manager updates - until I clicked on the Start menu. The screen went black and the server locked up. In truth, I think the server had already locked up, the video card just hadn't figured it out yet. I went back into my bag of trick: driving to the office and hard reseting the server over and over again when it hangs up at the Windows splash screen. I did this for 2 hours without getting a successful boot. I started panicking because I did not have a decent backup to use to get everything back onto the working temporary file server. Once I exhausted everything I knew to do, I took a deep breath, booted to the Windows Server 2003 CD and performed a repair installation of Windows. The server came back up fine, with all of my data intact. I can now reboot the server at will and it will come back up cleanly. The problem is that I'm afraid as soon as I try to back that data up again I will back at square one. So let me sum things up: Here is what I've done so far to troubleshoot this server: Deleted and recreated the RAID 5 sets. Initialized the drives. Reloaded the server with a fresh Server 2003 install. Confirmed with Dell that I have installed the latest, Dell approved BIOS and NIC drivers. Uninstalled / reinstalled the Backup Exec Remote Agent. Uninstalled the Trend Micro A/V client. Configured the server not to reboot itself after a blue screen so I can see any stop error. I used to think the server was blue screening, but since I enabled this setting I now know that the server just completely locks up. Run chkdsk /r from the Windows Recovery Console. Several errors were found and corrected, but did not help my problem. Help confirm or deny the following assumptions: There are two problems at work here. Why the server is locking up in the first place, and why the server won't boot cleanly after a lockup. This is ultimately a software problem. The server works fine and can be rebooted cleanly all day long - until the first lockup - following a fresh OS load or even a Repair installation. This is not a problem with Backup Exec in general. All of my other servers back up just fine. For the record, all of the other servers run Server 2003, and some of them house more data than the file server in question here. Any help is appreciated. The irony is almost too much to bear. Backing up my data is what is jeopardizing it.

Read the article
Error 5 partition table invalid or corrupt

- by Clodoaldo

I'm trying to add a second SSD to a Centos 6 system. But I get the Error 5 partition table invalid or corrupt at boot. The system already has a single SSD (sdb) and a pair of HDDs (sd{a,c}) in a RAID 1 array from where it boots. It is as if the new SSD assumes one of the devices of the RAID array. Is it? How to avoid that or rearrange the setup? # cat fstab UUID=967b4035-782d-4c66-b22f-50244fe970ca / ext4 defaults 1 1 UUID=86fd06e9-cdc9-4166-ba9f-c237cfc43e02 /boot ext4 defaults 1 2 UUID=72552a7a-d8ae-4f0a-8917-b75a6239ce9f /ssd ext4 discard,relatime 1 2 UUID=8000e5e6-caa2-4765-94f8-9caeb2bda26e swap swap defaults 0 0 tmpfs /dev/shm tmpfs defaults 0 0 devpts /dev/pts devpts gid=5,mode=620 0 0 sysfs /sys sysfs defaults 0 0 proc /proc proc defaults 0 0 # ll /dev/disk/by-id/ total 0 lrwxrwxrwx. 1 root root 9 Jun 15 23:50 ata-OCZ-VERTEX3_OCZ-43DSRFTNCLE9ZJXX -> ../../sdb lrwxrwxrwx. 1 root root 10 Jun 15 23:50 ata-OCZ-VERTEX3_OCZ-43DSRFTNCLE9ZJXX-part1 -> ../../sdb1 lrwxrwxrwx. 1 root root 9 Jun 15 23:50 ata-ST3500413AS_5VMT49E3 -> ../../sdc lrwxrwxrwx. 1 root root 10 Jun 15 23:50 ata-ST3500413AS_5VMT49E3-part1 -> ../../sdc1 lrwxrwxrwx. 1 root root 10 Jun 15 23:50 ata-ST3500413AS_5VMT49E3-part2 -> ../../sdc2 lrwxrwxrwx. 1 root root 10 Jun 15 23:50 ata-ST3500413AS_5VMT49E3-part3 -> ../../sdc3 lrwxrwxrwx. 1 root root 9 Jun 15 23:50 ata-ST3500413AS_5VMTJNAJ -> ../../sda lrwxrwxrwx. 1 root root 10 Jun 15 23:50 ata-ST3500413AS_5VMTJNAJ-part1 -> ../../sda1 lrwxrwxrwx. 1 root root 10 Jun 15 23:50 ata-ST3500413AS_5VMTJNAJ-part2 -> ../../sda2 lrwxrwxrwx. 1 root root 10 Jun 15 23:50 ata-ST3500413AS_5VMTJNAJ-part3 -> ../../sda3 lrwxrwxrwx. 1 root root 9 Jun 15 23:50 md-name-localhost.localdomain:0 -> ../../md0 lrwxrwxrwx. 1 root root 9 Jun 15 23:50 md-name-localhost.localdomain:1 -> ../../md1 lrwxrwxrwx. 1 root root 9 Jun 15 23:50 md-name-localhost.localdomain:2 -> ../../md2 lrwxrwxrwx. 1 root root 9 Jun 15 23:50 md-uuid-a04d7241:8da6023e:f9004352:107a923a -> ../../md1 lrwxrwxrwx. 1 root root 9 Jun 15 23:50 md-uuid-a22c43b9:f1954990:d3ddda5e:f9aff3c9 -> ../../md0 lrwxrwxrwx. 1 root root 9 Jun 15 23:50 md-uuid-f403a2d0:447803b5:66edba73:569f8305 -> ../../md2 lrwxrwxrwx. 1 root root 9 Jun 15 23:50 scsi-SATA_OCZ-VERTEX3_OCZ-43DSRFTNCLE9ZJXX -> ../../sdb lrwxrwxrwx. 1 root root 10 Jun 15 23:50 scsi-SATA_OCZ-VERTEX3_OCZ-43DSRFTNCLE9ZJXX-part1 -> ../../sdb1 lrwxrwxrwx. 1 root root 9 Jun 15 23:50 scsi-SATA_ST3500413AS_5VMT49E3 -> ../../sdc lrwxrwxrwx. 1 root root 10 Jun 15 23:50 scsi-SATA_ST3500413AS_5VMT49E3-part1 -> ../../sdc1 lrwxrwxrwx. 1 root root 10 Jun 15 23:50 scsi-SATA_ST3500413AS_5VMT49E3-part2 -> ../../sdc2 lrwxrwxrwx. 1 root root 10 Jun 15 23:50 scsi-SATA_ST3500413AS_5VMT49E3-part3 -> ../../sdc3 lrwxrwxrwx. 1 root root 9 Jun 15 23:50 scsi-SATA_ST3500413AS_5VMTJNAJ -> ../../sda lrwxrwxrwx. 1 root root 10 Jun 15 23:50 scsi-SATA_ST3500413AS_5VMTJNAJ-part1 -> ../../sda1 lrwxrwxrwx. 1 root root 10 Jun 15 23:50 scsi-SATA_ST3500413AS_5VMTJNAJ-part2 -> ../../sda2 lrwxrwxrwx. 1 root root 10 Jun 15 23:50 scsi-SATA_ST3500413AS_5VMTJNAJ-part3 -> ../../sda3 lrwxrwxrwx. 1 root root 9 Jun 15 23:50 wwn-0x5000c500383621ff -> ../../sdc lrwxrwxrwx. 1 root root 10 Jun 15 23:50 wwn-0x5000c500383621ff-part1 -> ../../sdc1 lrwxrwxrwx. 1 root root 10 Jun 15 23:50 wwn-0x5000c500383621ff-part2 -> ../../sdc2 lrwxrwxrwx. 1 root root 10 Jun 15 23:50 wwn-0x5000c500383621ff-part3 -> ../../sdc3 lrwxrwxrwx. 1 root root 9 Jun 15 23:50 wwn-0x5000c5003838b2e7 -> ../../sda lrwxrwxrwx. 1 root root 10 Jun 15 23:50 wwn-0x5000c5003838b2e7-part1 -> ../../sda1 lrwxrwxrwx. 1 root root 10 Jun 15 23:50 wwn-0x5000c5003838b2e7-part2 -> ../../sda2 lrwxrwxrwx. 1 root root 10 Jun 15 23:50 wwn-0x5000c5003838b2e7-part3 -> ../../sda3 lrwxrwxrwx. 1 root root 9 Jun 15 23:50 wwn-0x5e83a97f592139d6 -> ../../sdb lrwxrwxrwx. 1 root root 10 Jun 15 23:50 wwn-0x5e83a97f592139d6-part1 -> ../../sdb1 # fdisk -l Disk /dev/sdb: 120.0 GB, 120034123776 bytes 255 heads, 63 sectors/track, 14593 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x79298ec9 Device Boot Start End Blocks Id System /dev/sdb1 1 14594 117219328 83 Linux Disk /dev/sdc: 500.1 GB, 500107862016 bytes 255 heads, 63 sectors/track, 60801 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x000d99de Device Boot Start End Blocks Id System /dev/sdc1 1 1275 10240000 fd Linux raid autodetect /dev/sdc2 * 1275 1339 512000 fd Linux raid autodetect /dev/sdc3 1339 60802 477633536 fd Linux raid autodetect Disk /dev/sda: 500.1 GB, 500107862016 bytes 255 heads, 63 sectors/track, 60801 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x000b3327 Device Boot Start End Blocks Id System /dev/sda1 1 1275 10240000 fd Linux raid autodetect /dev/sda2 * 1275 1339 512000 fd Linux raid autodetect /dev/sda3 1339 60802 477633536 fd Linux raid autodetect Disk /dev/md0: 10.5 GB, 10484641792 bytes 2 heads, 4 sectors/track, 2559727 cylinders Units = cylinders of 8 * 512 = 4096 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x00000000 Disk /dev/md0 doesn't contain a valid partition table Disk /dev/md2: 489.1 GB, 489095557120 bytes 2 heads, 4 sectors/track, 119408095 cylinders Units = cylinders of 8 * 512 = 4096 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x00000000 Disk /dev/md2 doesn't contain a valid partition table Disk /dev/md1: 524 MB, 524275712 bytes 2 heads, 4 sectors/track, 127997 cylinders Units = cylinders of 8 * 512 = 4096 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x00000000 Disk /dev/md1 doesn't contain a valid partition table # cat /etc/grub.conf default=0 timeout=5 splashimage=(hd2,1)/grub/splash.xpm.gz hiddenmenu title CentOS (2.6.32-220.17.1.el6.x86_64) root (hd2,1) kernel /vmlinuz-2.6.32-220.17.1.el6.x86_64 ro root=UUID=967b4035-782d-4c66-b22f-50244fe970ca rd_MD_UUID=f403a2d0:447803b5:66edba73:569f8305 rd_MD_UUID=a22c43b9:f1954990:d3ddda5e:f9aff3c9 rd_NO_LUKS rd_NO_LVM rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=br-abnt2 crashkernel=auto rhgb quiet initrd /initramfs-2.6.32-220.17.1.el6.x86_64.img

Read the article

Search Results

Search found 5942 results on 238 pages for 'total starnger'.

Page 233/238 | < Previous Page | 229 230 231 232 233 234 235 236 237 238 | Next Page >

- by SoLoGHoST

- by osc2nuke

- by Kurt W

- by quanta

- by Alex

- by javano

- by martin

- by DaveCol

- by mgc8888

- by Kyle Noland

- by pranjal

- by Mecki

- by Oddthinking

- by Chris

- by GruffTech

- by Mecki

- by Bob

- by Mark Szente

- by Rakeka

- by Jim

- by sswagner

- by Mark Szente

- by EXIT_FAILURE

- by Kyle Noland

- by Clodoaldo

< Previous Page | 229 230 231 232 233 234 235 236 237 238 | Next Page >