Search Results

Search found 1620 results on 65 pages for 'proc'.

  • Have I pushed the limits of my current VPS or is there room for optimization?

    - by JRameau
    I am currently on a MediaTemple DV server (basic) with 512 MB of dedicated RAM; it is a CentOS-based VPS with Plesk and Virtuozzo. My experience with it has been bad from day one, and I could only soothe my server issues with several caching "Band-Aids". My sites are not as small as they were a year ago either, so the issues have worsened.

    I have 3 Drupal installs running on separate (Plesk) domains. One of those installs is a multisite consisting of 5-6 sites, 2 of which bring in actual traffic. The caching "Band-Aids" I mentioned are APC, which seemed to help a lot initially, and Drupal's Boost, which is considered a poor man's Varnish: it makes all my pages static for anonymous users. Combined estimate from Google Analytics for the last 30 days: 90k visitors, 260k pageviews.

    Issue: a lot of downtime. I am continually checking whether my sites are up, and lately I have been finding them down more than 3 times a day. Restarting Apache brings them back up, for some time. I have Googled every error message and looked up ways to optimize my DV server, and I am stumped about my next move. Is this server bad? Have I hit an impossibly low restriction such as the 12 MB kernel memory barrier (kmemsize)? Is it on my end? Do I need to optimize some more?

    *I have provided as much information as I can below; any help or suggestions will be appreciated.

    Common error messages I see in the log:

        [error] (12)Cannot allocate memory: fork: Unable to fork new process
        [error] make_obcallback: could not import mod_python.apache.\n Traceback (most recent call last): File "/usr/lib/python2.4/site-packages/mod_python/apache.py", line 21, in ? import traceback File "/usr/lib/python2.4/traceback.py", line 3, in ? import linecache ImportError: No module named linecache
        [error] python_handler: no interpreter callback found.
        [warn-phpd] mmap cache can't open /var/www/vhosts/***/httpdocs/*** - Too many open files in system (pid ***)
        [alert] Child 8125 returned a Fatal error... Apache is exiting!
        [emerg] (43)Identifier removed: couldn't grab the accept mutex
        [emerg] (22)Invalid argument: couldn't release the accept mutex

    cat /proc/user_beancounters:

        Version: 2.5
           uid  resource           held    maxheld    barrier       limit    failcnt
        41548:  kmemsize        4582652    5306699   12288832    13517715   21105036
                lockedpages           0          0        600         600          0
                privvmpages       38151      42676     229036      249036          0
                shmpages          16274      16274      17237       17237          2
                dummy                 0          0          0           0          0
                numproc              43         46        300         300          0
                physpages         27260      29528          0  2147483647          0
                vmguarpages           0          0     131072  2147483647          0
                oomguarpages      27270      29538     131072  2147483647          0
                numtcpsock           21         29        300         300          0
                numflock              8          8        480         528          0
                numpty                1          1         30          30          0
                numsiginfo            0          1       1024        1024          0
                tcpsndbuf        648440     675272    2867477     4096277    1711499
                tcprcvbuf        301620     359716    2867477     4096277          0
                othersockbuf       4472       4472    1433738     2662538          0
                dgramrcvbuf           0          0    1433738     1433738          0
                numothersock         12         12        300         300          0
                dcachesize            0          0    2684271     2764800          0
                numfile            3447       3496       6300        6300       3872
                dummy                 0          0          0           0          0
                dummy                 0          0          0           0          0
                dummy                 0          0          0           0          0
                numiptent            14         14        200         200          0

    TOP: (In January the load average was really high, 3-10; I was able to bring it down to where it currently is by giving APC more memory to play around with.)

        top - 16:46:07 up 2:13, 1 user, load average: 0.34, 0.20, 0.20
        Tasks: 40 total, 2 running, 37 sleeping, 0 stopped, 1 zombie
        Cpu(s): 0.3% us, 0.1% sy, 0.0% ni, 99.7% id, 0.0% wa, 0.0% hi, 0.0% si
        Mem: 916144k total, 156668k used, 759476k free, 0k buffers
        Swap: 0k total, 0k used, 0k free, 0k cached

    MySQLTuner: (after optimizing every table and repairing any table with overage, I got the fragmented count down to 86)

        [--] Data in MyISAM tables: 285M (Tables: 1105)
        [!!] Total fragmented tables: 86
        [--] Up for: 2h 44m 38s (409K q [41.421 qps], 6K conn, TX: 1B, RX: 174M)
        [--] Reads / Writes: 79% / 21%
        [--] Total buffers: 58.0M global + 2.7M per thread (100 max threads)
        [!!] Query cache prunes per day: 675307
        [!!] Temporary tables created on disk: 35% (7K on disk / 20K total)
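
    The failcnt column in the listing above counts how often the container was refused a resource, so one way to read it is to list every row with a non-zero failcnt. The following is a minimal Python sketch of that idea, not part of the original question; the function names and the trimmed SAMPLE text are illustrative assumptions, and on a real container the text would come from /proc/user_beancounters.

```python
# Sketch: flag user_beancounters rows whose failcnt is non-zero.
# parse_beancounters / failing_resources are hypothetical helper names.

SAMPLE = """\
Version: 2.5
   uid  resource           held    maxheld    barrier      limit    failcnt
41548:  kmemsize        4582652    5306699   12288832   13517715   21105036
        lockedpages           0          0        600        600          0
        shmpages          16274      16274      17237      17237          2
        dummy                 0          0          0          0          0
        numproc              43         46        300        300          0
        tcpsndbuf        648440     675272    2867477    4096277    1711499
        numfile            3447       3496       6300       6300       3872
"""

COLUMNS = ("held", "maxheld", "barrier", "limit", "failcnt")


def parse_beancounters(text):
    """Return {resource: {column: int}}, skipping header and 'dummy' rows."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # The first data row is prefixed with the container id, e.g. "41548:".
        if fields and fields[0].endswith(":") and fields[0][:-1].isdigit():
            fields = fields[1:]
        if len(fields) != 6 or fields[0] == "dummy":
            continue
        name, values = fields[0], fields[1:]
        if not all(v.isdigit() for v in values):
            continue
        counters[name] = dict(zip(COLUMNS, map(int, values)))
    return counters


def failing_resources(counters):
    """Return {resource: failcnt} for every resource that has hit its limit."""
    return {name: c["failcnt"] for name, c in counters.items() if c["failcnt"] > 0}


for name, count in failing_resources(parse_beancounters(SAMPLE)).items():
    print(f"{name}: {count} failed allocations")
```

    In the figures posted above, kmemsize, shmpages, tcpsndbuf and numfile are the rows with a non-zero failcnt, which lines up with the "Cannot allocate memory" and "Too many open files" messages in the log.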

    Read the article

  • GNU/Linux swapping blocks system

    - by Ole Tange
    I have used GNU/Linux on systems from 4 MB RAM to 512 GB RAM. When they start swapping, most of the time you can still log in and kill off the offending process - you just have to be 100-1000 times more patient.

    On my new 32 GB system that has changed: it blocks when it starts swapping, sometimes with full disk activity but other times with no disk activity.

    To examine what might be the issue I have written this program. The idea is:

    1. grab 3% of the memory free right now
    2. if that caused swap to increase: stop
    3. keep the chunk used for 30 seconds by forking off
    4. goto 1

        #!/usr/bin/perl

        # Free memory in kB, taken from the "-/+ buffers/cache" line of free
        sub freekb {
            my $free = `free|grep buffers/cache`;
            my @a=split / +/,$free;
            return $a[3];
        }

        # Swap in use in kB, taken from the "Swap:" line of free
        sub swapkb {
            my $swap = `free|grep Swap:`;
            my @a=split / +/,$swap;
            return $a[2];
        }

        my $swap = swapkb();
        my $lastswap = $swap;
        my $free;
        while($lastswap >= $swap) {
            print "$swap $free\n";
            $lastswap = $swap;
            $swap = swapkb();
            $free = freekb();
            # Grab 3% of the memory that is free right now ...
            my $used_mem = "x"x(1024 * $free * 0.03);
            # ... and keep it allocated for 30 seconds in a forked child
            if(not fork()) {
                sleep 30;
                exit();
            }
        }
        print "Swap increased $swap $lastswap\n";

    Running the program forever ought to keep the system at the limit of swapping, while grabbing only a minimal amount of swap and doing so very slowly (i.e. a few MB at a time at most). If I run:

        forever free | stdbuf -o0 timestamp > freelog

    I ought to see swap slowly rising every second. (forever and timestamp from https://github.com/ole-tange/tangetools). But that is not the behaviour I see: I see swap increasing in jumps, and the system is completely blocked during these jumps.

    Here the system is blocked for 30 seconds while swap usage increases by 1 GB:

        secs
        169.527 Swap: 18440184 154184 18286000
        170.531 Swap: 18440184 154184 18286000
        200.630 Swap: 18440184 1134240 17305944
        210.259 Swap: 18440184 1076228 17363956

    Blocked: 21 secs. Swap increase 2400 MB:

        307.773 Swap: 18440184 581324 17858860
        308.799 Swap: 18440184 597676 17842508
        330.103 Swap: 18440184 2503020 15937164
        331.106 Swap: 18440184 2502936 15937248

    Blocked: 20 secs. Swap increase 2200 MB:

        751.283 Swap: 18440184 885288 17554896
        752.286 Swap: 18440184 911676 17528508
        772.331 Swap: 18440184 3193532 15246652
        773.333 Swap: 18440184 1404540 17035644

    Blocked: 37 secs. Swap increase 2400 MB:

        904.068 Swap: 18440184 613108 17827076
        905.072 Swap: 18440184 610368 17829816
        942.424 Swap: 18440184 3014668 15425516
        942.610 Swap: 18440184 2073580 16366604

    This is bad enough, but what is even worse is that the system sometimes stops responding at all - even if I wait for hours. I have the feeling it is related to the swapping issue, but I cannot tell for sure.

    My first idea was to tweak /proc/sys/vm/swappiness from 60 to 0 or 100, just to see if that had any effect at all. 0 did not have an effect, but 100 did cause the problem to arise less often.

    How can I prevent the system from blocking for such a long time? Why does it decide to swap out 1-3 GB when less than 10 MB would suffice?
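
    The "Blocked" and "Swap increase" figures above can be derived mechanically from the log: a stall shows up as a gap between consecutive timestamps, and the swap jump is the difference in the "used" column across that gap. Below is a minimal Python sketch of that calculation. It assumes the log has been filtered down to lines of the form "<seconds> Swap: <total> <used> <free>" as in the excerpts; find_stalls and the 20-second threshold are illustrative choices, not part of the original post.

```python
# Sketch: find gaps between samples in a timestamped "free" log and report
# how much swap usage changed across each gap.

SAMPLE = """\
169.527 Swap: 18440184 154184 18286000
170.531 Swap: 18440184 154184 18286000
200.630 Swap: 18440184 1134240 17305944
210.259 Swap: 18440184 1076228 17363956
"""


def find_stalls(lines, min_gap=20.0):
    """Return (start, end, gap_seconds, swap_delta_kb) for each gap >= min_gap."""
    stalls = []
    previous = None  # (timestamp, swap_used_kb) of the last Swap: line seen
    for line in lines:
        fields = line.split()
        if len(fields) != 5 or fields[1] != "Swap:":
            continue  # ignore Mem: lines, headers, and anything malformed
        timestamp, used_kb = float(fields[0]), int(fields[3])
        if previous is not None:
            gap = timestamp - previous[0]
            if gap >= min_gap:
                stalls.append((previous[0], timestamp, round(gap, 3), used_kb - previous[1]))
        previous = (timestamp, used_kb)
    return stalls


for start, end, gap, delta_kb in find_stalls(SAMPLE.splitlines()):
    print(f"blocked {gap:.1f}s between {start} and {end}, swap used changed by {delta_kb} kB")
```

    On the first excerpt this reports a single gap of roughly 30 seconds with swap usage growing by about 1 GB, matching the description in the post.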

    Read the article

  • Improving Manageability of Virtual Environments

    - by Jeff Victor
    Boot Environments for Solaris 10 Branded Zones

    Until recently, Solaris 10 Branded Zones on Solaris 11 suffered one notable regression: Live Upgrade did not work. The individual packaging and patching tools work correctly, but the ability to upgrade Solaris while the production workload continued running did not exist. A recent Solaris 11 SRU (Solaris 11.1 SRU 6.4) restored most of that functionality, although with a slightly different concept, different commands, and without all of the feature details. This new method gives you the ability to create and manage multiple boot environments (BEs) for a Solaris 10 Branded Zone, and modify the active or any inactive BE, and to do so while the production workload continues to run.

    Background

    In case you are new to Solaris: Solaris includes a set of features that enables you to create a bootable Solaris image, called a Boot Environment (BE). This newly created image can be modified while the original BE is still running your workload(s). There are many benefits, including improved uptime and the ability to reboot into (or downgrade to) an older BE if a newer one has a problem.

    In Solaris 10 this set of features was named Live Upgrade. Solaris 11 applies the same basic concepts to the new packaging system (IPS) but there isn't a specific name for the feature set. The features are simply part of IPS. Solaris 11 Boot Environments are not discussed in this blog entry.

    Although a Solaris 10 system can have multiple BEs, until recently a Solaris 10 Branded Zone (BZ) in a Solaris 11 system did not have this ability. This limitation was addressed recently, and that enhancement is the subject of this blog entry.

    This new implementation uses two concepts. The first is the use of a ZFS clone for each BE. This makes it very easy to create a BE, or many BEs. This is a distinct advantage over the Live Upgrade feature set in Solaris 10, which had a practical limitation of two BEs on a system, when using UFS.
    The second new concept is a very simple mechanism to indicate the BE that should be booted: a ZFS property. The new ZFS property is named com.oracle.zones.solaris10:activebe (isn't that creative?). It's important to note that the property is inherited from the original BE's file system to any BEs you create. In other words, all BEs in one zone have the same value for that property. When the (Solaris 11) global zone boots the Solaris 10 BZ, it boots the BE that has the name that is stored in the activebe property.

    Here is a quick summary of the actions you can use to manage these BEs:

    To create a BE:
      - Create a ZFS clone of the zone's root dataset
    To activate a BE:
      - Set the ZFS property of the root dataset to indicate the BE
    To add a package or patch to an inactive BE:
      - Mount the inactive BE
      - Add packages or patches to it
      - Unmount the inactive BE
    To list the available BEs:
      - Use the "zfs list" command.
    To destroy a BE:
      - Use the "zfs destroy" command.

    Preparation

    Before you can use the new features, you will need a Solaris 10 BZ on a Solaris 11 system. You can use these three steps - on a real Solaris 11.1 server or in a VirtualBox guest running Solaris 11.1 - to create a Solaris 10 BZ. The Solaris 11.1 environment must be at SRU 6.4 or newer.

    1. Create a flash archive on the Solaris 10 system

        s10# flarcreate -n s10-system /net/zones/archives/s10-system.flar

    2. Configure the Solaris 10 BZ on the Solaris 11 system

        s11# zonecfg -z s10z
        Use 'create' to begin configuring a new zone.
        zonecfg:s10z> create -t SYSsolaris10
        zonecfg:s10z> set zonepath=/zones/s10z
        zonecfg:s10z> exit
        s11# zoneadm list -cv
          ID NAME     STATUS      PATH          BRAND      IP
           0 global   running     /             solaris    shared
           - s10z     configured  /zones/s10z   solaris10  excl

    3. Install the zone from the flash archive

        s11# zoneadm -z s10z install -a /net/zones/archives/s10-system.flar -p

    You can find more information about the migration of Solaris 10 environments to Solaris 10 Branded Zones in the documentation.
    The rest of this blog entry demonstrates the commands you can use to accomplish the aforementioned actions related to BEs.

    New features in action

    Note that the demonstration of the commands occurs in the Solaris 10 BZ, as indicated by the shell prompt "s10z# ". Many of these commands can be performed in the global zone instead, if you prefer. If you perform them in the global zone, you must change the ZFS file system names.

    Create

    The only complicated action is the creation of a BE. In the Solaris 10 BZ, create a new "boot environment" - a ZFS clone. You can assign any name to the final portion of the clone's name, as long as it meets the requirements for a ZFS file system name.

        s10z# zfs snapshot rpool/ROOT/zbe-0@snap
        s10z# zfs clone -o mountpoint=/ -o canmount=noauto rpool/ROOT/zbe-0@snap rpool/ROOT/newBE
        cannot mount 'rpool/ROOT/newBE' on '/': directory is not empty
        filesystem successfully created, but not mounted

    You can safely ignore that message: we already know that / is not empty! We have merely told ZFS that the default mountpoint for the clone is the root directory.

    List the available BEs and active BE

    Because each BE is represented by a clone of the rpool/ROOT dataset, listing the BEs is as simple as listing the clones.

        s10z# zfs list -r rpool/ROOT
        NAME               USED  AVAIL  REFER  MOUNTPOINT
        rpool/ROOT        3.55G  42.9G    31K  legacy
        rpool/ROOT/zbe-0     1K  42.9G  3.55G  /
        rpool/ROOT/newBE  3.55G  42.9G  3.55G  /

    The output shows that two BEs exist. Their names are "zbe-0" and "newBE".

    You can tell Solaris that one particular BE should be used when the zone next boots by using a ZFS property. Its name is com.oracle.zones.solaris10:activebe. The value of that property is the name of the clone that contains the BE that should be booted.
        s10z# zfs get com.oracle.zones.solaris10:activebe rpool/ROOT
        NAME        PROPERTY                             VALUE  SOURCE
        rpool/ROOT  com.oracle.zones.solaris10:activebe  zbe-0  local

    Change the active BE

    When you want to change the BE that will be booted next time, you can just change the activebe property on the rpool/ROOT dataset.

        s10z# zfs get com.oracle.zones.solaris10:activebe rpool/ROOT
        NAME        PROPERTY                             VALUE  SOURCE
        rpool/ROOT  com.oracle.zones.solaris10:activebe  zbe-0  local
        s10z# zfs set com.oracle.zones.solaris10:activebe=newBE rpool/ROOT
        s10z# zfs get com.oracle.zones.solaris10:activebe rpool/ROOT
        NAME        PROPERTY                             VALUE  SOURCE
        rpool/ROOT  com.oracle.zones.solaris10:activebe  newBE  local
        s10z# shutdown -y -g0 -i6

    After the zone has rebooted:

        s10z# zfs get com.oracle.zones.solaris10:activebe rpool/ROOT
        rpool/ROOT  com.oracle.zones.solaris10:activebe  newBE  local
        s10z# zfs mount
        rpool/ROOT/newBE   /
        rpool/export       /export
        rpool/export/home  /export/home
        rpool              /rpool

    Mount the original BE to see that it's still there.

        s10z# zfs mount -o mountpoint=/mnt rpool/ROOT/zbe-0
        s10z# ls /mnt
        Desktop    export                          platform
        Documents  export.backup.20130607T214951Z  proc
        S10Flar    home                            rpool
        TT_DB      kernel                          sbin
        bin        lib                             system
        boot       lost+found                      tmp
        cdrom      mnt                             usr
        dev        net                             var
        etc        opt

    Patch an inactive BE

    At this point, you can modify the original BE. If you would prefer to modify the new BE, you can restore the original value to the activebe property and reboot, and then mount the new BE to /mnt (or another empty directory) and modify it.

    Let's mount the original BE so we can modify it. (The first command is only needed if you haven't already mounted that BE.)

        s10z# zfs mount -o mountpoint=/mnt rpool/ROOT/zbe-0
        s10z# patchadd -R /mnt -M /var/sadm/spool 104945-02

    Note that the typical usage will be:

    1. Create a BE
    2. Mount the new (inactive) BE
    3. Use the package and patch tools to update the new BE
    4. Unmount the new BE
    5. Reboot

    Delete an inactive BE

    ZFS clones are children of their parent file systems.
    In order to destroy the parent, you must first "promote" the child. This reverses the parent-child relationship. (For more information on this, see the documentation.)

    The original rpool/ROOT file system is the parent of the clones that you create as BEs. In order to destroy an earlier BE that is the parent of other BEs, you must first promote one of the child BEs to be the ZFS parent. Only then can you destroy the original BE. Fortunately, this is easier to do than to explain:

        s10z# zfs promote rpool/ROOT/newBE
        s10z# zfs destroy rpool/ROOT/zbe-0
        s10z# zfs list -r rpool/ROOT
        NAME               USED  AVAIL  REFER  MOUNTPOINT
        rpool/ROOT        3.56G   269G    31K  legacy
        rpool/ROOT/newBE  3.56G   269G  3.55G  /

    Documentation

    This feature is so new, it is not yet described in the Solaris 11 documentation. However, MOS note 1558773.1 offers some details.

    Conclusion

    With this new feature, you can add and patch packages to boot environments of a Solaris 10 Branded Zone. This ability improves the manageability of these zones, and makes their use more practical. It also means that you can use the existing P2V tools with earlier Solaris 10 updates, and modify the environments after they become Solaris 10 Branded Zones.
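
    The create, activate and delete actions described in this entry each boil down to one or two zfs invocations, so they can be collected in a small helper. The following Python sketch only builds and prints the command lines shown in the entry; it does not run them. The function names, the ROOT constant and the default snapshot name are illustrative assumptions rather than anything provided by Solaris.

```python
# Sketch: compose the zfs command lines for the BE actions described above.
# Nothing is executed; each function returns a list of argv lists.
import shlex

ROOT = "rpool/ROOT"  # dataset that holds the BEs, as seen from inside the zone
ACTIVE_PROP = "com.oracle.zones.solaris10:activebe"


def create_be(source_be, new_be, snap="snap"):
    """Snapshot an existing BE and clone it as a new, unmounted BE."""
    snapshot = f"{ROOT}/{source_be}@{snap}"
    return [
        ["zfs", "snapshot", snapshot],
        ["zfs", "clone", "-o", "mountpoint=/", "-o", "canmount=noauto",
         snapshot, f"{ROOT}/{new_be}"],
    ]


def activate_be(be):
    """Select the BE that the zone boots next time."""
    return [["zfs", "set", f"{ACTIVE_PROP}={be}", ROOT]]


def destroy_parent_be(old_be, promoted_be):
    """Promote a child clone first, then destroy the former parent BE."""
    return [
        ["zfs", "promote", f"{ROOT}/{promoted_be}"],
        ["zfs", "destroy", f"{ROOT}/{old_be}"],
    ]


steps = create_be("zbe-0", "newBE") + activate_be("newBE") + destroy_parent_be("zbe-0", "newBE")
for argv in steps:
    print("s10z# " + shlex.join(argv))
```

    Printing rather than executing keeps the sketch safe to try anywhere; in the entry itself a reboot happens between activating the new BE and destroying the old one.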

    Read the article

  • How to restore Linode to Vagrant VM?

    - by Iain Elder
    I'm trying to set up a Linux development environment so I can safely make changes to my website without breaking the live site. Linode hosts my live site. A simple solution would be to host my development server on Linode as well, but I want to avoid doubling my hosting costs. The cheapest way I see is to use Vagrant on my Windows workstation to host my development environment.

    After I attempt to restore the backup to Vagrant and reboot the VM, I can no longer ssh into the Vagrant host. It's probably because by restoring the backup I overwrite some special Vagrant configuration, but I'm not sure how to avoid that.

    How do I make this approach work? If my approach is fundamentally wrong, can you suggest an alternative?

    Creating the backup

    On the Linode I used these commands to create a compressed copy of the entire filesystem, while ignoring things that shouldn't be included in the backup:

        $ sudo rsync -ahvz --exclude={/dev/*,/proc/*,/sys/*,/tmp/*,/run/*,/mnt/*,/backup/*} /* /backup/2
        $ sudo tar -czf /backup/2.gz /backup/2

    The backup file is called 2.gz because this is the second backup. The first backup is called 1.gz. I use WinSCP to copy the backup file to my Windows workstation.

    Setting up the Vagrant host

    I need a Vagrant box that matches my Linode operating system (Ubuntu 12.04.3 LTS, kernel 3.9.3). I selected the closest match from vagrantbox.es:

        Ubuntu Server Precise 12.04.3 amd64 Kernel is ready for Docker (Docker not included)

    On my workstation I ran these commands to add the box and initialize and boot an instance:

        $ vagrant box add ubuntu-precise http://nitron-vagrant.s3-website-us-east-1.amazonaws.com/vagrant_ubuntu_12.04.3_amd64_virtualbox.box
        $ mkdir linode-test
        $ cd linode-test
        $ vagrant init ubuntu-precise
        $ vagrant up

    Now Vagrant is running a machine with SSH on port 2222. The operating system version is the same. The kernel version is 3.8.0. Sounds close enough.
    Restoring the backup

    With WinSCP I copied the backup file 2.gz to /home/vagrant/2.gz on the Vagrant box. With PuTTY I connected via ssh to my new Vagrant box.

    On the box, move the backup to the filesystem root:

        $ sudo mv 2.gz /

    Extract the archive to the filesystem root:

        $ sudo tar -xvpz -f 2.gz -C / --strip-components=2

    (I discovered I need to use --strip-components because all files in the archive have the prefix backup/2/. I'll fix this for the next backup.)

    After the tar command completes, I log out of the box.

    Testing the backup

    When I try to log in again, it doesn't let me log in as vagrant with a password any more. It does let me log in as iain, my user on the live Linode, with a password. That surprised me because I disabled password authentication on my live Linode. I figured that I have to restart the ssh service for the change to take effect. Instead of restarting just ssh, I chose to restart the whole system.

    Now I can't even get to the login screen. PuTTY says "connection refused" when I try to connect.

    What went wrong?

    Read the article

  • rsyslogd not monitoring all files

    - by Tom O'Connor
    So.. I've installed Logstash, and instead of using the logstash shipper (because it needs the JVM and is generally massive), I'm using rsyslogd with the following configuration.

        # Use traditional timestamp format
        $ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat

        $IncludeConfig /etc/rsyslog.d/*.conf

        # Provides kernel logging support (previously done by rklogd)
        $ModLoad imklog
        # Provides support for local system logging (e.g. via logger command)
        $ModLoad imuxsock

        # Log all kernel messages to the console.
        # Logging much else clutters up the screen.
        #kern.* /dev/console

        # Log anything (except mail) of level info or higher.
        # Don't log private authentication messages!
        *.info;mail.none;authpriv.none;cron.none;local6.none /var/log/messages

        # The authpriv file has restricted access.
        authpriv.* /var/log/secure

        # Log all the mail messages in one place.
        mail.* -/var/log/maillog

        # Log cron stuff
        cron.* /var/log/cron

        # Everybody gets emergency messages
        *.emerg *

        # Save news errors of level crit and higher in a special file.
        uucp,news.crit /var/log/spooler

        # Save boot messages also to boot.log
        local7.* /var/log/boot.log

    In /etc/rsyslog.d/logstash.conf there are 28 file monitor blocks using imfile:

        $ModLoad imfile # Load the imfile input module
        $ModLoad imklog # for reading kernel log messages
        $ModLoad imuxsock # for reading local syslog messages

        $InputFileName /var/log/rabbitmq/startup_err
        $InputFileTag rmq-err:
        $InputFileStateFile state-rmq-err
        $InputFileFacility local6
        $InputRunFileMonitor
        ....
        $InputFileName /var/log/some.other.custom.log
        $InputFileTag cust-log:
        $InputFileStateFile state-cust-log
        $InputFileFacility local6
        $InputRunFileMonitor
        ....
        *.* @@10.90.0.110:5514

    There are 28 InputFileMonitor blocks, each monitoring a different custom application logfile.
    If I run

        [root@secret-gm02 ~]# lsof|grep rsyslog
        rsyslogd 5380 root cwd DIR 253,0 4096 2 /
        rsyslogd 5380 root rtd DIR 253,0 4096 2 /
        rsyslogd 5380 root txt REG 253,0 278976 1015955 /sbin/rsyslogd
        rsyslogd 5380 root mem REG 253,0 58400 1868123 /lib64/libgcc_s-4.1.2-20080825.so.1
        rsyslogd 5380 root mem REG 253,0 144776 1867778 /lib64/ld-2.5.so
        rsyslogd 5380 root mem REG 253,0 1718232 1867780 /lib64/libc-2.5.so
        rsyslogd 5380 root mem REG 253,0 23360 1867787 /lib64/libdl-2.5.so
        rsyslogd 5380 root mem REG 253,0 145872 1867797 /lib64/libpthread-2.5.so
        rsyslogd 5380 root mem REG 253,0 85544 1867815 /lib64/libz.so.1.2.3
        rsyslogd 5380 root mem REG 253,0 53448 1867801 /lib64/librt-2.5.so
        rsyslogd 5380 root mem REG 253,0 92816 1868016 /lib64/libresolv-2.5.so
        rsyslogd 5380 root mem REG 253,0 20384 1867990 /lib64/rsyslog/lmnsd_ptcp.so
        rsyslogd 5380 root mem REG 253,0 53880 1867802 /lib64/libnss_files-2.5.so
        rsyslogd 5380 root mem REG 253,0 23736 1867800 /lib64/libnss_dns-2.5.so
        rsyslogd 5380 root mem REG 253,0 20768 1867988 /lib64/rsyslog/lmnet.so
        rsyslogd 5380 root mem REG 253,0 11488 1867982 /lib64/rsyslog/imfile.so
        rsyslogd 5380 root mem REG 253,0 24040 1867983 /lib64/rsyslog/imklog.so
        rsyslogd 5380 root mem REG 253,0 11536 1867987 /lib64/rsyslog/imuxsock.so
        rsyslogd 5380 root mem REG 253,0 13152 1867989 /lib64/rsyslog/lmnetstrms.so
        rsyslogd 5380 root mem REG 253,0 8400 1867992 /lib64/rsyslog/lmtcpclt.so
        rsyslogd 5380 root 0r REG 0,3 0 4026531848 /proc/kmsg
        rsyslogd 5380 root 1u IPv4 1200589517 0t0 TCP 10.10.10.90 t:40629->10.10.10.90:5514 (ESTABLISHED)
        rsyslogd 5380 root 2u IPv4 1200589527 0t0 UDP *:45801
        rsyslogd 5380 root 3w REG 253,3 17999744 2621483 /var/log/messages
        rsyslogd 5380 root 4w REG 253,3 13383 2621484 /var/log/secure
        rsyslogd 5380 root 5w REG 253,3 7180 2621493 /var/log/maillog
        rsyslogd 5380 root 6w REG 253,3 43321 2621529 /var/log/cron
        rsyslogd 5380 root 7w REG 253,3 0 2621494 /var/log/spooler
        rsyslogd 5380 root 8w REG 253,3 0 2621495 /var/log/boot.log
        rsyslogd 5380 root 9r REG 253,3 1064271998 2621464 /var/log/custom-application.monolog.log
        rsyslogd 5380 root 10u unix 0xffff81081fad2e40 0t0 1200589511 /dev/log

    You can see that there are nowhere near 28 logfiles actually being read. I really had to get one file monitored, so I moved it to the top, and it picked it up, but I'd like to be able to monitor all 28+ files, and not have to worry.

    OS is CentOS 5.5
    Kernel 2.6.18-308.el5

        rsyslogd 3.22.1, compiled with:
            FEATURE_REGEXP: Yes
            FEATURE_LARGEFILE: Yes
            FEATURE_NETZIP (message compression): Yes
            GSSAPI Kerberos 5 support: Yes
            FEATURE_DEBUG (debug build, slow code): No
            Atomic operations supported: Yes
            Runtime Instrumentation (slow code): No

    Questions:

    1. Why is rsyslogd only monitoring a very small subset of the files?
    2. How can I fix this so that all the files are monitored?
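
    Spotting the gap by eye works for one host, but the comparison between "files configured for imfile" and "files rsyslogd has open for reading" can also be scripted. Below is a minimal Python sketch of that check, assuming lsof output in the column layout shown above. The helper names and the EXPECTED list are illustrative assumptions, and a file missing from lsof is only a hint that it is not being monitored, not proof.

```python
# Sketch: list regular files that a process holds open read-only according
# to lsof output, then diff that against the files expected to be monitored.
import re

LSOF_SAMPLE = """\
rsyslogd 5380 root mem REG 253,0 11488 1867982 /lib64/rsyslog/imfile.so
rsyslogd 5380 root 0r REG 0,3 0 4026531848 /proc/kmsg
rsyslogd 5380 root 3w REG 253,3 17999744 2621483 /var/log/messages
rsyslogd 5380 root 9r REG 253,3 1064271998 2621464 /var/log/custom-application.monolog.log
rsyslogd 5380 root 10u unix 0xffff81081fad2e40 0t0 1200589511 /dev/log
"""

# Hypothetical list of files named in $InputFileName directives.
EXPECTED = [
    "/var/log/rabbitmq/startup_err",
    "/var/log/some.other.custom.log",
    "/var/log/custom-application.monolog.log",
]


def files_open_for_reading(lsof_text, command="rsyslogd"):
    """Return paths of regular files the command has open with a read-only fd."""
    paths = []
    for line in lsof_text.splitlines():
        fields = line.split()
        if len(fields) < 9 or fields[0] != command:
            continue
        fd, file_type, path = fields[3], fields[4], fields[-1]
        # Read-only descriptors look like "9r"; "3w", "10u" and "mem" are skipped.
        if file_type == "REG" and re.fullmatch(r"\d+r", fd):
            paths.append(path)
    return paths


def missing_inputs(expected, lsof_text):
    """Return the expected files that do not appear as read-only open files."""
    opened = set(files_open_for_reading(lsof_text))
    return [path for path in expected if path not in opened]


print(missing_inputs(EXPECTED, LSOF_SAMPLE))
```

    Feeding it the full list of 28 configured paths and the complete lsof output would show which inputs were skipped and whether the cut-off follows the order of the blocks in logstash.conf.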

    Read the article

  • http request via iptables --to-destination ip redirect results in no response

    - by Wouter Vegter
    I have two Ubuntu servers, each with its own IP address. Let's call them server1 and server2, with IPs 1.1.1.1 and 2.2.2.2 respectively. I have nginx running on server2. The sole purpose of server1 is to redirect all incoming HTTP (port 80) requests to server2 without clients noticing that their request is being redirected.

    I tried the following command on server1:

        iptables -t nat -A PREROUTING -p tcp --dport 80 -j DNAT --to-destination 2.2.2.2

    But when I enter 1.1.1.1 in my browser I get no response: the page keeps trying to load without giving any message or error (I get a time-out after 2-3 minutes). When I remove the above iptables rule, I immediately get a "page not found" error when I enter 1.1.1.1 in my browser, so something is working, but not as it should: when I enter 1.1.1.1 I want the HTML page hosted on 2.2.2.2 to load, because when I enter 2.2.2.2 in my browser I do see the webpage.

    Could anyone please help me with this? I have been searching for quite some time (on Server Fault and Google), which is why I ask. Many thanks for reading my question!

    Update: Thank you all for your information.
    Unfortunately I still get no response. I have the following iptables configuration:

        root@ip-10-48-238-216:/home/ubuntu# sudo iptables -L
        Chain INPUT (policy ACCEPT)
        target prot opt source destination

        Chain FORWARD (policy ACCEPT)
        target prot opt source destination

        Chain OUTPUT (policy ACCEPT)
        target prot opt source destination

        root@ip-10-48-238-216:/home/ubuntu# sudo iptables -t nat -L
        Chain PREROUTING (policy ACCEPT)
        target prot opt source destination
        DNAT tcp -- anywhere anywhere tcp dpt:www to:2.2.2.2

        Chain OUTPUT (policy ACCEPT)
        target prot opt source destination

        Chain POSTROUTING (policy ACCEPT)
        target prot opt source destination

    When I run tcpdump and make a request via Chrome to 1.1.1.1, I get the following:

        root@ip-10-48-238-216:/home/ubuntu# sudo tcpdump -i eth0 port 80 -vv
        tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
        13:56:18.346625 IP (tos 0x0, ttl 52, id 12055, offset 0, flags [DF], proto TCP (6), length 60)
            212-123-161-112.ip.telfort.nl.16386 > ip-10-48-238-216.eu-west-1.compute.internal.www: Flags [S], cksum 0xb398 (correct), seq 2639758575, win 5840, options [mss 1460,sackOK,TS val 1223672 ecr 0,nop,wscale 6], length 0
        13:56:18.346662 IP (tos 0x0, ttl 51, id 12055, offset 0, flags [DF], proto TCP (6), length 60)
            212-123-161-112.ip.telfort.nl.16386 > ww1dc1.shopreme.com.www: Flags [S], cksum 0x9ee0 (correct), seq 2639758575, win 5840, options [mss 1460,sackOK,TS val 1223672 ecr 0,nop,wscale 6], length 0
        13:56:18.598747 IP (tos 0x0, ttl 52, id 10138, offset 0, flags [DF], proto TCP (6), length 60)
            212-123-161-112.ip.telfort.nl.16387 > ip-10-48-238-216.eu-west-1.compute.internal.www: Flags [S], cksum 0xac40 (correct), seq 2645658541, win 5840, options [mss 1460,sackOK,TS val 1223735 ecr 0,nop,wscale 6], length 0
        13:56:18.598777 IP (tos 0x0, ttl 51, id 10138, offset 0, flags [DF], proto TCP (6), length 60)
            212-123-161-112.ip.telfort.nl.16387 > ww1dc1.shopreme.com.www: Flags [S], cksum 0x9788 (correct), seq 2645658541, win 5840, options [mss 1460,sackOK,TS val 1223735 ecr 0,nop,wscale 6], length 0
        ^C
        4 packets captured
        4 packets received by filter
        0 packets dropped by kernel

    The addresses mentioned relate to the following:

      - 212-123-161-112.ip.telfort.nl.16386 : my personal computer
      - ww1dc1.shopreme.com.www : DNS name of server2 (2.2.2.2)
      - ip-10-48-238-216.eu-west-1.compute.internal.www : Amazon Web Services EC2 internal address of server1 (1.1.1.1)

    However, the tcpdump log on server2 (2.2.2.2) stays empty and I get no response back in my browser. I am able to ping from server1 to server2. And net.ipv4.ip_forward is set to 1 and so is /proc/sys/net/ipv4/ip_forward.

    Could there be anything else that is missing?

    Read the article

  • Degraded RAID5 and no md superblock on one of remaining drive

    - by ark1214
    This is actually on a QNAP TS-509 NAS. The RAID is basically a Linux RAID. The NAS was configured with RAID 5 with 5 drives (/md0 with /dev/sd[abcde]3). At some point, /dev/sde failed and the drive was replaced. While rebuilding (and before it completed), the NAS rebooted itself and /dev/sdc dropped out of the array. Now the array can't start because essentially 2 drives have dropped out. I disconnected /dev/sde and hoped that /md0 could resume in degraded mode, but no luck. Further investigation shows that /dev/sdc3 has no md superblock. The data should be good since the array was unable to assemble after /dev/sdc dropped off.

    All the searches I have done showed how to reassemble the array assuming 1 bad drive. But I think I just need to restore the superblock on /dev/sdc3, and that should bring the array up in degraded mode, which will allow me to back up the data and then proceed with rebuilding by adding /dev/sde. Any help would be greatly appreciated.

    mdstat does not show /dev/md0:

        # cat /proc/mdstat
        Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
        md5 : active raid1 sdd2[2](S) sdc2[3](S) sdb2[1] sda2[0]
              530048 blocks [2/2] [UU]
        md13 : active raid1 sdd4[3] sdc4[2] sdb4[1] sda4[0]
              458880 blocks [5/4] [UUUU_]
              bitmap: 40/57 pages [160KB], 4KB chunk
        md9 : active raid1 sdd1[3] sdc1[2] sdb1[1] sda1[0]
              530048 blocks [5/4] [UUUU_]
              bitmap: 33/65 pages [132KB], 4KB chunk

    mdadm shows /dev/md0 is still there:

        # mdadm --examine --scan
        ARRAY /dev/md9 level=raid1 num-devices=5 UUID=271bf0f7:faf1f2c2:967631a4:3c0fa888
        ARRAY /dev/md5 level=raid1 num-devices=2 UUID=0d75de26:0759d153:5524b8ea:86a3ee0d spares=2
        ARRAY /dev/md0 level=raid5 num-devices=5 UUID=ce3e369b:4ff9ddd2:3639798a:e3889841
        ARRAY /dev/md13 level=raid1 num-devices=5 UUID=7384c159:ea48a152:a1cdc8f2:c8d79a9c

    With /dev/sde removed, here is the mdadm examine output showing sdc3 has no md superblock:

    # mdadm --examine /dev/sda3 /dev/sda3: Magic : a92b4efc Version : 00.90.00 UUID : 
ce3e369b:4ff9ddd2:3639798a:e3889841 Creation Time : Sat Dec 8 15:01:19 2012 Raid Level : raid5 Used Dev Size : 1463569600 (1395.77 GiB 1498.70 GB) Array Size : 5854278400 (5583.08 GiB 5994.78 GB) Raid Devices : 5 Total Devices : 4 Preferred Minor : 0 Update Time : Sat Dec 8 15:06:17 2012 State : active Active Devices : 4 Working Devices : 4 Failed Devices : 1 Spare Devices : 0 Checksum : d9e9ff0e - correct Events : 0.394 Layout : left-symmetric Chunk Size : 64K Number Major Minor RaidDevice State this 0 8 3 0 active sync /dev/sda3 0 0 8 3 0 active sync /dev/sda3 1 1 8 19 1 active sync /dev/sdb3 2 2 8 35 2 active sync /dev/sdc3 3 3 8 51 3 active sync /dev/sdd3 4 4 0 0 4 faulty removed [~] # mdadm --examine /dev/sdb3 /dev/sdb3: Magic : a92b4efc Version : 00.90.00 UUID : ce3e369b:4ff9ddd2:3639798a:e3889841 Creation Time : Sat Dec 8 15:01:19 2012 Raid Level : raid5 Used Dev Size : 1463569600 (1395.77 GiB 1498.70 GB) Array Size : 5854278400 (5583.08 GiB 5994.78 GB) Raid Devices : 5 Total Devices : 4 Preferred Minor : 0 Update Time : Sat Dec 8 15:06:17 2012 State : active Active Devices : 4 Working Devices : 4 Failed Devices : 1 Spare Devices : 0 Checksum : d9e9ff20 - correct Events : 0.394 Layout : left-symmetric Chunk Size : 64K Number Major Minor RaidDevice State this 1 8 19 1 active sync /dev/sdb3 0 0 8 3 0 active sync /dev/sda3 1 1 8 19 1 active sync /dev/sdb3 2 2 8 35 2 active sync /dev/sdc3 3 3 8 51 3 active sync /dev/sdd3 4 4 0 0 4 faulty removed [~] # mdadm --examine /dev/sdc3 mdadm: No md superblock detected on /dev/sdc3. 
[~] # mdadm --examine /dev/sdd3 /dev/sdd3: Magic : a92b4efc Version : 00.90.00 UUID : ce3e369b:4ff9ddd2:3639798a:e3889841 Creation Time : Sat Dec 8 15:01:19 2012 Raid Level : raid5 Used Dev Size : 1463569600 (1395.77 GiB 1498.70 GB) Array Size : 5854278400 (5583.08 GiB 5994.78 GB) Raid Devices : 5 Total Devices : 4 Preferred Minor : 0 Update Time : Sat Dec 8 15:06:17 2012 State : active Active Devices : 4 Working Devices : 4 Failed Devices : 1 Spare Devices : 0 Checksum : d9e9ff44 - correct Events : 0.394 Layout : left-symmetric Chunk Size : 64K Number Major Minor RaidDevice State this 3 8 51 3 active sync /dev/sdd3 0 0 8 3 0 active sync /dev/sda3 1 1 8 19 1 active sync /dev/sdb3 2 2 8 35 2 active sync /dev/sdc3 3 3 8 51 3 active sync /dev/sdd3 4 4 0 0 4 faulty removed fdisk output shows /dev/sdc3 partition is still there. [~] # fdisk -l Disk /dev/sdx: 128 MB, 128057344 bytes 8 heads, 32 sectors/track, 977 cylinders Units = cylinders of 256 * 512 = 131072 bytes Device Boot Start End Blocks Id System /dev/sdx1 1 8 1008 83 Linux /dev/sdx2 9 440 55296 83 Linux /dev/sdx3 441 872 55296 83 Linux /dev/sdx4 873 977 13440 5 Extended /dev/sdx5 873 913 5232 83 Linux /dev/sdx6 914 977 8176 83 Linux Disk /dev/sda: 1500.3 GB, 1500301910016 bytes 255 heads, 63 sectors/track, 182401 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/sda1 * 1 66 530113+ 83 Linux /dev/sda2 67 132 530145 82 Linux swap / Solaris /dev/sda3 133 182338 1463569695 83 Linux /dev/sda4 182339 182400 498015 83 Linux Disk /dev/sda4: 469 MB, 469893120 bytes 2 heads, 4 sectors/track, 114720 cylinders Units = cylinders of 8 * 512 = 4096 bytes Disk /dev/sda4 doesn't contain a valid partition table Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes 255 heads, 63 sectors/track, 182401 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/sdb1 * 1 66 530113+ 83 Linux /dev/sdb2 67 132 530145 82 Linux swap / Solaris 
/dev/sdb3 133 182338 1463569695 83 Linux /dev/sdb4 182339 182400 498015 83 Linux Disk /dev/sdc: 1500.3 GB, 1500301910016 bytes 255 heads, 63 sectors/track, 182401 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/sdc1 1 66 530125 83 Linux /dev/sdc2 67 132 530142 83 Linux /dev/sdc3 133 182338 1463569693 83 Linux /dev/sdc4 182339 182400 498012 83 Linux Disk /dev/sdd: 2000.3 GB, 2000398934016 bytes 255 heads, 63 sectors/track, 243201 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/sdd1 1 66 530125 83 Linux /dev/sdd2 67 132 530142 83 Linux /dev/sdd3 133 243138 1951945693 83 Linux /dev/sdd4 243139 243200 498012 83 Linux Disk /dev/md9: 542 MB, 542769152 bytes 2 heads, 4 sectors/track, 132512 cylinders Units = cylinders of 8 * 512 = 4096 bytes Disk /dev/md9 doesn't contain a valid partition table Disk /dev/md5: 542 MB, 542769152 bytes 2 heads, 4 sectors/track, 132512 cylinders Units = cylinders of 8 * 512 = 4096 bytes Disk /dev/md5 doesn't contain a valid partition table

    Read the article

  • Merge replication stopping without errors in SQL 2008 R2

    - by Rob Farley
    A non-SQL MVP friend of mine, who also happens to be a client, asked me for some help again last week. I was planning on writing this up even before Rob Volk (@sql_r) listed his T-SQL Tuesday topic for this month. Earlier in the year, I (well, LobsterPot Solutions, although I’d been the person mostly involved) had helped out with a merge replication problem. The Merge Agent on the subscriber was just stopping every time, shortly after it started. With no errors anywhere – not in the Windows Event Log, the SQL Agent logs, not anywhere. We’d managed to get the system working again, but didn’t have a good reason about what had happened, and last week, the problem occurred again. I asked him about writing up the experience in a blog post, largely because of the red herrings that we encountered. It was an interesting experience for me, also because I didn’t end up touching my computer the whole time – just tapping on my phone via Twitter and Live Msgr. You see, the thing with replication is that a useful troubleshooting option is to reinitialise the thing. We’d done that last time, and it had started to work again – eventually. I say eventually, because the link being used between the sites is relatively slow, and it took a long while for the initialisation to finish. Meanwhile, we’d been doing some investigation into what the problem could be, and were suitably pleased when the problem disappeared. So I got a message saying that a replication problem had occurred again. Reinitialising wasn’t going to be an option this time either. In this scenario, the subscriber having the problem happened to be in a different domain to the publisher. The other subscribers (within the domain) were fine, just this one in a different domain had the problem. Part of the problem seemed to be a log file that wasn’t being backed up properly. They’d been trying to back up to a backup device that had a corruption, and the log file was growing. 
Turned out, this wasn’t related to the problem, but of course, any time you’re troubleshooting and you see something untoward, you wonder. Having got past that problem, my next thought was that perhaps there was a problem with the account being used. But the other subscribers were using the same account, without any problems. The client pointed out that it was almost exactly six months since the last failure (later shown to be a complete red herring). It sounded like something might’ve expired. Checking through certificates and trusts showed no sign of anything, and besides, there wasn’t a problem running a command-prompt window using the account in question, from the subscriber box. ...except that when he ran the sqlcmd -E -S servername command I recommended, it failed with a Named Pipes error. I’ve seen problems with firewalls rejecting connections via Named Pipes but letting TCP/IP through, so I got him to look into SQL Configuration Manager to see what kind of connection was being preferred... Everything seemed fine. And strangely, he could connect via Management Studio. Turned out, he had a typo in the servername of the sqlcmd command. That particular red herring must’ve been reflected in his cheeks as he told me. During this time, I also pinged a friend of mine to find out who I should ask, and Ted Kruger (@onpnt)’s name came up. Ted (and thanks again, Ted – really) reconfirmed some of my thoughts around the idea of an account expiring, and also suggested bumping up the logging to level 4 (2 is Verbose, 4 is undocumented ridiculousness). I’d just told the client to push the logging up to level 2, but the log file wasn’t appearing. Checking permissions showed that the user did have permission on the folder, but still no file was appearing. Then it was noticed that the user had been switched earlier as part of the troubleshooting, and switching it back to the real user caused the log file to appear. Still no errors. 
A lot more information being pushed out, but still no errors. Ted suggested making sure the FQDNs were okay from both ends, in case the servers were unable to talk to each other. DNS problems can lead to hassles which can stop replication from working. No luck there either – it was all working fine. Another server started to report a problem as well. These two boxes were both SQL 2008 R2 (SP1), while the others, still working, were SQL 2005. Around this time, the client tried an idea that I’d shown him a few years ago – using a Profiler trace to see what was being called on the servers. It turned out that the last call being made on the publisher was sp_MSenumschemachange. A quick interwebs search on that showed a problem that exists in SQL Server 2008 R2, when stored procedures have more than 4000 characters. Running that stored procedure (with the same parameters) manually on SQL 2005 listed three stored procedures, the first of which did indeed have more than 4000 characters. Still no error though, and the problem as listed at http://support.microsoft.com/kb/2539378 describes an error that should occur in the Event log. However, this problem is the type of thing that is fixed by a reinitialisation (because it doesn’t need to send the procedure change across as a transaction). And a look in the change history of the long stored procs (you all keep them, right?), showed that the problem from six months earlier could well have been down to this too. Applying SP2 (with sufficient paranoia about backups and how to get back out again if necessary) fixed the problem. The stored proc changes went through immediately after the service pack was applied, and it’s been running happily since. The funny thing is that I didn’t solve the problem. He had put the Profiler trace on the server, and had done the search that found a forum post pointing at this particular problem. 
I’d asked Ted too, and although he’d given some useful information, nothing that he’d come up with had actually been the solution either. Sometimes, asking for help is the most useful thing you can do. Often though, you don’t end up getting the help from the person you asked – the sounding board is actually what you need. @rob_farley

    Read the article

  • ls hangs for a certain directory

    - by Jakobud
    When I run ls (with or without options) in one particular directory (/var/www), the command hangs and never completes. There are only about 10-15 files and directories in /var/www, mostly just text files. Here is some investigative info: [me@server www]$ df . Filesystem Size Used Avail Use% Mounted on /dev/mapper/vg_dev-lv_root 50G 19G 29G 40% / [me@server www]$ df -i . Filesystem Inodes IUsed IFree IUse% Mounted on /dev/mapper/vg_dev-lv_root 3.2M 435K 2.8M 14% / find works fine. I can also type cd /var/www/ and press TAB before pressing Enter, and the shell will successfully tab-complete a list of all files/directories in there: [me@server www]$ cd /var/www/ cgi-bin/ create_vhost.sh html/ manual/ phpMyAdmin/ scripts/ usage/ conf/ error/ icons/ mediawiki/ rackspace sqlbuddy/ vhosts/ [me@server www]$ cd /var/www/ I have had to kill my terminal sessions several times because of the ls hanging: [me@server ~]$ ps | grep ls gdm 6215 0.0 0.0 488152 2488 ? S<sl Jan18 0:00 /usr/bin/pulseaudio --start --log-target=syslog root 23269 0.0 0.0 117724 1088 ? D 18:24 0:00 ls -Fh --color=always -l root 23477 0.0 0.0 117724 1088 ? D 18:34 0:00 ls -Fh --color=always -l root 23579 0.0 0.0 115592 820 ? D 18:36 0:00 ls -Fh --color=always root 23634 0.0 0.0 115592 816 ? D 18:38 0:00 ls -Fh --color=always root 23740 0.0 0.0 117724 1088 ? D 18:40 0:00 ls -Fh --color=always -l me 23770 0.0 0.0 103156 816 pts/6 S+ 18:41 0:00 grep ls kill doesn't seem to have any effect on the processes, even with sudo. What else should I do to investigate this problem? It just randomly started happening today. UPDATE dmesg is a big list of things, mostly related to an external USB HDD that I've mounted too many times so that the max mount count has been reached, but I think that is an unrelated problem. Near the bottom of dmesg I'm seeing this: INFO: task ls:23579 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
ls D ffff88041fc230c0 0 23579 23505 0x00000080 ffff8801688a1bb8 0000000000000086 0000000000000000 ffffffff8119d279 ffff880406d0ea20 ffff88007e2c2268 ffff880071fe80c8 00000003ae82967a ffff880407169ad8 ffff8801688a1fd8 0000000000010518 ffff880407169ad8 Call Trace: [<ffffffff8119d279>] ? __find_get_block+0xa9/0x200 [<ffffffff814c97ae>] __mutex_lock_slowpath+0x13e/0x180 [<ffffffff814c964b>] mutex_lock+0x2b/0x50 [<ffffffff8117a4d3>] do_lookup+0xd3/0x220 [<ffffffff8117b145>] __link_path_walk+0x6f5/0x1040 [<ffffffff8117a47d>] ? do_lookup+0x7d/0x220 [<ffffffff8117bd1a>] path_walk+0x6a/0xe0 [<ffffffff8117beeb>] do_path_lookup+0x5b/0xa0 [<ffffffff8117cb57>] user_path_at+0x57/0xa0 [<ffffffff81178986>] ? generic_readlink+0x76/0xc0 [<ffffffff8117cb62>] ? user_path_at+0x62/0xa0 [<ffffffff81171d3c>] vfs_fstatat+0x3c/0x80 [<ffffffff81258ae5>] ? _atomic_dec_and_lock+0x55/0x80 [<ffffffff81171eab>] vfs_stat+0x1b/0x20 [<ffffffff81171ed4>] sys_newstat+0x24/0x50 [<ffffffff810d40a2>] ? audit_syscall_entry+0x272/0x2a0 [<ffffffff81013172>] system_call_fastpath+0x16/0x1b And also, strace ls /var/www/ spits out a whole BUNCH of information. I don't know what is useful here... 
The last handful of lines: ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0 ioctl(1, TIOCGWINSZ, {ws_row=68, ws_col=145, ws_xpixel=0, ws_ypixel=0}) = 0 stat("/var/www/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 open("/var/www/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3 fcntl(3, F_GETFD) = 0x1 (flags FD_CLOEXEC) getdents(3, /* 16 entries */, 32768) = 488 getdents(3, /* 0 entries */, 32768) = 0 close(3) = 0 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 9), ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3093b18000 write(1, "cgi-bin conf create_vhost.sh\te"..., 125cgi-bin conf create_vhost.sh error html icons manual mediawiki phpMyAdmin rackspace scripts sqlbuddy usage vhosts ) = 125 close(1) = 0 munmap(0x7f3093b18000, 4096) = 0 close(2) = 0 exit_group(0) = ?
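The ps listing above shows every stuck ls in state D (uninterruptible sleep), which is why kill has no effect on them. As a rough sketch of how one might list such processes without calling ps, the following reads the state field from /proc/<pid>/stat. The function names are made up for illustration, and the code assumes a Linux-style /proc; it only reports processes and does not unblock anything.

```python
import os


def parse_stat(text):
    """Parse the start of a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    left, right = text.rsplit(")", 1)
    pid_str, comm = left.split("(", 1)
    state = right.split()[0]
    return int(pid_str), comm, state


def uninterruptible_processes(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state D."""
    found = []
    if not os.path.isdir(proc_root):
        return found  # not a Linux-style /proc; nothing to report
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError):
            continue  # process exited or was unreadable while scanning
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

For any PID this reports, /proc/<pid>/wchan names the kernel function the task is sleeping in, which gives the same kind of hint as the hung-task trace in dmesg.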

    Read the article

  • Neighbour table overflow on Linux hosts related to bridging and ipv6

    - by tim
    Note: I already have a workaround for this problem (as described below) so this is only a "want-to-know" question. I have a production setup with around 50 hosts including blades running xen 4 and equallogics providing iscsi. All xen dom0s are almost plain Debian 5. The setup includes several bridges on every dom0 to support xen bridged networking. In total there are between 5 and 12 bridges on each dom0 servicing one vlan each. None of the hosts has routing enabled. At one point in time we moved one of the machines to new hardware including a raid controller, and so we installed an upstream 3.0.22/x86_64 kernel with xen patches. All other machines run debian xen-dom0-kernel. Since then we have noticed the following errors on all hosts in the setup every ~2 minutes: [55888.881994] __ratelimit: 908 callbacks suppressed [55888.882221] Neighbour table overflow. [55888.882476] Neighbour table overflow. [55888.882732] Neighbour table overflow. [55888.883050] Neighbour table overflow. [55888.883307] Neighbour table overflow. [55888.883562] Neighbour table overflow. [55888.883859] Neighbour table overflow. [55888.884118] Neighbour table overflow. [55888.884373] Neighbour table overflow. [55888.884666] Neighbour table overflow. The arp table (arp -n) never showed more than around 20 entries on any machine. We tried the obvious tweaks and raised the /proc/sys/net/ipv4/neigh/default/gc_thresh* values, finally to 16384 entries, but with no effect. Not even the interval of ~2 minutes changed, which led me to the conclusion that this is totally unrelated. tcpdump showed no uncommon ipv4 traffic on any interface. 
The only interesting finding from tcpdump was ipv6 packets bursting in like: 14:33:13.137668 IP6 fe80::216:3eff:fe1d:9d01 > ff02::1:ff1d:9d01: HBH ICMP6, multicast listener reportmax resp delay: 0 addr: ff02::1:ff1d:9d01, length 24 14:33:13.138061 IP6 fe80::216:3eff:fe1d:a8c1 > ff02::1:ff1d:a8c1: HBH ICMP6, multicast listener reportmax resp delay: 0 addr: ff02::1:ff1d:a8c1, length 24 14:33:13.138619 IP6 fe80::216:3eff:fe1d:bf81 > ff02::1:ff1d:bf81: HBH ICMP6, multicast listener reportmax resp delay: 0 addr: ff02::1:ff1d:bf81, length 24 14:33:13.138974 IP6 fe80::216:3eff:fe1d:eb41 > ff02::1:ff1d:eb41: HBH ICMP6, multicast listener reportmax resp delay: 0 addr: ff02::1:ff1d:eb41, length 24 which placed the idea in my mind that the problem may be related to ipv6, since we have no ipv6 services in this setup. The only other hint was the coincidence of the host upgrade with the beginning of the problems. I powered down the host in question and the errors were gone. Then I subsequently took down the bridges on the host, and when I took down (ifconfig down) one particular bridge: br-vlan2159 Link encap:Ethernet HWaddr 00:26:b9:fb:16:2c inet6 addr: fe80::226:b9ff:fefb:162c/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:120 errors:0 dropped:0 overruns:0 frame:0 TX packets:9 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:5286 (5.1 KiB) TX bytes:726 (726.0 B) eth0.2159 Link encap:Ethernet HWaddr 00:26:b9:fb:16:2c inet6 addr: fe80::226:b9ff:fefb:162c/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:1801 errors:0 dropped:0 overruns:0 frame:0 TX packets:20 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:126228 (123.2 KiB) TX bytes:1464 (1.4 KiB) bridge name bridge id STP enabled interfaces ... br-vlan2158 8000.0026b9fb162c no eth0.2158 br-vlan2159 8000.0026b9fb162c no eth0.2159 The errors went away again. 
As you can see, the bridge holds no ipv4 address and its only member is eth0.2159, so no traffic should cross it. Bridge and interface .2159 / .2157 / .2158, which are in all aspects identical apart from the vlan they are connected to, had no effect when taken down. I have now disabled ipv6 on the entire host via sysctl net.ipv6.conf.all.disable_ipv6 and rebooted. After this, no errors occur even with bridge br-vlan2159 enabled. Any ideas are welcome.
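The addresses in the tcpdump excerpt follow two standard IPv6 conventions, and decoding them may help when reading such a capture: the destination of each listener report is the solicited-node multicast group of the sender (ff02::1:ff plus the low 24 bits of the unicast address), and an EUI-64 style link-local address embeds the sender's MAC. The helper names below are made up for this sketch, which uses only Python's standard ipaddress module.

```python
import ipaddress


def solicited_node(addr):
    """Return the solicited-node multicast group for an IPv6 unicast address."""
    unicast = int(ipaddress.IPv6Address(addr))
    prefix = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    # Keep the fixed 104-bit prefix and append the low 24 bits of the address.
    return str(ipaddress.IPv6Address(prefix | (unicast & 0xFFFFFF)))


def mac_from_link_local(addr):
    """Recover the MAC address embedded in an EUI-64 derived IPv6 address."""
    iid = ipaddress.IPv6Address(addr).packed[8:]  # 64-bit interface identifier
    if iid[3:5] != b"\xff\xfe":
        raise ValueError("interface identifier is not EUI-64 derived")
    # Undo the EUI-64 transform: flip the universal/local bit, drop ff:fe.
    mac = bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:8]
    return ":".join(f"{b:02x}" for b in mac)


if __name__ == "__main__":
    for source in ("fe80::216:3eff:fe1d:9d01", "fe80::226:b9ff:fefb:162c"):
        print(source, solicited_node(source), mac_from_link_local(source))
```

Applied to the capture, the first sender decodes to MAC 00:16:3e:1d:9d:01 (00:16:3e is the prefix Xen uses for guest interfaces), and the bridge's fe80::226:b9ff:fefb:162c decodes to the HWaddr 00:26:b9:fb:16:2c shown by ifconfig. This only identifies who is sending the reports; it does not by itself explain the table overflow.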

    Read the article

  • Raid 1 array won't assemble after power outage. How do I fix this ext4 mirror?

    - by Forkrul Assail
    Two ext4 drives on Raid 1 with mdadm won't reassemble after the power went out for an extended period (UPS drained). After turning the machine back on, mdadm said that the array was degraded, after which it took about 2 days for a full resync, which completed without problems. On trying to remount the array I get: mount: you must specify the filesystem type cat /etc/fstab lines relevant to setup: /dev/md127 /media/mediapool ext4 defaults 0 0 dmesg | tail (on trying to mount) says: [ 1050.818782] EXT3-fs (md127): error: can't find ext3 filesystem on dev md127. [ 1050.849214] EXT4-fs (md127): VFS: Can't find ext4 filesystem [ 1050.944781] FAT-fs (md127): invalid media value (0x00) [ 1050.944782] FAT-fs (md127): Can't find a valid FAT filesystem [ 1058.272787] EXT2-fs (md127): error: can't find an ext2 filesystem on dev md127. cat /proc/mdstat says: Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] md127 : active (auto-read-only) raid1 sdj[2] sdi[0] 2930135360 blocks super 1.2 [2/2] [UU] unused devices: <none> fsck /dev/md127 says: fsck from util-linux 2.20.1 e2fsck 1.42 (29-Nov-2011) fsck.ext2: Superblock invalid, trying backup blocks... fsck.ext2: Bad magic number in super-block while trying to open /dev/md127 The superblock could not be read or does not describe a correct ext2 filesystem. 
If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock: e2fsck -b 8193 <device> mdadm -E /dev/sdi gives me: /dev/sdi: Magic : a92b4efc Version : 1.2 Feature Map : 0x0 Array UUID : 37ac1824:eb8a21f6:bd5afd6d:96da6394 Name : sojourn:33 Creation Time : Sat Nov 10 10:43:52 2012 Raid Level : raid1 Raid Devices : 2 Avail Dev Size : 5860271016 (2794.40 GiB 3000.46 GB) Array Size : 2930135360 (2794.39 GiB 3000.46 GB) Used Dev Size : 5860270720 (2794.39 GiB 3000.46 GB) Data Offset : 262144 sectors Super Offset : 8 sectors State : clean Device UUID : 3e6e9a4f:6c07ab3d:22d47fce:13cecfd0 Update Time : Tue Nov 13 20:34:18 2012 Checksum : f7d10db9 - correct Events : 27 Device Role : Active device 0 Array State : AA ('A' == active, '.' == missing) boot@boot ~ $ sudo mdadm -E /dev/sdj /dev/sdj: Magic : a92b4efc Version : 1.2 Feature Map : 0x0 Array UUID : 37ac1824:eb8a21f6:bd5afd6d:96da6394 Name : sojourn:33 Creation Time : Sat Nov 10 10:43:52 2012 Raid Level : raid1 Raid Devices : 2 Avail Dev Size : 5860271016 (2794.40 GiB 3000.46 GB) Array Size : 2930135360 (2794.39 GiB 3000.46 GB) Used Dev Size : 5860270720 (2794.39 GiB 3000.46 GB) Data Offset : 262144 sectors Super Offset : 8 sectors State : clean Device UUID : 7fb84af4:e9295f7b:ede61f27:bec0cb57 Update Time : Tue Nov 13 20:34:18 2012 Checksum : b9d17fef - correct Events : 27 Device Role : Active device 1 Array State : AA ('A' == active, '.' == missing) machine@user ~ dmesg | tail [ 61.785866] init: alsa-restore main process (2736) terminated with status 99 [ 68.433548] eth0: no IPv6 routers present [ 534.142511] EXT4-fs (sdi): ext4_check_descriptors: Block bitmap for group 0 not in group (block 2838187772)! [ 534.142518] EXT4-fs (sdi): group descriptors corrupted! 
[ 546.418780] EXT2-fs (sdi): error: couldn't mount because of unsupported optional features (240) [ 549.654127] EXT3-fs (sdi): error: couldn't mount because of unsupported optional features (240) Since this is Raid 1 it was suggested that I try and mount or fsck the drives separately. After a long fsck on one drive, it ended with this as tail: Illegal double indirect block (2298566437) in inode 39717736. CLEARED. Illegal block #4231180 (2611866932) in inode 39717736. CLEARED. Error storing directory block information (inode=39717736, block=0, num=1092368): Memory allocation failed Recreate journal? yes Creating journal (32768 blocks): Done. *** journal has been re-created - filesystem is now ext3 again *** The drive however still doesn't want to mount: dmesg | tail [ 170.674659] md: export_rdev(sdc) [ 170.675152] md: export_rdev(sdc) [ 195.275288] md: export_rdev(sdc) [ 195.275876] md: export_rdev(sdc) [ 1338.540092] CE: hpet increased min_delta_ns to 30169 nsec [26125.734105] EXT4-fs (sdc): ext4_check_descriptors: Checksum for group 0 failed (43502!=37987) [26125.734115] EXT4-fs (sdc): group descriptors corrupted! [26182.325371] EXT3-fs (sdc): error: couldn't mount because of unsupported optional features (240) [27083.316519] EXT4-fs (sdc): ext4_check_descriptors: Checksum for group 0 failed (43502!=37987) [27083.316530] EXT4-fs (sdc): group descriptors corrupted! Please help me fix this. I never in my wildest nightmares thought a complete mirror would die this badly. Am I missing something? Suggestions on fixing this? Could someone explain why it would resync after the powerout, only to seemingly nuke the drive? Thanks for reading. Any help much appreciated. I've tried everything I can think of, including booting and filesystem checking with SystemRescue and Ubuntu liveboot discs.
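A note on reading the mdadm -E numbers above, offered as a sketch and not as a diagnosis. Array Size is reported in KiB while Used Dev Size and Data Offset are in 512-byte sectors, so the two sizes agree. With 1.2 metadata and a data offset of 262144 sectors, the filesystem on a member disk starts 128 MiB into the device, so tools pointed at the raw disk (sdi) do not look at the same location as tools pointed at md127. The read-only helper below checks for the ext2/3/4 magic number (0xEF53, stored 56 bytes into the superblock, which itself starts 1024 bytes into the filesystem); the function names are hypothetical.

```python
import io
import struct

SECTOR = 512                  # mdadm reports sizes and offsets in 512-byte sectors
EXT_SUPERBLOCK_OFFSET = 1024  # primary superblock starts 1024 bytes into the fs
EXT_MAGIC_OFFSET = 56         # position of the s_magic field inside the superblock
EXT_MAGIC = 0xEF53


def sectors_to_kib(sectors):
    """Convert a sector count to KiB (the unit mdadm uses for Array Size)."""
    return sectors * SECTOR // 1024


def has_ext_magic(dev, fs_start_bytes):
    """Return True if an ext superblock magic is found for a fs at this offset."""
    dev.seek(fs_start_bytes + EXT_SUPERBLOCK_OFFSET + EXT_MAGIC_OFFSET)
    raw = dev.read(2)
    return len(raw) == 2 and struct.unpack("<H", raw)[0] == EXT_MAGIC


def demo():
    """Build a tiny in-memory image with a fake superblock at a 4096-byte offset."""
    image = bytearray(4096 + 2048)
    struct.pack_into("<H", image, 4096 + EXT_SUPERBLOCK_OFFSET + EXT_MAGIC_OFFSET,
                     EXT_MAGIC)
    dev = io.BytesIO(bytes(image))
    return has_ext_magic(dev, 0), has_ext_magic(dev, 4096)


if __name__ == "__main__":
    print(sectors_to_kib(5860270720))  # Used Dev Size from mdadm -E, in KiB
    print(262144 * SECTOR)             # Data Offset from mdadm -E, in bytes
    print(demo())
```

On a real system the same check could be run against a device opened read-only in binary mode, once with offset 0 and once with the data offset in bytes. Whether that is relevant to this particular failure is an assumption, not something the post confirms.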

    Read the article

  • Slow disk transfer rate

    - by Nooklez
    I have a problem with a slow disk transfer rate on the static file server for our website. I was making a backup of the data and noticed that tar is very slow. So I ran hdparm -t and... hdparm -t /dev/sda3 /dev/sda3: Timing buffered disk reads: 6 MB in 4.70 seconds = 1.28 MB/sec It's a low-traffic hour on our site right now, so heavy I/O traffic is not the reason (iotop shows less than 1 MB/s). It's a RAID10 setup (2x2 SATA drives). Unit UnitType Status %RCmpl %V/I/M Stripe Size(GB) Cache AVrfy ------------------------------------------------------------------------------ u0 RAID-10 OK - - 64K 1396.96 W ON VPort Status Unit Size Type Phy Encl-Slot Model ------------------------------------------------------------------------------ p0 OK u0 698.63 GB SATA 0 - WDC WD7500AADS-00M2 p1 OK u0 698.63 GB SATA 1 - WDC WD7500AADS-00M2 p2 OK u0 698.63 GB SATA 2 - WDC WD7500AADS-00M2 p3 OK u0 698.63 GB SATA 3 - WDC WD7500AADS-00M2 We have recently changed almost all components of the server (excluding the 3ware controller and the disks), and I think the problems started then. Can it be a configuration problem or a hardware problem? EDIT: I found this in dmesg [166843.625843] irq 16: nobody cared (try booting with the "irqpoll" option) [166843.625846] Pid: 0, comm: swapper Not tainted 3.1.5-gentoo #3 [166843.625847] Call Trace: [166843.625848] <IRQ> [<ffffffff810859d5>] __report_bad_irq+0x35/0xc1 [166843.625856] [<ffffffff81085cec>] note_interrupt+0x165/0x1e1 [166843.625859] [<ffffffff8108445f>] handle_irq_event_percpu+0x16f/0x187 [166843.625861] [<ffffffff810844a9>] handle_irq_event+0x32/0x51 [166843.625863] [<ffffffff8108640b>] handle_fasteoi_irq+0x75/0x99 [166843.625866] [<ffffffff810039d7>] handle_irq+0x83/0x8b [166843.625868] [<ffffffff810036ad>] do_IRQ+0x48/0xa0 [166843.625871] [<ffffffff8155082b>] common_interrupt+0x6b/0x6b [166843.625872] <EOI> [<ffffffff812981e8>] ? acpi_safe_halt+0x22/0x35 [166843.625877] [<ffffffff812981e2>] ? 
acpi_safe_halt+0x1c/0x35 [166843.625879] [<ffffffff81298216>] acpi_idle_do_entry+0x1b/0x2b [166843.625881] [<ffffffff81298276>] acpi_idle_enter_c1+0x50/0x99 [166843.625884] [<ffffffff813b792a>] cpuidle_idle_call+0xed/0x171 [166843.625886] [<ffffffff81001257>] cpu_idle+0x55/0x81 [166843.625888] [<ffffffff81532a69>] rest_init+0x6d/0x6f [166843.625891] [<ffffffff81aa1aca>] start_kernel+0x329/0x334 [166843.625893] [<ffffffff81aa12a6>] x86_64_start_reservations+0xb6/0xba [166843.625894] [<ffffffff81aa139c>] x86_64_start_kernel+0xf2/0xf9 [166843.625896] handlers: [166843.625898] [<ffffffff812dc8de>] twl_interrupt [166843.625900] Disabling IRQ #16 Is this related to the problem? EDIT #2: Based on feedback in the comments, here is more information. cat /proc/interrupts 16: 390813 0 0 0 IO-APIC-fasteoi 3w-sas Controller model: [ 1.095350] 3ware Storage Controller device driver for Linux v1.26.02.003. [ 1.095467] 3ware 9000 Storage Controller device driver for Linux v2.26.02.014. [ 1.095641] LSI 3ware SAS/SATA-RAID Controller device driver for Linux v3.26.02.000. [ 1.095787] 3w-sas 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16 [ 1.095881] 3w-sas 0000:01:00.0: setting latency timer to 64 [ 1.910801] 3w-sas: scsi0: Found an LSI 3ware 9750-4i Controller at 0xfe560000, IRQ: 16. [ 2.216537] 3w-sas: scsi0: Firmware FH9X 5.08.00.008, BIOS BE9X 5.07.00.011, Phys: 8. [ 2.216836] scsi 0:0:0:0: Direct-Access LSI 9750-4i DISK 5.08 PQ: 0 ANSI: 5 And motherboard: description: Motherboard product: P8H67-M vendor: ASUSTeK Computer INC.
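The /proc/interrupts line quoted above ties IRQ 16, the one the kernel reports disabling, to the 3w-sas driver. As a small sketch of how such a line can be picked apart, the parser below splits it into the IRQ label, the per-CPU counts, and the trailing description. The function name is invented for illustration, and the parsing assumes the usual layout of label, counts, then free text.

```python
def parse_interrupt_line(line):
    """Split one /proc/interrupts row into (irq, per_cpu_counts, description)."""
    irq, rest = line.split(":", 1)
    fields = rest.split()
    counts = []
    # Leading all-digit fields are the per-CPU interrupt counters.
    while fields and fields[0].isdigit():
        counts.append(int(fields.pop(0)))
    return irq.strip(), counts, " ".join(fields)


if __name__ == "__main__":
    irq, counts, desc = parse_interrupt_line(
        "16: 390813 0 0 0 IO-APIC-fasteoi 3w-sas")
    print(irq, counts, desc)
    # Throughput implied by the hdparm figures quoted in the post.
    print(round(6 / 4.70, 2), "MB/s")
```

In the quoted line all 390813 interrupts were counted on the first CPU and the only device listed on that IRQ is the RAID controller, so a disabled IRQ 16 would leave the controller without interrupt delivery. That reading is an inference from the two log excerpts, not a confirmed cause.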

    Read the article

  • Logitech USB headphones detected and selected in Debian Squeeze but sound still coming from speakers

    - by mattalexx
    I have a pair of Logitech wireless USB headphones that work with Ubuntu Natty but aren't working in Debian Squeeze. When they are selected as the default audio output, the sound comes out of the speakers instead of the headphones. I have rebooted and tried using a different USB port. My computer is a Thinkpad T510. How can I fix this problem? Here is lsusb: Bus 002 Device 005: ID 046d:0a29 Logitech, Inc. Bus 002 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Bus 001 Device 005: ID 046d:c52f Logitech, Inc. Wireless Mouse M305 Bus 001 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Here is cat /proc/asound/cards 0 [Intel ]: HDA-Intel - HDA Intel HDA Intel at 0xf2420000 irq 17 1 [Headset ]: USB-Audio - Logitech Wireless Headset Logitech Logitech Wireless Headset at usb-0000:00:1d.0-1.1, full speed 2 [NVidia ]: HDA-Intel - HDA NVidia HDA NVidia at 0xcdefc000 irq 17 Here is the gnome-volume-control GUI: Here's lsmod | grep usb: snd_usb_audio 50670 0 snd_usb_lib 11192 1 snd_usb_audio usbhid 28008 0 hid 50909 1 usbhid snd_rawmidi 12513 2 snd_usb_lib,snd_seq_midi snd_hwdep 4054 2 snd_usb_audio,snd_hda_codec snd_pcm 47226 3 snd_usb_audio,snd_hda_intel,snd_hda_codec usbcore 98969 5 snd_usb_audio,snd_usb_lib,usbhid,ehci_hcd snd 34423 11 snd_usb_audio,snd_rawmidi,snd_hda_intel,snd_hda_codec,snd_hwdep,snd_pcm,snd_seq,snd_timer,snd_seq_device nls_base 4541 1 usbcore Here's cat /etc/modprobe.d/alsa-base.conf: # autoloader aliases install sound-slot-0 /sbin/modprobe snd-card-0 install sound-slot-1 /sbin/modprobe snd-card-1 install sound-slot-2 /sbin/modprobe snd-card-2 install sound-slot-3 /sbin/modprobe snd-card-3 install sound-slot-4 /sbin/modprobe snd-card-4 install sound-slot-5 /sbin/modprobe snd-card-5 install sound-slot-6 /sbin/modprobe snd-card-6 install sound-slot-7 /sbin/modprobe snd-card-7 # Cause optional 
modules to be loaded above generic modules install snd /sbin/modprobe --ignore-install snd && { /sbin/modprobe --quiet snd-ioctl32 ; /sbin/modprobe --quiet snd-seq ; } install snd-rawmidi /sbin/modprobe --ignore-install snd-rawmidi && { /sbin/modprobe --quiet snd-seq-midi ; : ; } install snd-emu10k1 /sbin/modprobe --ignore-install snd-emu10k1 && { /sbin/modprobe --quiet snd-emu10k1-synth ; : ; } # Prevent abnormal drivers from grabbing index 0 options bt87x index=-2 options cx88_alsa index=-2 options snd-atiixp-modem index=-2 options snd-intel8x0m index=-2 options snd-via82xx-modem index=-2 # Keep snd-pcsp from beeing loaded as first soundcard options snd-pcsp index=-2 # Keep snd-usb-audio from beeing loaded as first soundcard options snd-usb-audio index=-2 EDIT In VLC, I reset the VLC prefs (Output: Default) and sound still comes out of the speakers, as expected. Then I change it to "Output: ALSA Audio output" and a Device menu appears. I select the headphones. When I then save the prefs, the audio switches to the headphones! But here's what's weird: I go back to prefs, change it to "Output: Default" and the headphones keep working. Maybe the ALSA option is actually what is being chosen as the "Default" option, but the Device menu (whose selection is still being used) is still set to the headphones. Anyway, now I need to figure out how to make it work as the default for the whole system.

    Read the article

  • Long connection times from PHP to MySQL on EC2

    - by Erik Giberti
    I'm having an intermittent issue connecting to a database slave with InnoDB. Intermittently I get connections taking longer than 2 seconds. These servers are hosted on Amazon's EC2. The app server is PHP 5.2/Apache running on Ubuntu. The DB slave is running Percona's XtraDB 5.1 on Ubuntu 9.10. It's using an EBS RAID array for the data storage. We already use skip-name-resolve and bind to address 0.0.0.0. This is a stub of the PHP code that's failing: $tmp = mysqli_init(); $start_time = microtime(true); $tmp->options(MYSQLI_OPT_CONNECT_TIMEOUT, 2); $tmp->real_connect($DB_SERVERS[$server]['server'], $DB_SERVERS[$server]['username'], $DB_SERVERS[$server]['password'], $DB_SERVERS[$server]['schema'], $DB_SERVERS[$server]['port']); if(mysqli_connect_errno()){ $timer = microtime(true) - $start_time; mail($errors_to,'DB connection error',$timer); } There's more than 300 MB available on the DB server for new connections and the server is nowhere near the max allowed (60 of 1,200). Load on both servers is < 2 on 4-core m1.xlarge instances. Some highlights from the mysql config: max_connections = 1200 thread_stack = 512K thread_cache_size = 1024 thread_concurrency = 16 innodb-file-per-table innodb_additional_mem_pool_size = 16M innodb_buffer_pool_size = 13G Any help on tracing the source of the slowdown is appreciated. [EDIT] I have been updating the sysctl values for the network but they don't seem to be fixing the problem. I made the following adjustments on both the database and application servers. net.ipv4.tcp_window_scaling = 1 net.ipv4.tcp_sack = 0 net.ipv4.tcp_timestamps = 0 net.ipv4.tcp_fin_timeout = 20 net.ipv4.tcp_keepalive_time = 180 net.ipv4.tcp_max_syn_backlog = 1280 net.ipv4.tcp_synack_retries = 1 net.core.rmem_max = 16777216 net.core.wmem_max = 16777216 net.ipv4.tcp_rmem = 4096 87380 16777216 net.ipv4.tcp_wmem = 4096 87380 16777216 [EDIT] Per jaimieb's suggestion, I added some tracing and captured the following data using time. 
This server handles about 51 queries/second at this time of day. The connection error was raised once (at 13:06:36) during the 3 minute window outlined below. Since there was 1 failure and roughly 9,200 successful connections, I think this isn't going to produce anything meaningful in terms of reporting. Script: date >> /root/database_server.txt (time mysql -h database_Server -D schema_name -u appuser -p apppassword -e '') > /dev/null 2>> /root/database_server.txt Results: === Application Server 1 === Mon Feb 22 13:05:01 EST 2010 real 0m0.008s user 0m0.001s sys 0m0.000s Mon Feb 22 13:06:01 EST 2010 real 0m0.007s user 0m0.002s sys 0m0.000s Mon Feb 22 13:07:01 EST 2010 real 0m0.008s user 0m0.000s sys 0m0.001s === Application Server 2 === Mon Feb 22 13:05:01 EST 2010 real 0m0.009s user 0m0.000s sys 0m0.002s Mon Feb 22 13:06:01 EST 2010 real 0m0.009s user 0m0.001s sys 0m0.003s Mon Feb 22 13:07:01 EST 2010 real 0m0.008s user 0m0.000s sys 0m0.001s === Database Server === Mon Feb 22 13:05:01 EST 2010 real 0m0.016s user 0m0.000s sys 0m0.010s Mon Feb 22 13:06:01 EST 2010 real 0m0.006s user 0m0.010s sys 0m0.000s Mon Feb 22 13:07:01 EST 2010 real 0m0.016s user 0m0.000s sys 0m0.010s [EDIT] Per a suggestion received on a LinkedIn question, I tried setting the back_log value higher. We had been running the default value (50) and increased it to 150. We also raised the kernel value /proc/sys/net/core/somaxconn (maximum socket connections) to 256 on both the application and database servers from the default 128. We did see some elevation in processor utilization as a result but still received connection timeouts.
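One sample per minute, as in the script above, is unlikely to catch a failure that happens roughly once in 9,200 connections. A denser probe that times only the TCP handshake can help separate network-level stalls from delays inside MySQL authentication. The sketch below uses Python's standard socket module; the function names, the 0.5 second threshold, and the host and port in the usage comment are placeholders, not values taken from the post.

```python
import socket
import time


def time_connect(host, port, timeout=2.0):
    """Open and close one TCP connection; return (succeeded, seconds_elapsed)."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
        ok = True
    except OSError:  # includes timeouts and refused connections
        ok = False
    return ok, time.perf_counter() - start


def slow_attempts(host, port, attempts, timeout=2.0, slow_threshold=0.5):
    """Return (index, succeeded, seconds) for every failed or slow attempt."""
    flagged = []
    for index in range(attempts):
        ok, elapsed = time_connect(host, port, timeout)
        if not ok or elapsed >= slow_threshold:
            flagged.append((index, ok, elapsed))
    return flagged


# Hypothetical usage against the slave (placeholder host and port):
#   for row in slow_attempts("database_server", 3306, attempts=10000):
#       print(row)
```

If the handshake alone occasionally takes seconds, the delay sits below MySQL (for example in SYN retransmission). If it never does, the time is more likely spent after the connection is accepted.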

    Read the article

  • Bonding: works only for download

    - by Crazy_Bash
    I would like to set up bonding across 4 links in mode 4, but only "download/receiving" is spread across the bond. For transmitting, the system uses a single link. ifconfig bond0 Link encap:Ethernet HWaddr 90:E2:BA:0F:76:B4 inet addr:ip Bcast:ip Mask:255.255.255.248 inet6 addr: fe80::92e2:baff:fe0f:76b4/64 Scope:Link UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 RX packets:239187413 errors:0 dropped:10944 overruns:0 frame:0 TX packets:536902370 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:14688536197 (13.6 GiB) TX bytes:799521192901 (744.6 GiB) eth2 Link encap:Ethernet HWaddr 90:E2:BA:0F:76:B4 UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 RX packets:54969488 errors:0 dropped:0 overruns:0 frame:0 TX packets:2537 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:3374778591 (3.1 GiB) TX bytes:314290 (306.9 KiB) eth3 Link encap:Ethernet HWaddr 90:E2:BA:0F:76:B4 UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 RX packets:64935805 errors:0 dropped:1 overruns:0 frame:0 TX packets:2532 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:3993499746 (3.7 GiB) TX bytes:313968 (306.6 KiB) eth4 Link encap:Ethernet HWaddr 90:E2:BA:0F:76:B4 UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 RX packets:57352105 errors:0 dropped:2 overruns:0 frame:0 TX packets:536894778 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:3524236530 (3.2 GiB) TX bytes:799520265627 (744.6 GiB) eth5 Link encap:Ethernet HWaddr 90:E2:BA:0F:76:B4 UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 RX packets:61930025 errors:0 dropped:3 overruns:0 frame:0 TX packets:2540 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:3796021948 (3.5 GiB) TX bytes:314274 (306.9 KiB) lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:62 errors:0 dropped:0 overruns:0 
frame:0 TX packets:62 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:5320 (5.1 KiB) TX bytes:5320 (5.1 KiB) those are my configs: DEVICE="eth2" BOOTPROTO="none" MASTER=bond0 SLAVE=yes USERCTL=no NM_CONTROLLED="no" ONBOOT="yes" DEVICE="eth3" BOOTPROTO="none" MASTER=bond0 SLAVE=yes USERCTL=no NM_CONTROLLED="no" ONBOOT="yes" DEVICE="eth4" BOOTPROTO="none" MASTER=bond0 SLAVE=yes USERCTL=no NM_CONTROLLED="no" ONBOOT="yes" DEVICE="eth5" BOOTPROTO="none" MASTER=bond0 SLAVE=yes USERCTL=no NM_CONTROLLED="no" ONBOOT="yes" DEVICE=bond0 IPADDR=<ip> BROADCAST=<ip> NETWORK=<ip> GATEWAY=<ip> NETMASK=<ip> USERCTL=no BOOTPROTO=none ONBOOT=yes NM_CONTROLLED=no cat /proc/net/bonding/bond0 Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011) Bonding Mode: IEEE 802.3ad Dynamic link aggregation Transmit Hash Policy: layer2 (0) MII Status: up MII Polling Interval (ms): 100 Up Delay (ms): 0 Down Delay (ms): 0 802.3ad info LACP rate: slow Aggregator selection policy (ad_select): stable Active Aggregator Info: Aggregator ID: 1 Number of ports: 4 Actor Key: 17 Partner Key: 11 Partner Mac Address: 00:24:51:12:63:00 Slave Interface: eth2 MII Status: up Speed: 1000 Mbps Duplex: full Link Failure Count: 0 Permanent HW addr: 90:e2:ba:0f:76:b4 Aggregator ID: 1 Slave queue ID: 0 Slave Interface: eth3 MII Status: up Speed: 1000 Mbps Duplex: full Link Failure Count: 0 Permanent HW addr: 90:e2:ba:0f:76:b5 Aggregator ID: 1 Slave queue ID: 0 Slave Interface: eth4 MII Status: up Speed: 1000 Mbps Duplex: full Link Failure Count: 0 Permanent HW addr: 90:e2:ba:0f:76:b6 Aggregator ID: 1 Slave queue ID: 0 Slave Interface: eth5 MII Status: up Speed: 1000 Mbps Duplex: full Link Failure Count: 0 Permanent HW addr: 90:e2:ba:0f:76:b7 Aggregator ID: 1 Slave queue ID: 0 /etc/modprobe.d/bonding.conf alias bond0 bonding options bond0 mode=4 miimon=100 updelay=200 #downdelay=200 xmit_hash_policy=layer3+4 lacp_rate=1 Linux: Linux 3.0.0+ #1 SMP Fri Oct 26 07:55:47 EEST 2012 x86_64 x86_64 
x86_64 GNU/Linux What I've tried: downdelay=200, xmit_hash_policy=layer3+4, lacp_rate=1, and mode 6.
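
The /proc/net/bonding/bond0 output above reports "Transmit Hash Policy: layer2 (0)" even though bonding.conf asks for layer3+4, which would be consistent with nearly all outbound traffic leaving through a single slave. A minimal Python sketch of the two policies follows, assuming the simplified formulas given in the kernel bonding documentation of that era (the real driver differs in detail). The IP addresses and ports are made-up illustration values, and the partner MAC from the output above is used only as a stand-in for the next-hop MAC.

```python
import ipaddress

def layer2_slave(src_mac, dst_mac, slave_count):
    """Simplified layer2 policy: (source MAC XOR destination MAC) modulo slave count."""
    src = int(src_mac.replace(":", ""), 16)
    dst = int(dst_mac.replace(":", ""), 16)
    return (src ^ dst) % slave_count

def layer34_slave(src_ip, dst_ip, src_port, dst_port, slave_count):
    """Simplified layer3+4 policy: ports XOR low 16 bits of the IP XOR, modulo slave count."""
    ip_xor = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    return ((src_port ^ dst_port) ^ (ip_xor & 0xFFFF)) % slave_count

SLAVES = 4
BOND_MAC = "90:E2:BA:0F:76:B4"       # from the ifconfig output above
NEXT_HOP_MAC = "00:24:51:12:63:00"   # assumption: stand-in for the gateway MAC

# Eight hypothetical TCP flows to one server, differing only in source port.
flows = [(40000 + i, 80) for i in range(8)]

layer2_choices = {layer2_slave(BOND_MAC, NEXT_HOP_MAC, SLAVES) for _ in flows}
layer34_choices = {
    layer34_slave("192.0.2.10", "198.51.100.20", sport, dport, SLAVES)
    for sport, dport in flows
}

print("layer2 uses slaves:", sorted(layer2_choices))
print("layer3+4 uses slaves:", sorted(layer34_choices))
```

With a single next hop, the layer2 formula always picks the same slave, while layer3+4 can spread different flows across links. This only models the transmit side; the switch applies its own policy to traffic it sends back.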

    Read the article

  • vconfig created virtual interface and trunking - is the interface untagged or tagged for that VLAN ID?

    - by kce
    I am trying to set up an additional VLAN on our Debian-based router/firewall (which exists as a virtual machine on Hyper-V), our core switch (an HP ProCurve 5406) and a remote HP ProCurve 2610 that is connected via a WAN Transparent LAN Service (TLS) link. Let's work backwards from the network edge: The Debian server has an external connection attached to eth0. The internal interface is eth1, which is connected directly from our Hyper-V host to the 5406. The port that eth1 is attached to is set up as Trk12. The 2610 is attached to Trk9 (which trunks a whole slew of VLANs - Trk9 is our TLS head). I can successfully ping the management IP addresses for my VLAN from both switches, but I cannot ping, from either switch, the virtual interface for my new VLAN on the Debian-based router and firewall. The existing VLAN works fine. What gives? The port eth1 is attached to is a trunk; the existing VLAN (ID 98) is untagged on the trunk, and the new VLAN (ID 198) is tagged. VLAN 198 is tagged on Trk9 on the 5406 and on the 2610. I can ping each switch's management IP (10.100.198.2 and 10.100.198.3) from the other switch. That leg of the VLAN works; however, I cannot communicate with eth1.198's 10.100.198.1. I feel like I'm missing something elementary, but what it is remains elusive to me. I suspect the issue is with the vconfig-created eth1.198. It should pass the tagged VLAN 198 packets, correct? But they cannot seem to get any further than the 5406. Communication on the existing VLAN 98 works fine.
From the Debian box: eth1: eth1 Link encap:Ethernet HWaddr 00:15:5d:34:5e:03 inet addr:10.100.0.1 Bcast:10.100.255.255 Mask:255.255.0.0 inet6 addr: fe80::215:5dff:fe34:5e03/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:12179786 errors:0 dropped:0 overruns:0 frame:0 TX packets:20210532 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:1586498028 (1.4 GiB) TX bytes:26154226278 (24.3 GiB) Interrupt:9 Base address:0xec00 eth1.198: eth1.198 Link encap:Ethernet HWaddr 00:15:5d:34:5e:03 inet addr:10.100.198.1 Bcast:10.100.198.255 Mask:255.255.255.0 inet6 addr: fe80::215:5dff:fe34:5e03/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1496 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:72 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 B) TX bytes:3528 (3.4 KiB) # cat /proc/net/vlan/eth1.198: eth1.198 VID: 198 REORDER_HDR: 0 dev->priv_flags: 1 total frames received 0 total bytes received 0 Broadcast/Multicast Rcvd 0 total frames transmitted 72 total bytes transmitted 3528 total headroom inc 0 total encap on xmit 39 Device: eth1 INGRESS priority mappings: 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0 EGRESS priority mappings: # ip route 10.100.198.0/24 dev eth1.198 proto kernel scope link src 10.100.198.1 206.174.64.0/20 dev eth0 proto kernel scope link src 206.174.66.14 10.100.0.0/16 dev eth1 proto kernel scope link src 10.100.0.1 default via 206.174.64.1 dev eth0 # iptables -L -v Chain INPUT (policy DROP 6875 packets, 637K bytes) pkts bytes target prot opt in out source destination 41 4320 ACCEPT all -- lo any anywhere anywhere 11481 1560K ACCEPT all -- any any anywhere anywhere state RELATED,ESTABLISHED 107 8058 ACCEPT icmp -- any any anywhere anywhere 0 0 ACCEPT tcp -- eth1 any 10.100.0.0/24 anywhere tcp dpt:ssh 701 317K ACCEPT udp -- eth1 any anywhere anywhere udp dpts:bootps:bootpc Chain FORWARD (policy DROP 1 packets, 40 bytes) pkts bytes target prot opt in out source 
destination 156K 25M ACCEPT all -- eth1 any anywhere anywhere 215K 248M ACCEPT all -- eth0 eth1 anywhere anywhere state RELATED,ESTABLISHED 0 0 ACCEPT all -- eth1.198 any anywhere anywhere 0 0 ACCEPT all -- eth0 eth1.198 anywhere anywhere state RELATED,ESTABLISHED Chain OUTPUT (policy ACCEPT 13048 packets, 1640K bytes) pkts bytes target prot opt in out source destination From the 5406: # show vlan ports trk12 detail Status and Counters - VLAN Information - for ports Trk12 VLAN ID Name | Status Voice Jumbo Mode ------- -------------------- + ---------- ----- ----- -------- 98 WIFI | Port-based No No Untagged 198 VLAN198 | Port-based No No Tagged
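
On the tagged-versus-untagged question: a vconfig-created interface such as eth1.198 sends and expects frames that carry an 802.1Q tag with VID 198, while traffic on plain eth1 is untagged. The 4-byte tag also matches the MTU of 1496 shown for eth1.198 above (1500 minus 4). As a rough illustration of what "tagged" means on the wire, here is a small Python sketch; the function names are hypothetical and not part of any tool mentioned in the question.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag

def build_vlan_tag(vid, pcp=0, dei=0):
    """Return the 4-byte 802.1Q tag: TPID followed by PCP/DEI/VID packed into 16 bits."""
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VID must fit in 12 bits")
    if not 0 <= pcp <= 7 or dei not in (0, 1):
        raise ValueError("PCP is 3 bits, DEI is 1 bit")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", TPID_8021Q, tci)

def parse_vlan_tag(tag):
    """Split a 4-byte 802.1Q tag back into its fields."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return {"pcp": tci >> 13, "dei": (tci >> 12) & 1, "vid": tci & 0xFFF}

tag = build_vlan_tag(198)
print(tag.hex())            # 810000c6
print(parse_vlan_tag(tag))  # {'pcp': 0, 'dei': 0, 'vid': 198}
```

A frame leaving eth1.198 should carry this tag between the source MAC and the original EtherType. If anything between the VM and the switch port strips or drops it, the 5406 would never see VLAN 198 traffic from the router, which would fit the RX count of 0 on eth1.198.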

    Read the article

  • I need to understand why my server turned off

    - by Dema
    Our organization was robbed and definitely it was inside job. I was set up. I work as a manager and as system administrator in this organization and everything goes against me. The only clue I have is that someone accidentally or intentionally turned of a server that is in the office indicating that some one was inside at the time that no one should be. This is the only evidence I have that can justify me.  I looked the log files and they show that the Power button was pressed. Can you help me to find out that that was not a bug or systems overheat? I will post the log files and if you will ask more I will gladly provide the information. Messages: Dec 24 21:43:14 jamx shutdown[27883]: shutting down for system halt Dec 24 21:43:15 jamx init: Switching to runlevel: 0 Dec 24 21:43:15 jamx smartd[3047]: smartd received signal 15: Terminated Dec 24 21:43:15 jamx smartd[3047]: smartd is exiting (exit status 0) Dec 24 21:43:15 jamx avahi-daemon[3015]: Got SIGTERM, quitting. Dec 24 21:43:15 jamx avahi-daemon[3015]: Leaving mDNS multicast group on interface eth0.IPv6 with address fe80::221:85ff:fe11:8221. Dec 24 21:43:15 jamx avahi-daemon[3015]: Leaving mDNS multicast group on interface eth0.IPv4 with address 82.207.41.239. Dec 24 21:43:15 jamx shutdown[27962]: shutting down for system halt Dec 24 21:43:15 jamx saslauthd[2983]: server_exit     : master exited: 2983 Dec 24 21:43:29 jamx nmbd[2921]: [2010/12/24 21:43:29, 0] nmbd/nmbd.c:terminate(58) Dec 24 21:43:29 jamx nmbd[2921]:   Got SIGTERM: going down... Dec 24 21:43:31 jamx clamd[2526]: Pid file removed. Dec 24 21:43:31 jamx clamd[2526]: --- Stopped at Fri Dec 24 21:43:31 2010 Dec 24 21:43:31 jamx clamd[2526]: Socket file removed. 
Dec 24 21:43:31 jamx mydns[2645]: jamx.org.ua up 9h44m48s (35088s) 117 questions (0/s) NOERROR=117 SERVFAIL=0 NXDOMAIN=0 NOTIMP=0 REFUSED=0 (100% TCP, 117 queries) Dec 24 21:43:31 jamx mydns[2645]: terminated Dec 24 21:43:34 jamx ntpd[2512]: ntpd exiting on signal 15 Dec 24 21:43:34 jamx hcid[2265]: Got disconnected from the system message bus Dec 24 21:43:35 jamx rpc.statd[2167]: Caught signal 15, un-registering and exiting. Dec 24 21:43:35 jamx portmap[28473]: connect from 127.0.0.1 to unset(status): request from unprivileged port Dec 24 21:43:35 jamx auditd[2021]: The audit daemon is exiting. Dec 24 21:43:35 jamx kernel: audit(1293219815.505:4044): audit_pid=0 old=2021 by auid=4294967295 Dec 24 21:43:35 jamx pcscd: pcscdaemon.c:572:signal_trap() Preparing for suicide Dec 24 21:43:36 jamx pcscd: hotplug_libusb.c:376:HPRescanUsbBus() Hotplug stopped Dec 24 21:43:36 jamx pcscd: readerfactory.c:1379:RFCleanupReaders() entering cleaning function Dec 24 21:43:36 jamx pcscd: pcscdaemon.c:532:at_exit() cleaning /var/run Dec 24 21:43:36 jamx kernel: Kernel logging (proc) stopped. Dec 24 21:43:36 jamx kernel: Kernel log daemon terminating. 
Dec 24 21:43:37 jamx exiting on signal 15 Acpid: [Fri Dec 24 21:43:14 2010] received event "button/power PWRF 00000080 00000001" [Fri Dec 24 21:43:14 2010] notifying client 2382[68:68] [Fri Dec 24 21:43:14 2010] executing action "/bin/ps awwux | /bin/grep gnome-power-manager | /bin/grep -qv grep || /sbin/shutdown -h now" [Fri Dec 24 21:43:14 2010] BEGIN HANDLER MESSAGES [Fri Dec 24 21:43:15 2010] END HANDLER MESSAGES [Fri Dec 24 21:43:15 2010] action exited with status 0 [Fri Dec 24 21:43:15 2010] completed event "button/power PWRF 00000080 00000001" [Fri Dec 24 21:43:15 2010] received event "button/power PWRF 00000080 00000002" [Fri Dec 24 21:43:15 2010] notifying client 2382[68:68] [Fri Dec 24 21:43:15 2010] executing action "/bin/ps awwux | /bin/grep gnome-power-manager | /bin/grep -qv grep || /sbin/shutdown -h now" [Fri Dec 24 21:43:15 2010] BEGIN HANDLER MESSAGES [Fri Dec 24 21:43:15 2010] END HANDLER MESSAGES [Fri Dec 24 21:43:15 2010] action exited with status 0 [Fri Dec 24 21:43:15 2010] completed event "button/power PWRF 00000080 00000002" [Fri Dec 24 21:43:34 2010] exiting
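
The acpid excerpt above can be reduced to the events that matter with a few lines of Python. This is only a sketch: the regular expression is written against the log format quoted in the question, and the function name is made up for illustration.

```python
import re

# Matches entries like: [Fri Dec 24 21:43:14 2010] received event "button/power PWRF ..."
EVENT = re.compile(r'\[(?P<when>[^\]]+)\] received event "(?P<event>[^"]+)"')

def power_button_events(log_text):
    """Return (timestamp, event) pairs for every power-button event acpid received."""
    return [
        (m["when"], m["event"])
        for m in EVENT.finditer(log_text)
        if m["event"].startswith("button/power")
    ]

# Entries copied from the acpid log in the question.
sample = (
    '[Fri Dec 24 21:43:14 2010] received event "button/power PWRF 00000080 00000001" '
    '[Fri Dec 24 21:43:14 2010] notifying client 2382[68:68] '
    '[Fri Dec 24 21:43:15 2010] received event "button/power PWRF 00000080 00000002"'
)

events = power_button_events(sample)
for when, event in events:
    print(when, "->", event)
```

The two button/power events line up with the "shutting down for system halt" entries at 21:43:14 and 21:43:15 in the syslog. The log only records that the ACPI power-button event was delivered; on its own it cannot show who or what caused it.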

    Read the article

  • Rsyslog is not working properly, it does not log anything

    - by Victor Henriquez
    I'm running a Debian server and a couple of days ago my rsyslog started to behave very weird, the daemon is running but it doesn't seem to do anything. Many people use the system but I'm the only one with (legal) root access. I'm using the default rsyslogd configuration (if you think is relevant I'll attach it, but it's the one that comes with the package). After I rotated all the log files, they have remained empty: # ls -l /var/log/*.log -rw-r--r-- 1 root root 0 Jun 27 00:25 /var/log/alternatives.log -rw-r----- 1 root adm 0 Jun 26 13:03 /var/log/auth.log -rw-r----- 1 root adm 0 Jun 26 13:03 /var/log/daemon.log -rw-r--r-- 1 root root 0 Jun 27 00:25 /var/log/dpkg.log -rw-r----- 1 root adm 0 Jun 26 13:03 /var/log/kern.log -rw-r----- 1 root adm 0 Jun 26 13:03 /var/log/lpr.log -rw-r----- 1 root adm 0 Jun 26 13:03 /var/log/mail.log -rw-r----- 1 root adm 0 Jun 26 13:03 /var/log/user.log Any try to force a log writing does not have any effect: # logger hey # ls -l /var/log/messages -rw-r----- 1 root adm 0 Jun 26 13:03 /var/log/messages Lsof shows that rsyslogd does not have any log files opened: # lsof -p 1855 COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME rsyslogd 1855 root cwd DIR 202,0 4096 2 / rsyslogd 1855 root rtd DIR 202,0 4096 2 / rsyslogd 1855 root txt REG 202,0 342076 21649 /usr/sbin/rsyslogd rsyslogd 1855 root mem REG 202,0 38556 32153 /lib/i386-linux-gnu/i686/cmov/libnss_nis-2.13.so rsyslogd 1855 root mem REG 202,0 79728 32165 /lib/i386-linux-gnu/i686/cmov/libnsl-2.13.so rsyslogd 1855 root mem REG 202,0 26456 32163 /lib/i386-linux-gnu/i686/cmov/libnss_compat-2.13.so rsyslogd 1855 root mem REG 202,0 297500 1061058 /usr/lib/rsyslog/imuxsock.so rsyslogd 1855 root mem REG 202,0 42628 32170 /lib/i386-linux-gnu/i686/cmov/libnss_files-2.13.so rsyslogd 1855 root mem REG 202,0 22784 1061106 /usr/lib/rsyslog/imklog.so rsyslogd 1855 root mem REG 202,0 1401000 32169 /lib/i386-linux-gnu/i686/cmov/libc-2.13.so rsyslogd 1855 root mem REG 202,0 30684 32175 
/lib/i386-linux-gnu/i686/cmov/librt-2.13.so rsyslogd 1855 root mem REG 202,0 9844 32157 /lib/i386-linux-gnu/i686/cmov/libdl-2.13.so rsyslogd 1855 root mem REG 202,0 117009 32154 /lib/i386-linux-gnu/i686/cmov/libpthread-2.13.so rsyslogd 1855 root mem REG 202,0 79980 17746 /usr/lib/libz.so.1.2.3.4 rsyslogd 1855 root mem REG 202,0 18836 1061094 /usr/lib/rsyslog/lmnet.so rsyslogd 1855 root mem REG 202,0 117960 31845 /lib/i386-linux-gnu/ld-2.13.so rsyslogd 1855 root 0u unix 0xebe8e800 0t0 640 /dev/log rsyslogd 1855 root 3u FIFO 0,5 0t0 2474 /dev/xconsole rsyslogd 1855 root 4u unix 0xebe8e400 0t0 645 /var/spool/postfix/dev/log rsyslogd 1855 root 5r REG 0,3 0 4026532176 /proc/kmsg I was so frustrated that even reinstall the rsyslog package, but it still refuses to log anything: # apt-get remove --purge rsyslog # apt-get install rsyslog I thought someone had hacked the system, so run rkhunter, chkrootkit, unhide in an attempt to find hide processes / ports and nmap in a remote host to compare with the ports shown by netstat. And I know this doesn't mean anything, but all looks ok. The system also have an iptables firewall that is very restrictive with incoming / outgoing connections. This is driving me crazy, any idea what is going on here? [EDIT - disk space info] # df -h Filesystem Size Used Avail Use% Mounted on rootfs 24G 22G 629M 98% / /dev/root 24G 22G 629M 98% / devtmpfs 10M 112K 9.9M 2% /dev tmpfs 76M 48K 76M 1% /run tmpfs 5.0M 0 5.0M 0% /run/lock tmpfs 151M 40K 151M 1% /tmp tmpfs 151M 0 151M 0% /run/shm

    Read the article

  • "Can't create table" when having too many partitions

    - by Chris
    I am currently having a problem I dont understand. Wherever I look it says mySQL (5.5) / InnoDB doesnt have a table limit. I wanted to test the InnoDB compression and was about to create an empty copy of an existing table and ran into the following problem. this one works: CREATE TABLE `hsc` ( LOTS OF STUFF ) ENGINE=InnoDB CHARSET=utf8 PARTITION BY RANGE (pid) SUBPARTITION BY HASH (cons) SUBPARTITIONS 2 (PARTITION hsc_p0 VALUES LESS THAN (10000) , PARTITION hsc_p1 VALUES LESS THAN (20000) , PARTITION hsc_p2 VALUES LESS THAN (30000) , PARTITION hsc_p3 VALUES LESS THAN (40000) , PARTITION hsc_p4 VALUES LESS THAN (50000) , PARTITION hsc_p40 VALUES LESS THAN (4000000) ); this one doesn't: CREATE TABLE `hsc` ( LOTS OF STUFF ) ENGINE=InnoDB CHARSET=utf8 PARTITION BY RANGE (pid) SUBPARTITION BY HASH (cons) SUBPARTITIONS 2 (PARTITION hsc_p0 VALUES LESS THAN (10000) , PARTITION hsc_p1 VALUES LESS THAN (20000) , PARTITION hsc_p2 VALUES LESS THAN (30000) , PARTITION hsc_p3 VALUES LESS THAN (40000) , PARTITION hsc_p4 VALUES LESS THAN (50000) , PARTITION hsc_p5 VALUES LESS THAN (75000) , PARTITION hsc_p6 VALUES LESS THAN (100000) , PARTITION hsc_p7 VALUES LESS THAN (125000) , PARTITION hsc_p8 VALUES LESS THAN (150000) , PARTITION hsc_p9 VALUES LESS THAN (175000) , PARTITION hsc_p40 VALUES LESS THAN (4000000) ); ERROR 1005 (HY000): Can't create table 'hsc' (errno: 1) Its reproducable by removing the number of partitions and adding them again. it does not have to do anything with the name of the table as i tried various names. there is also enough empty space on the HDD. /dev/simfs 230G 26G 192G 12% /var/lib/mysql.mnt There should be no limit on the partitions http://dev.mysql.com/doc/refman/5.5/en/partitioning-limitations.html Maximum number of partitions. The maximum possible number of partitions for a given table (that does not use the NDB storage engine) is 1024. This number includes subpartitions. 
i have increased both open_files show variables where variable_name LIKE '%open_files%'; +-------------------+-------+ | Variable_name | Value | +-------------------+-------+ | innodb_open_files | 512 | | open_files_limit | 1536 | +-------------------+-------+ No change. Any clues where should I start looking? UPDATE: the whole thing is running in an openvz environment. i saw in users_beancounters that the numflock was a problem, so i increased it. but the problem still persists. maybe this helps: ulimit -a core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 515011 max locked memory (kbytes, -l) 64 max memory size (kbytes, -m) unlimited open files (-n) 1024 pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 10240 cpu time (seconds, -t) unlimited max user processes (-u) 515011 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited cat /proc/user_beancounters Version: 2.5 uid resource held maxheld barrier limit failcnt 200: kmemsize 9309653 13357056 14372700 14790164 0 lockedpages 0 1008 2048 2048 0 privvmpages 675424 686528 1048576 1572864 0 shmpages 33 673 21504 21504 0 dummy 0 0 9223372036854775807 9223372036854775807 0 numproc 49 90 240 240 0 physpages 243761 246945 0 9223372036854775807 0 vmguarpages 0 0 1048576 1048576 0 oomguarpages 81672 83305 1048576 1048576 0 numtcpsock 6 8 360 360 0 numflock 175 188 512 512 8 numpty 1 9 16 16 0 numsiginfo 0 48 256 256 0 tcpsndbuf 104640 263912 1720320 2703360 0 tcprcvbuf 98304 131072 1720320 2703360 0 othersockbuf 32368 89304 1126080 2097152 0 dgramrcvbuf 0 2312 262144 262144 0 numothersock 19 28 360 360 0 dcachesize 2285052 3624426 3409920 3624960 0 numfile 616 870 9312 9312 0 dummy 0 0 9223372036854775807 9223372036854775807 0 dummy 0 0 9223372036854775807 9223372036854775807 0 dummy 0 0 9223372036854775807 9223372036854775807 0 numiptent 24 24 128 128 
0
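
In an OpenVZ container, the failcnt column of /proc/user_beancounters is usually the quickest pointer to a limit that has been hit. A minimal Python sketch that lists resources with a non-zero failcnt is shown below; it assumes the usual layout of one resource per line (the output above has been flattened), and the function name is hypothetical.

```python
def failed_beancounters(text):
    """Return {resource: failcnt} for every beancounter whose failcnt is above zero."""
    failed = {}
    for line in text.splitlines():
        fields = line.split()
        # Drop a leading "200:" container id (or the "Version:" label).
        if fields and fields[0].endswith(":"):
            fields = fields[1:]
        # Data rows have six columns: resource held maxheld barrier limit failcnt.
        if len(fields) != 6 or not fields[5].isdigit():
            continue
        if int(fields[5]) > 0:
            failed[fields[0]] = int(fields[5])
    return failed

# A few rows taken from the output in the question.
sample = """Version: 2.5
       uid  resource     held  maxheld  barrier    limit  failcnt
      200:  kmemsize  9309653 13357056 14372700 14790164        0
            numproc        49       90      240      240        0
            numflock      175      188      512      512        8
            numfile       616      870     9312     9312        0
"""

failed = failed_beancounters(sample)
print(failed)
```

For the rows quoted here, only numflock shows failures. Because failcnt is cumulative, comparing its value before and after a failing CREATE TABLE would show whether the raised limit is still being hit.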

    Read the article

  • iCloud stuff stops working while connected to OpenVPN

    - by Taco Bob
    I have a fairly simple OpenVPN setup on an OpenVZ VPS with Ubuntu 11.10. Client is the Viscosity client on Mac OS X 10.8.2, and after some testing, we can rule out the client as being part of the problem. Everything has been working fine except for Apple's iCloud stuff. Web surfing, email, FTP, NNTP, and Skype are all working as expected. It's ONLY the iCloud services that cease to function. If I connect to the VPN, iCloud stuff stops working. I no longer get anything in Messages, Calendar items don't get updated, and Notifications stop working. If I disconnect, the iCloud stuff all starts working. Connect again, iCloud stops working. Here's the server.conf: status openvpn-status.log log /var/log/openvpn.log verb 4 port 1194 proto udp dev tun ca /etc/openvpn/ca.crt cert /etc/openvpn/server.crt key /etc/openvpn/server.key dh /etc/openvpn/dh1024.pem server 10.9.8.0 255.255.255.0 ifconfig-pool-persist ipp.txt push "redirect-gateway def1" push “dhcp-option DNS 10.9.8.1” keepalive 10 120 duplicate-cn cipher BF-CBC comp-lzo user nobody group nogroup persist-key persist-tun tun-mtu 1500 mssfix 1400 I'm using iptables in a script, and it's also fairly simplistic. iptables -F iptables -t nat -F iptables -t mangle -F iptables -A FORWARD -i tun0 -o venet0 -j ACCEPT iptables -A FORWARD -i venet0 -o tun0 -j ACCEPT iptables -A INPUT -p tcp --dport 22 -j ACCEPT iptables -A INPUT -p tcp --dport 1194 -j ACCEPT iptables -A INPUT -p udp --dport 1194 -j ACCEPT iptables -t nat -A POSTROUTING -s 10.9.8.0/24 -j SNAT --to-source <server's public ip> echo 1 > /proc/sys/net/ipv4/ip_forward I tried forwarding ports as well, with no success. iptables -A FORWARD -p tcp -d 10.9.8.0/24 --dport 5222:5230 -j ACCEPT iptables -t nat -A PREROUTING -p tcp --dport 5222:5230 -j DNAT --to-destination 10.9.8.6 I am also sometimes behind a double-NAT situation that I have no control over. Client -> work VPN -> my OpenVPN box -> Internet.
Client -> Airport Express -> ISP (which is doing NAT) -> my OpenVPN box -> Internet. Those two situations are just the fact of life where I am, and I cannot change them. I do have full control over my client and the OpenVPN server. I am completely out of ideas. I have posted a similar query at the OpenVPN forums, but it hasn't posted yet and seems to be in their moderation queue still. Tried on freenode irc channels, but nobody is awake, so here I am. I have Googled extensively for this, and can find nothing that is related. Help me get iCloud stuff working again! (I tried serverfault, it was closed as off-topic. I'm trying here and the Unix site as well. Here because it's a more general audience that might know more about OpenVPN based on the number of questions I see asked about it) EDIT: -I have also tried upgrading to Version: 2.3-beta1-debian0 - issue persists. -Removed all iptables rules except for the ones that flush -left this rule:iptables -t nat -A POSTROUTING -s 10.9.8.0/24 -j SNAT --to-source (server ip) -added iptables -A FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT still, nothing works. I can see traffic in tcpdump on the server if i watch the tunnel: 20:03:48.702835 IP nk11p01st-courier105-bz.push.apple.com.5223 10.9.8.6.60772: Flags [F.], seq 2635, ack 1218, win 76, options [nop,nop,TS val 914984811 ecr 745921298], length 0 20:03:48.911244 IP 10.9.8.6.60772 nk11p01st-courier105-bz.push.apple.com.5223: Flags [R], seq 3621143451, win 0, length 0 But still, no push messages/notifications are ever delivered. :/ EDIT: * Further testing indicates that it might actually be the client after all.

    Read the article

  • Can't re-mount existing RAID10 on Ubuntu

    - by Zoran
    I saw similar questions, but didn't find a solution to my problem. After a power cut, one RAID10 array (4 disks) appears to be malfunctioning. I made the array active, but cannot mount it. Always the same error: mount: you must specify the filesystem type So, here is what I get when I type mdadm --detail /dev/md0 /dev/md0: Version : 00.90.03 Creation Time : Tue Sep 1 11:00:40 2009 Raid Level : raid10 Array Size : 1465148928 (1397.27 GiB 1500.31 GB) Used Dev Size : 732574464 (698.64 GiB 750.16 GB) Raid Devices : 4 Total Devices : 3 Preferred Minor : 0 Persistence : Superblock is persistent Update Time : Mon Jun 11 09:54:27 2012 State : clean, degraded Active Devices : 3 Working Devices : 3 Failed Devices : 0 Spare Devices : 0 Layout : near=2, far=1 Chunk Size : 64K UUID : 1a02e789:c34377a1:2e29483d:f114274d Events : 0.166 Number Major Minor RaidDevice State 0 8 16 0 active sync /dev/sdb 1 0 0 1 removed 2 8 48 2 active sync /dev/sdd 3 8 64 3 active sync /dev/sde In /etc/mdadm/mdadm.conf I have: by default, scan all partitions (/proc/partitions) for MD superblocks. alternatively, specify devices to scan, using wildcards if desired. DEVICE partitions auto-create devices with Debian standard permissions CREATE owner=root group=disk mode=0660 auto=yes automatically tag new arrays as belonging to the local system HOMEHOST <system> instruct the monitoring daemon where to send mail alerts MAILADDR root definitions of existing MD arrays ARRAY /dev/md0 level=raid10 num-devices=4 UUID=1a02e789:c34377a1:2e29483d:f114274d ARRAY /dev/md1 level=raid1 num-devices=2 UUID=9b592be7:c6a2052f:2e29483d:f114274d This file was auto-generated... So, my question is: how can I mount the md0 array (md1 was mounted without a problem) while preserving the existing data?
One more thing, fdisk -l command gives the following result: Disk /dev/sdb: 750.1 GB, 750156374016 bytes 255 heads, 63 sectors/track, 91201 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0x660a6799 Device Boot Start End Blocks Id System /dev/sdb1 * 1 88217 708603021 83 Linux /dev/sdb2 88218 91201 23968980 5 Extended /dev/sdb5 88218 91201 23968948+ 82 Linux swap / Solaris Disk /dev/sdc: 750.1 GB, 750156374016 bytes 255 heads, 63 sectors/track, 91201 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0x0008f8ae Device Boot Start End Blocks Id System /dev/sdc1 1 88217 708603021 83 Linux /dev/sdc2 88218 91201 23968980 5 Extended /dev/sdc5 88218 91201 23968948+ 82 Linux swap / Solaris Disk /dev/sdd: 750.1 GB, 750156374016 bytes 255 heads, 63 sectors/track, 91201 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0x4be1abdb Device Boot Start End Blocks Id System Disk /dev/sde: 750.1 GB, 750156374016 bytes 255 heads, 63 sectors/track, 91201 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0xa4d5632e Device Boot Start End Blocks Id System Disk /dev/sdf: 750.1 GB, 750156374016 bytes 255 heads, 63 sectors/track, 91201 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0xdacb141c Device Boot Start End Blocks Id System Disk /dev/sdg: 750.1 GB, 750156374016 bytes 255 heads, 63 sectors/track, 91201 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0xdacb141c Device Boot Start End Blocks Id System Disk /dev/md1: 750.1 GB, 750156251136 bytes 2 heads, 4 sectors/track, 183143616 cylinders Units = cylinders of 8 * 512 = 4096 bytes Disk identifier: 0xdacb141c Device Boot Start End Blocks Id System Warning: ignoring extra data in partition table 5 Warning: ignoring extra data in partition table 5 Warning: ignoring extra data in partition table 5 Warning: invalid flag 0x7b6e of partition table 5 will be corrected by w(rite) Disk 
/dev/md0: 1500.3 GB, 1500312502272 bytes 255 heads, 63 sectors/track, 182402 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0x660a6799 Device Boot Start End Blocks Id System /dev/md0p1 * 1 88217 708603021 83 Linux /dev/md0p2 88218 91201 23968980 5 Extended /dev/md0p5 ? 121767 155317 269488144 20 Unknown And one more thing. When using mdadm --examine command, here ise result: mdadm -v --examine --scan /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sd ARRAY /dev/md1 level=raid1 num-devices=2 UUID=9b592be7:c6a2052f:2e29483d:f114274d devices=/dev/sdf ARRAY /dev/md0 level=raid10 num-devices=4 UUID=1a02e789:c34377a1:2e29483d:f114274d devices=/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde md0 has 3 devices which are active. Can someone instruct me how to solve this issue? If it is possible, I would like not to removing faulty HDD. Please advise
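
The mdadm --detail figures quoted in this question can be sanity-checked with a short Python sketch. It assumes the usual behaviour of the md near=2 layout with an even number of devices, where the two copies of each chunk sit on adjacent slots (0 and 1, 2 and 3); the helper names are invented for illustration.

```python
def near2_pairs(raid_devices):
    """Mirror pairs for a near=2 layout with an even device count: (0, 1), (2, 3), ..."""
    return [(slot, slot + 1) for slot in range(0, raid_devices, 2)]

def data_still_complete(raid_devices, missing_slots):
    """True if every mirror pair still has at least one member present."""
    return all(
        not (a in missing_slots and b in missing_slots)
        for a, b in near2_pairs(raid_devices)
    )

# Numbers from the mdadm --detail output above (sizes in KiB).
used_dev_size_kib = 732574464
raid_devices = 4
copies = 2
array_size_kib = used_dev_size_kib * raid_devices // copies

print(array_size_kib)                               # matches "Array Size : 1465148928"
print(data_still_complete(raid_devices, {1}))       # only slot 1 is "removed"
print(data_still_complete(raid_devices, {0, 1}))    # both halves of one pair gone
```

With only RaidDevice 1 removed, every mirror pair still has a surviving member, so the array contents should be readable in degraded mode. This says nothing about whether the filesystem or partition table inside md0 was damaged by the power cut.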

    Read the article

  • e2fsck / resize2fs problems

    - by BlakBat
    I've got 6 drives (each 1.5T, all same model and firmware revision) that are part of a RAID5 array. The RAID5 makes a LVM volume group and a logical group. The latter contains only one ext3 partition. I've recently ran: e2fsck -f /dev/vg03/lv01 && resize2fs -M /dev/vg03/lv01 which exited without an error. Now when I try to mount /dev/vg03/lv01 I get: EXT3-fs error (device dm-0): ext3_check_descriptors: Block bitmap for group 30533 not in group (block 1000532368)! EXT3-fs: group descriptors corrupted! How do I get out of this predicament? This is all the info I can currently give you: fdisk -l /dev/sd[cdefgh] shows (correctly) that they are "Linux raid autodetect" but fdisk now shows: fdisk -l /dev/md0 Disk /dev/md0: 7501.5 GB, 7501495664640 bytes ... Disk identifier: 0x00000000 Disk /dev/md0 doesn't contain a valid partition table (instead of a LVM type partition) fdisk -l /dev/vg03/lv01 Disk /dev/vg03/lv01: 7501.5 GB, 7501491732480 bytes ... Disk identifier: 0x00000000 Disk /dev/vg03/lv01 doesn't contain a valid partition table (instead of a ext3 type partition) I've tried: e2fsck -fy /dev/vg03/lv01 e2fsck 1.41.12 (17-May-2010) e2fsck: Group descriptors look bad... trying backup blocks... Block bitmap for group 30533 is not in group. (block 1000532368) Relocate? yes Inode bitmap for group 30533 is not in group. (block 1000532369) Relocate? yes Pass 1: Checking inodes, blocks, and sizes Relocating group 30533's block bitmap to 1000524246... 
Error allocating 1 contiguous block(s) in block group 30533 for inode bitmap: Could not allocate block in ext2 filesystem e2fsck: aborted Extra information I can give you: cat /proc/mdstat Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] md0 : active (auto-read-only) raid5 sdg1[0] sdh1[5] sdf1[4] sde1[3] sdc1[2] sdd1[1] 7325679360 blocks level 5, 128k chunk, algorithm 2 [6/6] [UUUUUU] bitmap: 1/175 pages [4KB], 4096KB chunk unused devices: Lastly, all smartctl tests (short and extended) showed no errors on any of the disks. Should I try resize2fs to grow /dev/vg03/lv01 and redo an e2fsck? Should I cfdisk /dev/md0 and /dev/vg03/lv01 back to their real types? Thanks in advance for any and all help. 2011-09-20 UPDATE I issued the following commands and was able to remount the partition, but comparing the size (df) before and after, it seems that 1 TB of data has gone missing. Checking the MD5SUMS (from an old backup) of some files against the "same" files on the remounted partition revealed some errors. Commands issued to remount the partition were: dumpe2fs /dev/vg03/lv01 Block count: 1000491435 Block size: 4096 tune2fs -O ^has_journal /dev/vg03/lv01 resize2fs -p /dev/vg03/lv01 dumpe2fs /dev/vg03/lv01 Block count: 1831418880 Block size: 4096 mount -o ro,noatime /dev/vg03/lv01 /mnt/raid OK... but files have been damaged / gone missing.
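
The block numbers quoted in this question can be cross-checked with a little arithmetic. The Python sketch below assumes the mke2fs default of 8 * block size = 32768 blocks per group, which the post does not state, so treat the group numbers as indicative only.

```python
BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = 8 * BLOCK_SIZE   # assumption: default layout, one bitmap block per group

shrunk_blocks = 1000491435    # dumpe2fs block count after resize2fs -M
grown_blocks = 1831418880     # dumpe2fs block count after growing again
reported_block = 1000532368   # block named in the ext3_check_descriptors error
lv_bytes = 7501491732480      # size of /dev/vg03/lv01 reported by fdisk

def group_of(block):
    """Block group that holds a given block (first data block is 0 for 4 KiB blocks)."""
    return block // BLOCKS_PER_GROUP

# Ceiling division: number of groups a filesystem of shrunk_blocks blocks needs.
groups_after_shrink = -(-shrunk_blocks // BLOCKS_PER_GROUP)

print(group_of(reported_block))            # group that contains the reported block
print(groups_after_shrink)                 # groups are numbered 0 .. this value minus 1
print(reported_block >= shrunk_blocks)     # is the block past the end of the shrunk fs?
print(grown_blocks * BLOCK_SIZE == lv_bytes)
```

Under that assumption, the block named in the error lies beyond the end of the shrunk filesystem, in a group index that a filesystem of 1000491435 blocks would not have. The grown block count, on the other hand, fills the logical volume exactly.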

    Read the article

  • volume group disappeared after xfs_check run

    - by John P
    EDIT** I have a volume group consisting of 5 RAID1 devices grouped together into a lvm and formatted with xfs. The 5th RAID device lost its RAID config (cat /proc/mdstat does not show anything). The two drives are still present (sdj and sdk), but they have no partitions. The LVM appeared to be happily using sdj up until recently. (doing a pvscan showed the first 4 RAID1 devices + /dev/sdj) I removed the LVM from the fstab, rebooted, then ran xfs_check on the LV. It ran for about half an hour, then stopped with an error. I tried rebooting again, and this time when it came up, the logical volume was no longer there. It is now looking for /dev/md5, which is gone (though it had been using /dev/sdj earlier). /dev/sdj was having read errors, but after replacing the SATA cable, those went away, so the drive appears to be fine for now. Can I modify the /etc/lvm/backup/dedvol, change the device to /dev/sdj and do a vgcfgrestore? I could try doing a pvcreate --uuid KZron2-pPTr-ZYeQ-PKXX-4Woq-6aNc-AG4rRJ /dev/sdj to make it recognize it, but I'm afraid that would erase the data on the drive UPDATE: just changing the pv to point to /dev/sdj did not work vgcfgrestore --file /etc/lvm/backup/dedvol dedvol Couldn't find device with uuid 'KZron2-pPTr-ZYeQ-PKXX-4Woq-6aNc-AG4rRJ'. Cannot restore Volume Group dedvol with 1 PVs marked as missing. Restore failed. pvscan /dev/sdj: read failed after 0 of 4096 at 0: Input/output error Couldn't find device with uuid 'KZron2-pPTr-ZYeQ-PKXX-4Woq-6aNc-AG4rRJ'. Couldn't find device with uuid 'KZron2-pPTr-ZYeQ-PKXX-4Woq-6aNc-AG4rRJ'. Couldn't find device with uuid 'KZron2-pPTr-ZYeQ-PKXX-4Woq-6aNc-AG4rRJ'. Couldn't find device with uuid 'KZron2-pPTr-ZYeQ-PKXX-4Woq-6aNc-AG4rRJ'. 
PV /dev/sdd2 VG VolGroup00 lvm2 [74.41 GB / 0 free] PV /dev/md2 VG dedvol lvm2 [931.51 GB / 0 free] PV /dev/md3 VG dedvol lvm2 [931.51 GB / 0 free] PV /dev/md0 VG dedvol lvm2 [931.51 GB / 0 free] PV /dev/md4 VG dedvol lvm2 [931.51 GB / 0 free] PV unknown device VG dedvol lvm2 [1.82 TB / 63.05 GB free] Total: 6 [5.53 TB] / in use: 6 [5.53 TB] / in no VG: 0 [0 ] vgscan Reading all physical volumes. This may take a while... /dev/sdj: read failed after 0 of 4096 at 0: Input/output error /dev/sdj: read failed after 0 of 4096 at 2000398843904: Input/output error Found volume group "VolGroup00" using metadata type lvm2 Found volume group "dedvol" using metadata type lvm2 vgdisplay dedvol --- Volume group --- VG Name dedvol System ID Format lvm2 Metadata Areas 5 Metadata Sequence No 10 VG Access read/write VG Status resizable MAX LV 0 Cur LV 1 Open LV 0 Max PV 0 Cur PV 5 Act PV 5 VG Size 5.46 TB PE Size 4.00 MB Total PE 1430796 Alloc PE / Size 1414656 / 5.40 TB Free PE / Size 16140 / 63.05 GB VG UUID o1U6Ll-5WH8-Pv7Z-Rtc4-1qYp-oiWA-cPD246 dedvol { id = "o1U6Ll-5WH8-Pv7Z-Rtc4-1qYp-oiWA-cPD246" seqno = 10 status = ["RESIZEABLE", "READ", "WRITE"] flags = [] extent_size = 8192 # 4 Megabytes max_lv = 0 max_pv = 0 physical_volumes { pv0 { id = "Msiee7-Zovu-VSJ3-Y2hR-uBVd-6PaT-Ho9v95" device = "/dev/md2" # Hint only status = ["ALLOCATABLE"] flags = [] dev_size = 1953519872 # 931.511 Gigabytes pe_start = 384 pe_count = 238466 # 931.508 Gigabytes } pv1 { id = "ZittCN-0x6L-cOsW-v1v4-atVN-fEWF-e3lqUe" device = "/dev/md3" # Hint only status = ["ALLOCATABLE"] flags = [] dev_size = 1953519872 # 931.511 Gigabytes pe_start = 384 pe_count = 238466 # 931.508 Gigabytes } pv2 { id = "NRNo0w-kgGr-dUxA-mWnl-bU5v-Wld0-XeKVLD" device = "/dev/md0" # Hint only status = ["ALLOCATABLE"] flags = [] dev_size = 1953519872 # 931.511 Gigabytes pe_start = 384 pe_count = 238466 # 931.508 Gigabytes } pv3 { id = "2EfLFr-JcRe-MusW-mfAs-WCct-u4iV-W0pmG3" device = "/dev/md4" # Hint only status = ["ALLOCATABLE"] 
flags = [] dev_size = 1953519872 # 931.511 Gigabytes pe_start = 384 pe_count = 238466 # 931.508 Gigabytes } pv4 { id = "KZron2-pPTr-ZYeQ-PKXX-4Woq-6aNc-AG4rRJ" device = "/dev/md5" # Hint only status = ["ALLOCATABLE"] flags = [] dev_size = 3907028992 # 1.81935 Terabytes pe_start = 384 pe_count = 476932 # 1.81935 Terabytes } }
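The figures in the metadata dump above can be cross-checked against the vgdisplay output with a little arithmetic. The following is a minimal sketch, assuming 512-byte sectors and assuming that a PV's extent count is the usable space after pe_start divided by extent_size, rounded down; the helper names are made up for illustration, and all numbers are copied from the dump.

```python
SECTOR_BYTES = 512        # assumption: dev_size, pe_start and extent_size are in 512-byte sectors
EXTENT_SECTORS = 8192     # extent_size from the metadata (4 MiB)

# (dev_size, pe_start, pe_count) exactly as recorded in the metadata above
PVS = {
    "pv0": (1953519872, 384, 238466),
    "pv1": (1953519872, 384, 238466),
    "pv2": (1953519872, 384, 238466),
    "pv3": (1953519872, 384, 238466),
    "pv4": (3907028992, 384, 476932),
}


def expected_pe_count(dev_size, pe_start, extent_sectors=EXTENT_SECTORS):
    """Whole extents that fit after the metadata area (hypothetical helper)."""
    return (dev_size - pe_start) // extent_sectors


def extents_to_gib(pe_count, extent_sectors=EXTENT_SECTORS):
    """Convert a number of extents to GiB."""
    return pe_count * extent_sectors * SECTOR_BYTES / 2**30


total_pe = sum(pe_count for _, _, pe_count in PVS.values())
free_pe = total_pe - 1414656   # Alloc PE from vgdisplay

for name, (dev_size, pe_start, pe_count) in PVS.items():
    print(name, expected_pe_count(dev_size, pe_start) == pe_count)
print("total PE:", total_pe)
print("free GiB:", round(extents_to_gib(free_pe), 2))
```

If the per-PV counts and the total agree with vgdisplay, the metadata itself is at least internally consistent, which suggests looking at the device behind pv4 (hinted as /dev/md5) rather than at the volume group description.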

    Read the article

  • linux raid 1: right after replacing and syncing one drive, the other disk fails - understanding what is going on with mdstat/mdadm

    - by devicerandom
    We have an old RAID 1 Linux server (Ubuntu Lucid 10.04) with four partitions. A few days ago /dev/sdb failed, and today we noticed that /dev/sda showed ominous pre-failure SMART signs (a reallocated sector count of ~4000). We replaced /dev/sdb this morning and rebuilt the RAID on the new drive, following this guide: http://www.howtoforge.com/replacing_hard_disks_in_a_raid1_array Everything went smoothly until the very end: just as it appeared to be finishing the synchronization of the last partition, the other, old drive failed. At this point I am very unsure of the state of the system. Everything seems to be working and the files all seem to be accessible, just as if everything had been synchronized, but I'm new to RAID and I'm worried about what is going on. The /proc/mdstat output is: Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] md3 : active raid1 sdb4[2](S) sda4[0] 478713792 blocks [2/1] [U_] md2 : active raid1 sdb3[1] sda3[2](F) 244140992 blocks [2/1] [_U] md1 : active raid1 sdb2[1] sda2[2](F) 244140992 blocks [2/1] [_U] md0 : active raid1 sdb1[1] sda1[2](F) 9764800 blocks [2/1] [_U] unused devices: <none> Why is the order of [_U] vs [U_] not consistent across all the arrays? Is the first U /dev/sda or /dev/sdb? (I tried looking on the web for this trivial piece of information but found no explicit answer.) If I read md0 correctly, [_U] should mean /dev/sda1 (down) and /dev/sdb1 (up). But if /dev/sda has failed, how can it be the opposite for md3? I understand that /dev/sdb4 is now a spare, probably because it failed to synchronize 100%, but why does it show /dev/sda4 as up? Shouldn't it be [__]? Or [_U] anyway? The /dev/sda drive apparently cannot even be accessed by SMART anymore, so I wouldn't expect it to be up. What is wrong with my interpretation of the output?
I also attach the output of mdadm --detail for the four partitions: /dev/md0: Version : 00.90 Creation Time : Fri Jan 21 18:43:07 2011 Raid Level : raid1 Array Size : 9764800 (9.31 GiB 10.00 GB) Used Dev Size : 9764800 (9.31 GiB 10.00 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 0 Persistence : Superblock is persistent Update Time : Tue Nov 5 17:27:33 2013 State : clean, degraded Active Devices : 1 Working Devices : 1 Failed Devices : 1 Spare Devices : 0 UUID : a3b4dbbd:859bf7f2:bde36644:fcef85e2 Events : 0.7704 Number Major Minor RaidDevice State 0 0 0 0 removed 1 8 17 1 active sync /dev/sdb1 2 8 1 - faulty spare /dev/sda1 /dev/md1: Version : 00.90 Creation Time : Fri Jan 21 18:43:15 2011 Raid Level : raid1 Array Size : 244140992 (232.83 GiB 250.00 GB) Used Dev Size : 244140992 (232.83 GiB 250.00 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 1 Persistence : Superblock is persistent Update Time : Tue Nov 5 17:39:06 2013 State : clean, degraded Active Devices : 1 Working Devices : 1 Failed Devices : 1 Spare Devices : 0 UUID : 8bcd5765:90dc93d5:cc70849c:224ced45 Events : 0.1508280 Number Major Minor RaidDevice State 0 0 0 0 removed 1 8 18 1 active sync /dev/sdb2 2 8 2 - faulty spare /dev/sda2 /dev/md2: Version : 00.90 Creation Time : Fri Jan 21 18:43:19 2011 Raid Level : raid1 Array Size : 244140992 (232.83 GiB 250.00 GB) Used Dev Size : 244140992 (232.83 GiB 250.00 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 2 Persistence : Superblock is persistent Update Time : Tue Nov 5 17:46:44 2013 State : clean, degraded Active Devices : 1 Working Devices : 1 Failed Devices : 1 Spare Devices : 0 UUID : 2885668b:881cafed:b8275ae8:16bc7171 Events : 0.2289636 Number Major Minor RaidDevice State 0 0 0 0 removed 1 8 19 1 active sync /dev/sdb3 2 8 3 - faulty spare /dev/sda3 /dev/md3: Version : 00.90 Creation Time : Fri Jan 21 18:43:22 2011 Raid Level : raid1 Array Size : 478713792 (456.54 GiB 490.20 GB) Used Dev Size : 478713792 (456.54 GiB 490.20 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 3 Persistence : Superblock is persistent Update Time : Tue Nov 5 17:19:20 2013 State : clean, degraded Active Devices : 1 Working Devices : 2 Failed Devices : 0 Spare Devices : 1 Number Major Minor RaidDevice State 0 8 4 0 active sync /dev/sda4 1 0 0 1 removed 2 8 20 - spare /dev/sdb4 The active sync on /dev/sda4 baffles me. I am worried because, if I have to replace /dev/sda tomorrow morning, I want to be sure what I should sync with what, and what is going on. I am also quite baffled by the fact that /dev/sda decided to fail exactly when the RAID finished resyncing. I'd like to understand what is really happening. Thanks a lot for your patience and help. Massimo
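One way to read the [U_] and [_U] strings is as a list of array slots (the RaidDevice column of mdadm --detail), not as the order in which the members are printed on the "mdX :" line. The sketch below illustrates that reading on the mdstat text quoted above. It assumes that, for an active member, the number in square brackets equals its slot, which matches the mdadm --detail output in this question (0.90 metadata) but may not hold in every setup; the function name is made up for illustration.

```python
import re

# The /proc/mdstat arrays quoted in the question, with line breaks restored.
MDSTAT_SAMPLE = """\
md3 : active raid1 sdb4[2](S) sda4[0]
      478713792 blocks [2/1] [U_]
md2 : active raid1 sdb3[1] sda3[2](F)
      244140992 blocks [2/1] [_U]
md1 : active raid1 sdb2[1] sda2[2](F)
      244140992 blocks [2/1] [_U]
md0 : active raid1 sdb1[1] sda1[2](F)
      9764800 blocks [2/1] [_U]
"""


def slot_map(mdstat_text):
    """Return {array: [(slot, member or None), ...]} (hypothetical helper).

    Assumption: for members without an (S) or (F) flag, the bracketed
    number is the slot the member occupies in the array.
    """
    result = {}
    array, members = None, {}
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+) : \S+ \S+ (.*)$", line)
        if header:
            array, members = header.group(1), {}
            for name, num, flag in re.findall(
                r"(\w+)\[(\d+)\](?:\((\w)\))?", header.group(2)
            ):
                if not flag:          # skip spares (S) and failed members (F)
                    members[int(num)] = name
            continue
        status = re.search(r"\[([U_]+)\]\s*$", line)
        if status and array:
            result[array] = [
                (slot, members.get(slot) if ch == "U" else None)
                for slot, ch in enumerate(status.group(1))
            ]
            array = None
    return result


for name, slots in slot_map(MDSTAT_SAMPLE).items():
    print(name, slots)
```

Under that assumption, md3 reads as "slot 0 is sda4 and up, slot 1 is empty" while md0 to md2 read as "slot 0 is empty, slot 1 is the sdb partition and up", which is consistent with the RaidDevice column shown by mdadm --detail.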

    Read the article

  • Apache Getting Bogged Down By Certain Script (Wp-Cron.php) - How To Kill Process Automatically

    - by user50037
    I have a server that is running a number of WordPress blogs, several of which have hundreds or thousands of posts. Every couple of days, the server slows to a crawl because of a WordPress file called wp-cron.php. My entire Apache process list turns into this: http:// imgur.com/A7K9k.png Multiply that by quite a bit, and the server stops responding. Each process takes up about 1.1% of RAM, and when we have 50 of them running it gets out of hand. They do not all come from the same blog; they are pretty widespread. In the Apache process page of WHM, they are usually ALL set to the status "C", which means closing. But they can sit there until they crash the server, and they still hold the memory. Just google "wp-cron.php load" and you will find plenty of people with similar issues. In any case, we think it is down to users adding a tonne of dead "pinglists" to their WordPress installation, which WordPress in turn loops through endlessly. Problem number 1. Does anyone have any other suggestions about what would cause the WordPress file wp-cron.php to loop endlessly? I still think it is down to pings, because all of the people we have contacted about their account load going sky high have had massive ping lists. Problem number 2. Even if it is down to excessive pinglists in WordPress, we cannot be babying every single account on the server, waiting for it to start spawning wp-cron processes. It often happens overnight, and I start getting SMS alerts at 2am about the load. I have CSF installed, which apparently would have ended the processes if they ran over XXX time. But I have been told that it won't catch these processes because they end up in this "closing" state (they show up as "C" on the Apache page of WHM). Apparently CSF will only kill processes that are "running", which "C" does not count as. I have seen various other scripts such as: http://dltj.org/article/die-apache-die/ . I took a look at the stat files in /proc, but I could not work out which delimited field holds the running time, or whether there is any way to connect it back to an actual Apache process so that I can see which file is running (so as to close only connections to wp-cron.php with a state of "C"). Overall, I know Problem 2 glosses over the real cause, and I do put the whole thing down to excessive pinglists in WordPress. But I just cannot sit there and babysit every single installation 24/7, so I need a way to save the server when I am not available. Any help would be much appreciated.
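On the question of which field in /proc/<pid>/stat holds the running time: per the proc(5) manual page, field 22 (starttime) is the process start time in clock ticks since boot, and fields 14 and 15 (utime, stime) are the CPU time consumed. The sketch below is a minimal illustration of computing the elapsed wall-clock time from a stat line; the function names, the sample line and its PID are invented for the example, and it does not attempt to find out which request a worker is serving.

```python
import os


def parse_stat(stat_line):
    """Extract a few fields from a /proc/<pid>/stat line (hypothetical helper).

    The command name sits in parentheses and may itself contain spaces,
    so split around the last ')' instead of splitting on spaces alone.
    """
    head, _, tail = stat_line.rpartition(")")
    pid, _, comm = head.partition(" (")
    rest = tail.split()            # rest[0] is field 3, so field N is rest[N - 3]
    return {
        "pid": int(pid),
        "comm": comm,
        "state": rest[0],          # field 3
        "utime": int(rest[11]),    # field 14, clock ticks
        "stime": int(rest[12]),    # field 15, clock ticks
        "starttime": int(rest[19]),  # field 22, clock ticks since boot
    }


def elapsed_seconds(stat_line, uptime_seconds, ticks_per_second):
    """Wall-clock seconds since the process started."""
    info = parse_stat(stat_line)
    return uptime_seconds - info["starttime"] / ticks_per_second


def elapsed_for_pid(pid):
    """Live variant for a Linux host: reads /proc directly."""
    with open(f"/proc/{pid}/stat") as f:
        stat_line = f.read()
    with open("/proc/uptime") as f:
        uptime = float(f.read().split()[0])
    return elapsed_seconds(stat_line, uptime, os.sysconf("SC_CLK_TCK"))


# Invented sample line: PID 12345, started 360000 ticks after boot.
SAMPLE = "12345 (httpd) S 1 12345 12345 0 -1 4202560 100 0 0 0 250 50 0 0 20 0 1 0 360000 0 0"
print(parse_stat(SAMPLE)["state"], elapsed_seconds(SAMPLE, 4000.0, 100))
```

Tying a PID back to wp-cron.php would need a second data source. Apache's mod_status page with ExtendedStatus enabled lists the PID, mode and current request for each worker, so one possible approach is to combine that listing with the elapsed time above before deciding which processes to kill.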

    Read the article
