Search Results

Search found 74 results on 3 pages for 'supermicro'.

  • server 2008 sp2 hung not responding with kvm reset as well

    - by dasko
    Using Server 2008 SP2 64-bit Standard Edition with all updates from the MS site. The server was hung this morning; resetting the KVM did nothing, and plugging another USB mouse into the back did not make the mouse light up red on its optical end. Other machines on the KVM worked fine, including the mouse. The server is a rack-mounted 4U Supermicro system (SuperServer). I had to hard power off and restart. Any thoughts? I burnt this system in well for a couple of weeks before deploying, so it is kind of odd that this happened. Any help is greatly appreciated, or if anyone can suggest software to install that can send out an email when something like that goes down. I looked for a minidump but found nothing. Nothing in the event viewers either. gd

  • Raid on ICH9R chip set

    - by user500982
    Hi, I'm looking at buying this MB: http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPE-HF-D525.cfm I'm wondering, though, whether the chipset will support the RAID configuration I need. I'm looking to configure the following arrays:
        RAID array 1: 2 x 2TB disks in RAID 0
        RAID array 2: 2 x 2TB disks in RAID 0
        RAID array 3 (not actually an array): 1 x 300GB disk not in RAID, to be used for OS and boot
    So in total there would be 5 drives and the board supports 6, so I'm good when it comes to connections. However, I have heard that some chipsets only support one RAID array (volume), so either all drives are individual or all are in the array. I must have 2 separate RAID arrays independent of each other, and a 5th drive not in any array. Anybody know if this setup will work? Thanks, -Stewart

  • Certain drives in RAID 5 set intermittently are not recognized

    - by hydroparadise
    I have a curious problem in that 1 (sometimes 2) drives do not get recognized in a RAID 5 set. The server is getting rather old at 5 to 6 years, but still seems to function well once the machine sees all drives. So that leaves me with three areas to consider: the motherboard, the SATA RAID card, or the individual hard drives themselves. I am leaning toward the RAID card, but have not had many dealings with RAID cards. What would cause individual drives not to be recognized in the set? If it was the card, I would think that it would be all or nothing. If it were a single drive, is it possible that it would only work sometimes? The only other thing to consider is that they are different drives (Seagate and Western Digital) but all around 80 GB. The SATA RAID controller is a 3ware Escalade 8506-4LP. The motherboard is a SuperMicro P4SPA+. I am open and available for more details if needed...

  • 2 "P4" power connectors *AND* an 8 pin EPS connector -- all necessary?

    - by chris
    I recently bought an Arima SW350 motherboard. It supports 16 ddr simms as well as 2 opteron 2xx CPUs, so I imagine it may have pretty heavy power requirements... It has a 24 pin ATX power connector, 2 4 pin "P4" power connectors, and an 8 pin EPS power connector. My supermicro power supply has a 24 pin ATX connector and the EPS connector, and one P4 connector. Can I safely run this thing with only one P4 connector plugged in? Can I safely get a "hard disk to P4" adapter and use that to power the other P4 connector on the motherboard? Do I just need a new power supply? The board's documentation is pretty thin on the topic.

  • NFS of NAS server blocks in cluster environment

    - by Zardoz
    In our department we have an Iomega NAS (px4-300d) connected to a Supermicro cluster with 5 nodes (12 cores per node). Each node mounts a share on that NAS by using NFS. Unfortunately after some time (several minutes) of permanent read/write operations (from all nodes) the NAS starts to block and a bit later freezes completely. We tried several options of the mount command, but nothing helped (async, intr, wsize, rsize). The NAS itself doesn't allow many options (better to say none). Do you have any recommendation how to integrate a NAS using NFS in a cluster environment?

  • Server spec for a small business [duplicate]

    - by I'll-Be-Back
    This question already has an answer here: Can you help me with my capacity planning? (2 answers)
    I need to buy a decent server to run Windows Server 2012 and a Linux web server (internal use only, intranet). I will install ESXi with 2 or 3 VMs. There will be about 80-100 agents at work; they will log in (domain controller) on client PCs in the morning (between 9:40am and 10:05am). They can only use the IE browser and everything else will be locked. They will not have any storage space, no email, etc. Is this spec decent enough?
        2U Supermicro 825 chassis, X9SCL-F
        1 x Intel E3-1290v2
        16GB DDR3
        2 x Intel 520 Series 240GB
        2 x 2TB Seagate Barracuda
        LSI 4-port SAS RAID controller

  • Why might one hard disk perform slower than another?

    - by Styne666
    I have just bought two WD 3TB Reds (WD30EFRX) for a FreeNAS box and whilst doing burn-in testing it seems like one is consistently taking about 10% longer than the other. So far I've done a dd read test of the whole device and a long SMART test, and it's currently halfway through a badblocks -wvs. The second device is lagging behind the first on all of them. I'm running these commands on Debian stable in two Konsole tabs. Is there a reason this could be considered normal behaviour, or is it worth running the tests independently? They're both plugged in to the LSI 2308 (IT mode) on a Supermicro X10SL7-F.

  • Cant access Dell BMC IPMI Over IP

    - by Bobb
    I have a Dell R210 with an iDRAC BMC (new name for the old BMC), which is an on-board feature with a shared NIC (I believe). The server is in colocation and I didn't set it up before it was sent there... So I asked remote hands to set up IPMI over IP. They enabled it, set the IP and everything. The IP is different from the main box IP. Also, the box is cabled to NIC1 and the BMC is supposed to share it (am I right?). I can see the new IP in the Open Server Administrator (installed on the box). I tried the Supermicro IPMI tool and I tried the Dell ipmish.exe command like this:
        ipmish -ip xxx -u root -p calvin sysinfo
    which gives:
        BMC is not detected
    What could be wrong? Is there a diagnostics tool I can try? It must be something obvious. I just never used things like that before.... P.S. I read something about an encryption key in the Dell docs, but I understand that is for encrypted IPMI 2.0 and ipmish can use IPMI 1.5 without encryption.

  • Is it necessary to burn-in RAM for server-class systems?

    - by ewwhite
    When using server-class systems with ECC RAM, is it necessary or even useful to burn in the memory DIMMs prior to deployment? I've encountered an environment where all server RAM is placed through a lengthy multi-day burn-in/stress-testing process. This has delayed system deployments on occasion and adds an extra step to the hardware lead-time. The server hardware is primarily Supermicro, so the RAM is sourced from a variety of vendors; not directly from the manufacturer like a Dell PowerEdge or HP ProLiant. Is this process useful? In my past experience, I simply used vendor RAM out of the box. Isn't that what the POST memory tests are for? I've encountered and responded to ECC errors long before a DIMM actually failed. The ECC thresholds were usually the trigger for warranty placement. Do you burn your RAM in? If so, what method do you use to perform the tests? Has the burn-in process resulted in any additional platform stability? Has it identified any pre-deployment problems?

  • Dual Xeon Server voltages are low

    - by Mindflux
    I've got a whitebox server running CentOS 5.7. It's a Dual Xeon 5620 with 24GB of RAM. The mainboard is a SuperMicro X8DT6-F and the chassis is a SC825TQ-R720LPB with dual 720W power supplies. We had a big power outage a couple of weeks back that took down everything. I don't have any pre-outage figures for this server, and the only reason I noticed these is that I was checking the servers with more scrutiny than usual when bringing them back up. http://i.imgur.com/rSjiw.png (image of voltage readings) As you can see, CPU1 DIMM is low, +3.3V is high, 3.3VSB is high, +5V is high, +12V is REAL LOW (outside the normal plus/minus 5%)... and VBAT is off the charts. With my whitebox VAR we've tried the following:
        Swap out the PSU with one from another server I have with the same PSUs
        Try a different power cord
        Update BMC/IPMI firmware in case the readings were wrong (they aren't)
        Update BIOS
        Try a different PDU
        Try a different outlet and/or circuit
        Replace the Voltage Regulator Unit
    At this point, the only thing we seemingly haven't done is replace the mainboard, which will be the next step unless something else sheds some light on the situation. I should mention the system is rock solid otherwise, which is a surprise given that the 12V rail is that far off.
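
The plus/minus 5% band mentioned in this post can be made concrete with a little arithmetic. The sketch below is only illustrative: the rail names come from the post, but the sample readings are hypothetical (the real values are only in the linked image), and the 5% figure is the one the poster quotes, not something checked against the board's own sensor thresholds.

```python
def tolerance_band(nominal, tolerance=0.05):
    """Return the (low, high) limits for a rail given a fractional tolerance."""
    delta = abs(nominal) * tolerance
    return (nominal - delta, nominal + delta)


def out_of_band(readings, tolerance=0.05):
    """Return the rails whose measured value falls outside the band.

    `readings` maps a rail name to a (nominal, measured) pair in volts.
    """
    flagged = {}
    for rail, (nominal, measured) in readings.items():
        low, high = tolerance_band(nominal, tolerance)
        if not (low <= measured <= high):
            flagged[rail] = (measured, low, high)
    return flagged


if __name__ == "__main__":
    # Hypothetical readings, NOT taken from the post.
    sample = {"+12V": (12.0, 11.1), "+5V": (5.0, 5.2), "+3.3V": (3.3, 3.5)}
    for rail, (measured, low, high) in out_of_band(sample).items():
        print(f"{rail}: {measured:.2f} V is outside {low:.2f} to {high:.2f} V")
```

For a 12 V rail, a 5% band works out to roughly 11.4 V to 12.6 V, which is the range the "+12v is REAL LOW" remark is being measured against.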

  • can't access SATA card config screen on boot, nor access the disks

    - by Ronald
    We've just upgraded our file server using an ASUS P6T WS Pro board, running FreeBSD-RELEASE 8.2 and using zfs to manage 12 WD20EARS disks. Since our 3ware card has been giving us trouble we started using the six on-board SATA connectors and got a SuperMicro USAS2-L8i to provide eight more ports. Mechanically, the card is an awkward fit but electrically it all seems ok. Upon boot, the LSI controller shows up and states that pressing ctrl-c will bring up the LSI Config Utility. When doing that, the message changes to state that the utility will be started after initialization, however that never happens. There does seem to be an error message that's only displayed too briefly to read and seems to be about PCI and "not enough space". (That message is pushed off by a hardware summary and I've found no way to scroll back at this point.) The disks do not show up in any recognizable ways after booting, either. I found a hint in another discussion to check the address mapping on either the card or the motherboard BIOS, but have found no way to do that. So what I tried on a hunch is to disable everything that's on-board, including network adapters, Firewire controller and SATA. In fact, after doing that, I can successfully launch the LSI Config Utility. As far as I can tell, all looks well in there, and when booting in that configuration it also displays a list of the disks connected to it, which looks just fine as well. Only problem now is that I can't boot that way, because I need the on-board SATA controller and network adapters. As soon as I re-enable any of them I'm back to square one. That discussion I mentioned about mapping addresses said to try D000, then D7FF, then DFFF, in order. The LSI Config Utility shows the card address as D000 but offers no way of changing it. Any tips or insights would be appreciated.

  • does my machine configuration make sense?

    - by user1227914
    I couldn't think of a better place to ask this question, so here it goes. We're putting together a dedicated server for a website that will initially host the web server and the MySQL database. As the website grows, we'll move the database to a different server and this machine will eventually only serve the actual website. So the question is... does my configuration look okay? It's the first time I'm building a server from scratch, so I want to make sure I don't combine components that don't fit or something. Things like... do the drives I picked work for the hot swap, etc. What do you guys think? Am I good to go with this configuration? :)
        Chassis: Supermicro SuperServer 6016T-MTHF (6x DDR3 SDRAM - ECC DIMM 240-pin, 2x LGA1366 Socket, Power Provided: 600 Watt, 4 (free) x hot-swap - 3.5")
        CPU: Intel BX80614E5620 Xeon E5620 Processor - 4 Core, 2.40GHz, LGA 1366, 5.86GT/s QPI, 12MB Cache, 64-Bit, 80W, HyperThreading
        Memory: Crucial CT51272BB1339 4GB PC10600 DDR3 Memory - 1333MHz, ECC, Registered, 1x4096MB (possibly 3 or 4 of them)
        Hard Drives: Western Digital WD2002FAEX Caviar Black Hard Drive - 2TB, 3.5", SATA 6Gbps, 7200 RPM, 64MB (possibly 2 or 3)
    Thank you very much for any professional advice :)

  • Is current SATA 6 gb/s equipment simply unreliable?

    - by korkman
    I have a 45-disk array of Seagate Barracuda 3 TB ST3000DM001 (yes, these are desktop drives, I'm aware of that) in a Supermicro sc847 JBOD, connected via LSI 9285. I have found a solution for the problem described below by reducing speed via
        MegaCli -PhySetLinkSpeed -phy0 2 -a0; for i in $(seq 48); do MegaCli -PhySetLinkSpeed -phy${i} 2 -a0; done
    and rebooting. The question remains: Is this typical for current 6 gb/s equipment? Is this the sad state of SATA storage? Or is some of my equipment (the sff-8088 cables come to mind) bad?
    The problem was: while synchronizing HW RAID-6, disks kept offlining. Fetching SMART values revealed that those which offlined did not increase powered-on hours anymore. That is, their firmware (CC4C) seems to crash. Digging into the matter by switching to software RAID-6, with the disks passed through, I got tons of kernel messages scattered across all disks, with 6 gb/s:
        sd 0:0:9:0: [sdb] Sense Key : No Sense [current]
        Info fld=0x0
        sd 0:0:9:0: [sdb] Add. Sense: No additional sense information
    And finally, when a disk offlines:
        megasas: [ 5]waiting for 160 commands to complete
        ...
        megasas: [35]waiting for 159 commands to complete
        ...
        megasas: [155]waiting for 156 commands to complete
        ...
        megaraid_sas: pending commands remain after waiting, will reset adapter.
    Ugly controller reset here, then minutes later:
        megaraid_sas: Reset successful.
        sd 0:0:28:0: Device offlined - not ready after error recovery
        ...
        sd 0:0:28:0: [sdu] Unhandled error code
        sd 0:0:28:0: [sdu] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
        sd 0:0:28:0: [sdu] CDB: Read(10): 28 00 23 21 2f 40 00 00 70 00
        sd 0:0:28:0: [sdu] killing request
    After reducing the speed to 3 gb/s as described above, all problems vanished.
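
The one-line shell loop in this post touches phy 0 and then phys 1 through 48 on adapter 0. As a rough sketch, the same command list can be generated and printed for review before anything is run. The flags and the value 2 are copied from the post (where 2 is described as selecting the reduced 3 gb/s speed); they have not been checked against MegaCli documentation here, and the function name is hypothetical.

```python
def link_speed_commands(last_phy=48, speed=2, adapter=0):
    """Build the MegaCli invocations from the post as argument lists.

    Covers phy 0 through `last_phy` inclusive, matching `-phy0` plus
    `seq 48` in the original shell loop. Nothing is executed here.
    """
    return [
        ["MegaCli", "-PhySetLinkSpeed", f"-phy{phy}", str(speed), f"-a{adapter}"]
        for phy in range(0, last_phy + 1)
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of running them.
    for cmd in link_speed_commands():
        print(" ".join(cmd))
```

Printing first makes it easy to confirm the phy range matches the enclosure before changing link speeds on a live controller.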

  • Data Store/Volume disconnecting. How to resume copy of VMDK?

    - by Serge
    I'm having an issue with my ESXi 4.1 hosts losing the datastore on the FC SAN after a power outage. All 3 hosts disconnect, so it's definitely a SAN issue. I've tried to resolve the issue on the SAN side with the SAN software support and Adaptec hardware support. No luck there. So I'm stuck with a SAN that will randomly disconnect the volume. I need to get the virtual machines (VMDK files) off the datastore. The problem is I can only get 5-20% before the datastore disconnects. I have backups that are slightly older that I can use to replicate the VMDK differences to. What has not worked so far:
        Powering up the VMs: they boot up for 5-15 minutes, then freeze.
        vCenter migrate or clone of a VM: fails after a similar period of time.
        vCenter copy/paste of VMDK: was able to get one 30GB VMDK and no luck after that.
        VMware Data Recovery: fails at a low %, can't resume, so the next backup starts from the beginning.
        Veeam Backup & Recovery: same as above, no resume function.
    If I can just find a backup solution that will resume from the failed spot, that would solve my issue. Anyone have any ideas that I could try?
    EDIT 1: The SAN is Open-E DSS 6 running on a Supermicro 24-drive enclosure with a 4-port Qlogic FC card and an Adaptec 52445 RAID card.

  • Building vs buying a server for an academic lab [closed]

    - by Roy
    I'm looking for advice on the classic build vs buy question. We need a new Linux server to run Matlab computation on in our (academic) lab. The Matlab Parallel Computing Toolbox licence allows up to 12 local workers, so we are aiming at a 12-core server with 4GB memory per core (48GB total). The system will have an SSD for the OS and a RAID-5 (4x2TB) for data. I looked around and found a (relatively) cheap vendor, Silicon Mechanics, that offers a system to our liking (specs below) for $6732. However, buying the components from Newegg costs only $4464! The difference is $2268, which is 50% of the base cost. If buying from a company can be thought of as a sort of insurance, my premium is basically 50% of the base cost, which to me sounds like a lot. Of course any downtime is bad, but the work is not "mission critical", i.e. if it takes a few days to fix it when it breaks, it's not the end of the world. If it takes weeks to months, then it's a problem. If it breaks 2-3 times in 3 years, not too bad. If it breaks every month, not good. In terms of build experience, I set up a Linux cluster in grad school (from existing computers) and I build my home PCs, but I have never built a server before. The server components I'm thinking about:
        1 x SUPERMICRO SYS-7046T-6F 4U Tower Server Barebone Dual LGA 1366 Intel 5520 DDR3 1333/1066/800 ($1,050)
        12 x Kingston 4GB 240-Pin DDR3 SDRAM DDR3 1333 (PC3 10600) ECC Unbuffered Server Memory ($420)
        2 x Intel Xeon E5645 Westmere-EP 2.4GHz LGA 1366 80W Six-Core ($1,116)
        4 x Seagate Constellation ES 2TB 7200 RPM SATA 6.0Gb/s 3.5" ($1,040)
        1 x SAMSUNG Internal DVD Writer Black SATA ($20)
        1 x Intel 520 Series 2.5" 180GB SATA III MLC SSD ($300)
        1 x LSI LSI00281 PCI-Express 2.0 x8 MD2 Low profile SATA / SAS MegaRAID SAS 9260CV-4i Controller Card ($695)
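
The "premium" arithmetic in this post is easy to reproduce. The sketch below uses the two totals quoted by the poster and, separately, sums the listed line items. Treating each listed price as a line total rather than a per-unit price is an assumption; under that reading the items add up to a different figure than the quoted self-build total, and the sketch does not try to decide which number is right. The short item labels are abbreviations made up for the example.

```python
VENDOR_PRICE = 6732      # quoted vendor price in USD, from the post
SELF_BUILD_PRICE = 4464  # quoted self-build total in USD, from the post

# Listed prices, assumed here to be line totals (not per-unit prices).
LINE_ITEMS = {
    "barebone server": 1050,
    "memory": 420,
    "cpus": 1116,
    "hard drives": 1040,
    "dvd writer": 20,
    "ssd": 300,
    "raid controller": 695,
}


def premium(vendor, build):
    """Return (absolute difference, difference as a fraction of the build cost)."""
    diff = vendor - build
    return diff, diff / build


if __name__ == "__main__":
    diff, ratio = premium(VENDOR_PRICE, SELF_BUILD_PRICE)
    print(f"quoted totals: difference ${diff}, premium {ratio:.1%}")

    listed_total = sum(LINE_ITEMS.values())
    diff2, ratio2 = premium(VENDOR_PRICE, listed_total)
    print(f"listed items sum to ${listed_total}: difference ${diff2}, premium {ratio2:.1%}")
```

Either way the vendor markup is a large fraction of the parts cost, which is the point the question turns on.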

  • HPET missing from available clocksources on CentOS

    - by squareone
    I am having trouble using HPET on my physical machine. It is not available, even though I have enabled it in my BIOS, forced it in grub, and triple-checked that my kernel includes HPET in its compilation.
        Motherboard: Supermicro X9DRW
        Processor: 2x Intel(R) Xeon(R) CPU E5-2640
        SAS Controller: LSI Logic / Symbios Logic SAS2004 PCI-Express Fusion-MPT SAS-2 [Spitfire] (rev 03)
        Distro: CentOS 6.3
        Kernel: 3.4.21-rt32 #2 SMP PREEMPT RT x86_64 GNU/Linux
        Grub: hpet=force clocksource=hpet
    .config file:
        CONFIG_HPET_TIMER=y
        CONFIG_HPET_EMULATE_RTC=y
        CONFIG_HPET=y
    dmesg | grep hpet:
        Command line: ro root=/dev/mapper/vg_xxxx-lv_root rd_NO_LUKS rd_LVM_LV=vg_xxxx/lv_root KEYBOARDTYPE=pc KEYTABLE=us rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_LVM_LV=vg_xxxx/lv_swap rd_NO_DM LANG=en_US.UTF-8 rhgb quiet panic=5 hpet=force clocksource=hpet
        Kernel command line: ro root=/dev/mapper/vg_xxxx-lv_root rd_NO_LUKS rd_LVM_LV=vg_xxxx/lv_root KEYBOARDTYPE=pc KEYTABLE=us rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_LVM_LV=vg_xxxx/lv_swap rd_NO_DM LANG=en_US.UTF-8 rhgb quiet panic=5 hpet=force clocksource=hpet
    cat /sys/devices/system/clocksource/clocksource0/current_clocksource:
        tsc
    cat /sys/devices/system/clocksource/clocksource0/available_clocksource:
        tsc jiffies
    What is even more confusing is that I have about a dozen other machines that use the same kernel .config and can use HPET fine. I fear it is a hardware issue, but would appreciate any advice or help with getting HPET available. Thanks in advance!
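
Since the poster has a dozen other machines to compare against, the two sysfs checks in this post lend themselves to a small script. This is a minimal sketch: the sysfs paths are the ones quoted in the post, while the function names and the report layout are made up for the example.

```python
from pathlib import Path

# Directory quoted in the post; present on Linux systems that expose clocksources via sysfs.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def parse_clocksources(text):
    """Split the whitespace-separated contents of available_clocksource."""
    return text.split()


def clocksource_report(available_text, current_text, wanted="hpet"):
    """Summarise which clocksource is active and whether `wanted` is offered."""
    available = parse_clocksources(available_text)
    return {
        "current": current_text.strip(),
        "available": available,
        "wanted_available": wanted in available,
    }


if __name__ == "__main__":
    available = (CLOCKSOURCE_DIR / "available_clocksource").read_text()
    current = (CLOCKSOURCE_DIR / "current_clocksource").read_text()
    print(clocksource_report(available, current))
```

Run on a working machine and on the problem machine, the two reports show at a glance whether hpet is offered by the kernel at all or merely not selected.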

  • Windows Displays Double the Actual Installed Physical Memory

    - by Andrew Barber
    I have a server I've installed Windows Web 2008 R2 on, which is reporting that I have double the physical memory installed as is actually the case. In msinfo32 "Installed Physical Memory" shows as 2x whatever the actual installed amount is, though "Total Physical Memory" shows the correct amount. The "System" info window shows installed memory as 2x (with the correct amount in parentheses listed as the "usable" amount). This server mistakenly had Windows Web 2008 (32-bit) installed on it just previously, and that OS also reported the same faulty information as Win2K8R2 is reporting. BIOS reports the correct amount, memtest was run on this server before installation, and a previous Windows 2000 instance installed on this system also reported the correct amount, as I recall. Server operation seems to be fine as well (it's only trying to use the correct amount of memory). The server is a generic pizzabox running on a SuperMicro X6DVL-EG with dual Xeon-3.2's. Memory installed is 4 matching mt18vddf12872g-335c3 sticks (1GB pc2700 DDR ECC REG cl2.5). This behavior occurs whether two or all four are installed. So, has anyone seen something like this before? Have any idea about what's causing it, and how concerned I should be about it? Everything else seems good so far, and I'll be upgrading the memory before putting the server into service, but I don't want to spend too much time/money/effort on the server if it's got something odd going wrong here. UPDATE: There was a question I ran into regarding memory sparing in the BIOS and a possible (buggy) effect thereof; however, flipping that bit back and forth in the BIOS revealed that isn't the issue. Still flummoxed a bit about this one, though I still have seen no negative impacts. Post-Answer Update (January 13, 2011): Upgrading the system with new, larger memory has fixed this issue.

  • Bad motherboard / controller / HDs?

    - by quidpro
    On a leased server, I am running into some timing issues with an application that requires precise timing. The server is a Dual Xeon E5410 running on a Supermicro X7DVL-3 motherboard under CentOS 5.5 x64. The application I am running is timer sensitive and keeps sensing drift whether under load or at idle, but especially under load. I did some investigating with atop and dd and found some mind-blowing numbers. Mind you, I am no Linux guru, but something sure seems out of whack. I ran:
        dd bs=4096 if=/dev/zero of=/bigtestfile
    to generate disk activity. Regardless of whether I wrote it to sda or sdb, my DSK value in atop would go over 100%, at one time peaking at 1700%.
        DSK | sdb | busy 675% | read 0 | write 110 | avio 78 ms |
    Here are the smartctl outputs:
        # smartctl -A /dev/sda
        smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
        Home page is http://smartmontools.sourceforge.net/
        === START OF READ SMART DATA SECTION ===
        SMART Attributes Data Structure revision number: 16
        Vendor Specific SMART Attributes with Thresholds:
        ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
          1 Raw_Read_Error_Rate     0x000b 200   200   051    Pre-fail Always  -           0
          3 Spin_Up_Time            0x0007 165   165   021    Pre-fail Always  -           2750
          4 Start_Stop_Count        0x0032 100   100   040    Old_age  Always  -           21
          5 Reallocated_Sector_Ct   0x0033 200   200   140    Pre-fail Always  -           0
          7 Seek_Error_Rate         0x000a 200   200   051    Old_age  Always  -           0
          9 Power_On_Hours          0x0032 065   065   000    Old_age  Always  -           25831
         10 Spin_Retry_Count        0x0012 100   253   051    Old_age  Always  -           0
         11 Calibration_Retry_Count 0x0012 100   253   051    Old_age  Always  -           0
         12 Power_Cycle_Count       0x0032 100   100   000    Old_age  Always  -           21
        194 Temperature_Celsius     0x0022 116   093   000    Old_age  Always  -           27
        196 Reallocated_Event_Count 0x0032 200   200   000    Old_age  Always  -           0
        197 Current_Pending_Sector  0x0012 200   200   000    Old_age  Always  -           0
        198 Offline_Uncorrectable   0x0012 200   200   000    Old_age  Always  -           0
        199 UDMA_CRC_Error_Count    0x000a 200   253   000    Old_age  Always  -           0
        200 Multi_Zone_Error_Rate   0x0008 200   200   051    Old_age  Offline -           0

        # smartctl -A /dev/sdb
        smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
        Home page is http://smartmontools.sourceforge.net/
        === START OF READ SMART DATA SECTION ===
        SMART Attributes Data Structure revision number: 16
        Vendor Specific SMART Attributes with Thresholds:
        ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
          1 Raw_Read_Error_Rate     0x000f 200   200   051    Pre-fail Always  -           0
          3 Spin_Up_Time            0x0003 180   180   021    Pre-fail Always  -           3958
          4 Start_Stop_Count        0x0032 100   100   000    Old_age  Always  -           22
          5 Reallocated_Sector_Ct   0x0033 200   200   140    Pre-fail Always  -           0
          7 Seek_Error_Rate         0x000f 200   200   051    Pre-fail Always  -           0
          9 Power_On_Hours          0x0032 068   068   000    Old_age  Always  -           24087
         10 Spin_Retry_Count        0x0013 100   253   051    Pre-fail Always  -           0
         11 Calibration_Retry_Count 0x0013 100   253   051    Pre-fail Always  -           0
         12 Power_Cycle_Count       0x0032 100   100   000    Old_age  Always  -           21
        194 Temperature_Celsius     0x0022 122   096   000    Old_age  Always  -           25
        196 Reallocated_Event_Count 0x0032 200   200   000    Old_age  Always  -           0
        197 Current_Pending_Sector  0x0012 200   200   000    Old_age  Always  -           0
        198 Offline_Uncorrectable   0x0010 200   200   000    Old_age  Offline -           0
        199 UDMA_CRC_Error_Count    0x003e 200   200   000    Old_age  Always  -           0
        200 Multi_Zone_Error_Rate   0x0009 200   200   051    Pre-fail Offline -           0
    Any idea what's wrong here? Bad motherboard? It would seem rare that both drives are going bad (smartctl says they PASS), so it leaves the mobo as the culprit in my eyes.
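
Attribute tables like the two in this post can be screened mechanically. The sketch below parses `smartctl -A` style rows and reports two kinds of findings: a normalized value at or below its threshold, and a non-zero raw count on a handful of attributes. Which attributes to watch is an assumption made for this example, not something the post or smartctl prescribes, and the sample rows are a subset copied from the sda output.

```python
# Attributes whose non-zero raw count is flagged; this selection is an assumption.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}


def parse_attributes(text):
    """Parse attribute rows (ID, name, flag, value, worst, thresh, ..., raw)."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least ten columns.
        if len(fields) >= 10 and fields[0].isdigit():
            rows[fields[1]] = {
                "value": int(fields[3]),
                "worst": int(fields[4]),
                "thresh": int(fields[5]),
                "raw": fields[9],
            }
    return rows


def suspicious(rows):
    """Return human-readable findings; an empty list means nothing was flagged."""
    findings = []
    for name, row in rows.items():
        if row["thresh"] > 0 and row["value"] <= row["thresh"]:
            findings.append(f"{name}: value {row['value']} at or below threshold {row['thresh']}")
        if name in WATCHED and row["raw"].isdigit() and int(row["raw"]) > 0:
            findings.append(f"{name}: raw count {row['raw']}")
    return findings


SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
  9 Power_On_Hours 0x0032 065 065 000 Old_age Always - 25831
197 Current_Pending_Sector 0x0012 200 200 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x000a 200 253 000 Old_age Always - 0
"""

if __name__ == "__main__":
    print(suspicious(parse_attributes(SAMPLE)) or "nothing flagged")
```

A clean result from a screen like this only says the drives are not reporting trouble through these counters; it does not by itself point to the motherboard or controller.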

  • openstack, bridging, netfilter and dnat

    - by Craig Sanders
    In a recent upgrade (from Openstack Diablo on Ubuntu Lucid to Openstack Essex on Ubuntu Precise), we found that DNS packets were frequently (almost always) dropped on the bridge interface (br100). For our compute-node hosts, that's a Mellanox MT26428 using the mlx4_en driver module. We've found two workarounds for this:
        1. Use an old lucid kernel (e.g. 2.6.32-41-generic). This causes other problems, in particular the lack of cgroups and the old version of the kvm and kvm_amd modules (we suspect the kvm module version is the source of a bug we're seeing where occasionally a VM will use 100% CPU). We've been running with this for the last few months, but can't stay here forever.
        2. With the newer Ubuntu Precise kernels (3.2.x), we've found that if we use sysctl to disable netfilter on the bridge (see sysctl settings below), DNS starts working perfectly again. We thought this was the solution to our problem until we realised that turning off netfilter on the bridge interface will, of course, mean that the DNAT rule to redirect VM requests for the nova-api-metadata server (i.e. redirect packets destined for 169.254.169.254:80 to compute-node's-IP:8775) will be completely bypassed.
    Long story short: with 3.x kernels, we can have reliable networking and a broken metadata service, or we can have broken networking and a metadata service that would work fine if there were any VMs to service. We haven't yet found a way to have both. Anyone seen this problem or anything like it before? Got a fix? Or a pointer in the right direction? Our suspicion is that it's specific to the Mellanox driver, but we're not sure of that (we've tried several different versions of the mlx4_en driver, starting with the version built in to the 3.2.x kernels all the way up to the latest 1.5.8.3 driver from the Mellanox web site. The mlx4_en driver in the 3.5.x kernel from Quantal doesn't work at all). BTW, our compute nodes have Supermicro H8DGT motherboards with a built-in Mellanox NIC:
        02:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
    We're not using the other two NICs in the system; only the Mellanox and the IPMI card are connected. Bridge netfilter sysctl settings:
        net.bridge.bridge-nf-call-arptables = 0
        net.bridge.bridge-nf-call-iptables = 0
        net.bridge.bridge-nf-call-ip6tables = 0
    Since discovering this bridge-nf sysctl workaround, we've found a few pages on the net recommending exactly this (including Openstack's latest network troubleshooting page and a launchpad bug report that linked to this blog-post that has a great description of the problem and the solution)... it's easier to find stuff when you know what to search for :), but we haven't found anything on the DNAT issue that it causes.

  • SAS Expanders vs Direct Attached (SAS)?

    - by jemmille
    I have a storage unit with 2 backplanes. One backplane holds 24 disks, one backplane holds 12 disks. Each backplane is independently connected through a SFF-8087 port (4 channel/12Gbit) to the RAID card. Here is where my question really comes in. Can a backplane be overloaded, and how easily? All the disks in the machine are WD RE4 WD1003FBYX (black) drives that have average writes of 115MB/sec and average reads of 125MB/sec. I know things would vary based on the RAID level or filesystem we put on top of that, but it seems to me that a 24-disk backplane with only one SFF-8087 connector should be able to overload the bus to a point that might actually slow it down? Based on my math, if I had a RAID0 across all 24 disks and asked for a large file, I should in theory get 24*115 MB/sec, which translates to 22.08 GBit/sec of total throughput. Either I'm confused or this backplane is horribly designed, at least for a performance environment. I'm looking at switching to a model where each drive has its own channel from the backplane (and new HBAs or RAID card).
    EDIT: more details. We have used pure Linux (CentOS), OpenSolaris, software RAID, hardware RAID, EXT3/4 and ZFS. Here are some examples using bonnie++:
        4 Disk RAID-0, ZFS
        WRITE     CPU   RE-WRITE  CPU   READ      CPU   RND-SEEKS
        194MB/s   19%   92MB/s    11%   200MB/s   8%    310/sec
        194MB/s   19%   93MB/s    11%   201MB/s   8%    312/sec
        --------- ----  --------- ----  --------- ----  ---------
        389MB/s   19%   186MB/s   11%   402MB/s   8%    311/sec

        8 Disk RAID-0, ZFS
        WRITE     CPU   RE-WRITE  CPU   READ      CPU   RND-SEEKS
        324MB/s   32%   164MB/s   19%   346MB/s   13%   466/sec
        324MB/s   32%   164MB/s   19%   348MB/s   14%   465/sec
        --------- ----  --------- ----  --------- ----  ---------
        648MB/s   32%   328MB/s   19%   694MB/s   13%   465/sec

        12 Disk RAID-0, ZFS
        WRITE     CPU   RE-WRITE  CPU   READ      CPU   RND-SEEKS
        377MB/s   38%   191MB/s   22%   429MB/s   17%   537/sec
        376MB/s   38%   191MB/s   22%   427MB/s   17%   546/sec
        --------- ----  --------- ----  --------- ----  ---------
        753MB/s   38%   382MB/s   22%   857MB/s   17%   541/sec

        Now 16 Disk RAID-0, it gets interesting
        WRITE     CPU   RE-WRITE  CPU   READ      CPU   RND-SEEKS
        359MB/s   34%   186MB/s   22%   407MB/s   18%   1397/sec
        358MB/s   33%   186MB/s   22%   407MB/s   18%   1340/sec
        --------- ----  --------- ----  --------- ----  ---------
        717MB/s   33%   373MB/s   22%   814MB/s   18%   1368/sec

        20 Disk RAID-0, ZFS
        WRITE     CPU   RE-WRITE  CPU   READ      CPU   RND-SEEKS
        371MB/s   37%   188MB/s   22%   450MB/s   19%   775/sec
        370MB/s   37%   188MB/s   22%   447MB/s   19%   797/sec
        --------- ----  --------- ----  --------- ----  ---------
        741MB/s   37%   376MB/s   22%   898MB/s   19%   786/sec

        24 Disk RAID-1, ZFS
        WRITE     CPU   RE-WRITE  CPU   READ      CPU   RND-SEEKS
        347MB/s   34%   193MB/s   22%   447MB/s   19%   907/sec
        347MB/s   34%   192MB/s   23%   446MB/s   19%   933/sec
        --------- ----  --------- ----  --------- ----  ---------
        694MB/s   34%   386MB/s   22%   894MB/s   19%   920/sec

        28 Disk RAID-0, ZFS
        32 Disk RAID-0, ZFS
        36 Disk RAID-0, ZFS
    More details: Here is the exact unit: http://www.supermicro.com/products/chassis/4U/847/SC847E1-R1400U.cfm
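
The oversubscription estimate in this post can be checked with a few lines of arithmetic. The sketch below reproduces the poster's 24 x 115 MB/s figure and compares it with a 4-lane link at 3 Gbit/s per lane (the "4 channel/12Gbit" in the post). The 8b/10b encoding factor of 0.8 is an assumption added here to turn the raw line rate into a payload rate; protocol overhead beyond that is ignored, and the function names are made up for the example.

```python
def aggregate_disk_mb_s(disks, per_disk_mb_s):
    """Best-case combined streaming rate of all disks, in MB/s."""
    return disks * per_disk_mb_s


def mb_s_to_gbit_s(mb_s):
    """Convert MB/s to Gbit/s using decimal units (1 GB = 1000 MB)."""
    return mb_s * 8 / 1000


def link_payload_mb_s(lanes, lane_gbit_s, encoding_efficiency=0.8):
    """Approximate payload rate of a multi-lane link in MB/s.

    encoding_efficiency=0.8 assumes 8b/10b line coding.
    """
    return lanes * lane_gbit_s * 1000 * encoding_efficiency / 8


if __name__ == "__main__":
    disks_mb_s = aggregate_disk_mb_s(24, 115)
    link_mb_s = link_payload_mb_s(lanes=4, lane_gbit_s=3.0)
    print(f"disks: {disks_mb_s} MB/s = {mb_s_to_gbit_s(disks_mb_s):.2f} Gbit/s")
    print(f"link payload: about {link_mb_s:.0f} MB/s")
    print(f"oversubscription: {disks_mb_s / link_mb_s:.1f}x")
```

Under these assumptions the 24 disks could in principle stream a bit more than twice what the single connector carries. The bonnie++ read totals in the post level off in the 800 to 900 MB/s range from 12 disks onward, which sits below such a ceiling, though this arithmetic alone cannot show that the link is what limits them.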

  • How to make Linux reliably boot on multi-cpu machines?

    - by Adam Tabi
    I've got two machines, one with 4x12 AMD Opteron cores (AMD Opteron(tm) Processor 6176), one with 2x8 Xeon cores (HT disabled; Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz). On both machines I experience difficulties during boot of Linux using recent kernels. The system hangs during the initialization of the kernel, before or just when the initramfs starts initializing the hardware. The last thing displayed was a stack trace like this:
        CPU: 31 PID: 0 Comm: swapper/31 Tainted: G D 3.11.6-hardened #11
        Hardware name: Supermicro X9DRT-HF+/X9DRT-HF+, BIOS 3.00 07/08/2013
        task: ffff880854695500 ti: ffff880854695a28 task.ti: ffff880854695a28
        RIP: 0010:[<ffffffff8100a82e>] [<ffffffff8100a82e>] default_idle+0x6/0xe
        RSP: 0000:ffff8808546b3ec8 EFLAGS: 00000286
        RAX: ffffffff8100a828 RBX: ffff880854695a28 RCX: 00000000ffffffff
        RDX: 0100000000000000 RSI: 0000000000000000 RDI: ffff88107fdec690
        RBP: ffff8808546b3ec8 R08: 0000000000000000 R09: ffff880854695500
        R10: ffff880854695500 R11: 0000000000000001 R12: ffff880854695a28
        R13: ffff880854695a28 R14: ffff880854695a28 R15: 0000000000000000
        FS: 0000000000000000(0000) GS:ffff88107fde0000(0000) knlGS:0000000000000000
        CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 000002b43256a960 CR3: 00000000016b5000 CR4: 00000000000607f0
        Stack:
         ffff8808546b3ed8 ffffffff8100aec9 ffff8808546b3f10 ffffffff8109ce25
         334ab55852ec7aef 000000000000001f ffffffff8102d6c0 0000000000000000
         0000000000000000 ffff8808546b3f48 ffffffff810276e0 ffff8808546b3f28
        Call Trace:
         [<ffffffff8100aec9>] arch_cpu_idle+0x20/0x2b
         [<ffffffff8109ce25>] cpu_startup_entry+0xed/0x138
         [<ffffffff8102d6c0>] ? flat_init_apic_ldr+0x80/0x80
         [<ffffffff810276e0>] start_secondary+0x2c9/0x2f8
    I compiled the kernel myself and it works fine if I boot with nolapic. Yet, only one core is used. Also, the kernel of RHEL6 seems to work fine. I suspect that there are some patches used to make things work. Using the kernel config file from RHEL6 and building a more recent kernel yields the same problems. On the Xeon machine, things got better by disabling Hyperthreading completely. The machine now boots successfully at least 4 out of 5 times. And if it boots, multicore stuff works just fine. However, I'm wondering what to do about the AMD machine. So to sum it up:
        Gentoo kernel 3.6 - 3.11 won't reliably boot those machines unless you reduce the number of cores (e.g. via nolapic).
        RHEL6 kernel (which is 2.6.32) boots just fine.
        RH kernel config used to build a 3.x kernel won't yield a working kernel.
        Not distribution specific (apart from the kernel being used).
        These stack traces got printed every minute or so. The kernel seems to be stuck in an endless loop.
    Yet, a recent kernel is needed for various reasons. So the question is: what does the RHEL6 kernel do that vanilla or Gentoo kernels don't? Is there a boot option that might lead to a reliable boot with all the cores enabled? Best, Adam

    Read the article

  • Are FC and SAS DAS devices standard enough?

    - by user222182
    Before I ask my questions, here is some background info that may or may not be useful: For the first time I find myself needing a DAS solution. My priority is data throughput in a single direction. I can write large blocks, and I don't need to read at the same time. The server (the data-producing device) is not really a typical server; it's a very powerful single board computer. As such I have limited options when it comes to the add-in cards I can install, since it must use the fairly uncommon XMC interface. Currently I believe I am limited to PCIe x8 Gen 1, which means that the likely bottleneck for me will be this 16 Gbps connection. The XMC boards I have found so far offer the following connections:

    a) Dual 10GbE Ethernet controller, total throughput 20 Gbps
    b) Dual quad SAS 2.0 connectors (SFF-8XXX) HBA (no RAID), total throughput 48 Gbps
    c) Dual 8 Gb FC HBA (no RAID), total throughput 16 Gbps

    My questions for you guys are:

    1) Are SAS and/or FC, and by extension their HBAs, standard enough that I could purchase a Dell or Aberdeen storage server with a RAID controller that has external SAS or FC ports and expect that I can connect it to my SAS or FC HBA and be presented with a single volume (if I so configured the storage server), all without having to check for HBA compatibility?
    2) On a device like a Dell PowerVault (either DAS or NAS), is there an OS on it to concern myself with, or is it meant to be remotely managed? Is there a local interface in case I can't remotely manage it (i.e. if my single board computer uses an OS not supported by Dell OpenManage)? Would this be true of nearly any device which calls itself a DAS?
    3) If I purchased some sort of Supermicro storage chassis and installed a RAID controller with external connections, is there a nice lightweight OS I can run just for management of the controller? Would I even need an OS, since the RAID card would be configured pre-boot anyway?
    4) It is much easier to buy XMC-based 10 Gigabit Ethernet cards (generally dual port). In what ways would I be getting into trouble by using iSCSI as a DAS and direct cabling with SFP+ cables?

    Thanks in advance
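
    The throughput figures in the question can be checked with a little arithmetic. The sketch below is only an illustration: the function and variable names are made up for this example, the link totals are the nominal figures quoted in the post, and protocol overhead on the storage links is ignored (an assumption). PCIe Gen 1 signals at 2.5 GT/s per lane with 8b/10b encoding, which is where the 16 Gbps figure for an x8 slot comes from.

```python
# Nominal link totals as quoted in the question (Gbps). Protocol overhead
# on these links is ignored here, which is a simplifying assumption.
LINKS_GBPS = {
    "dual 10GbE": 2 * 10.0,
    "dual quad SAS 2.0": 2 * 4 * 6.0,  # 2 connectors x 4 lanes x 6 Gbps
    "dual 8Gb FC": 2 * 8.0,
}

PCIE_GEN1_GT_PER_LANE = 2.5   # raw signalling rate per lane, GT/s
PCIE_GEN1_ENCODING = 8 / 10   # 8b/10b line encoding leaves 80% for data


def pcie_gen1_gbps(lanes):
    """Usable PCIe Gen 1 bandwidth in Gbps for the given lane count."""
    return lanes * PCIE_GEN1_GT_PER_LANE * PCIE_GEN1_ENCODING


def effective_gbps(link_gbps, host_gbps):
    """The slower of the storage link and the host slot sets the ceiling."""
    return min(link_gbps, host_gbps)


host = pcie_gen1_gbps(8)
for name, link in LINKS_GBPS.items():
    print(f"{name}: link {link:.0f} Gbps, ceiling {effective_gbps(link, host):.0f} Gbps")
```

    Under these assumptions all three options end up capped at the same 16 Gbps by the x8 Gen 1 slot, so the choice between them would rest on compatibility and manageability rather than on nominal link speed.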

    Read the article

  • Xen kernel can't see 2 disks of 6 of 1TB, does it have a limitation?

    - by PartySoft
    Linux gentoo-xen 2.6.18-xen-r12 #3 SMP Tue Oct 5 09:28:53 PDT 2010 x86_64 Intel(R) Xeon(R) CPU E5506 @ 2.13GHz GenuineIntel GNU/Linux

    I have 6 disks of 1 TB each, but I can see only 4 of them. Can anyone give me an idea of what I can do?

    Filesystem  Size  Used  Avail  Use%  Mounted on
    rootfs      886G  4.4G  836G   1%    /
    /dev/sda3   886G  4.4G  836G   1%    /
    rc-svcdir   1.0M  44K   980K   5%    /lib64/rc/init.d
    shm         7.9G  0     7.9G   0%    /dev/shm
    /dev/sdb1   917G  200M  871G   1%    /home2
    /dev/sdc1   917G  200M  871G   1%    /home3
    /dev/sdd1   917G  200M  871G   1%    /home4

    The hardware is dual Xeon E5506 processors on a Supermicro X8DTL motherboard.

    [ 4.346585] ata3.00: ATA-8, max UDMA/133, 1953525168 sectors: LBA48 NCQ (depth 0/32)
    [ 4.346588] ata3.00: ata3: dev 0 multi count 16
    [ 4.352861] ata3.00: configured for UDMA/133
    [ 4.352867] scsi3 : ata_piix
    [ 4.352875] PM: Adding info for No Bus:host3
    [ 4.510584] ata4.00: ATA-8, max UDMA/133, 1953525168 sectors: LBA48 NCQ (depth 0/32)
    [ 4.510587] ata4.00: ata4: dev 0 multi count 16
    [ 4.516848] ata4.00: configured for UDMA/133
    [ 4.516861] PM: Adding info for No Bus:target2:0:0
    [ 4.516905] Vendor: ATA Model: SAMSUNG HD103SJ Rev: 1AJ1
    [ 4.516910] Type: Direct-Access ANSI SCSI revision: 05
    [ 4.516920] PM: Adding info for scsi:2:0:0:0
    [ 4.517452] SCSI device sde: 1953525168 512-byte hdwr sectors (1000205 MB)
    [ 4.517460] sde: Write Protect is off
    [ 4.517461] sde: Mode Sense: 00 3a 00 00
    [ 4.517478] SCSI device sde: drive cache: write back
    [ 4.517514] SCSI device sde: 1953525168 512-byte hdwr sectors (1000205 MB)
    [ 4.517521] sde: Write Protect is off
    [ 4.517522] sde: Mode Sense: 00 3a 00 00
    [ 4.517532] SCSI device sde: drive cache: write back
    [ 4.517534] sde: sde1
    [ 4.524551] sd 2:0:0:0: Attached scsi disk sde
    [ 4.524855] sd 2:0:0:0: Attached scsi generic sg4 type 0
    [ 4.524874] PM: Adding info for No Bus:target3:0:0
    [ 4.524928] Vendor: ATA Model: SAMSUNG HD103SJ Rev: 1AJ1
    [ 4.524933] Type: Direct-Access ANSI SCSI revision: 05
    [ 4.524946] PM: Adding info for scsi:3:0:0:0
    [ 4.525216] SCSI device sdf: 1953525168 512-byte hdwr sectors (1000205 MB)
    [ 4.525227] sdf: Write Protect is off
    [ 4.525228] sdf: Mode Sense: 00 3a 00 00
    [ 4.525242] SCSI device sdf: drive cache: write back
    [ 4.525280] SCSI device sdf: 1953525168 512-byte hdwr sectors (1000205 MB)
    [ 4.525286] sdf: Write Protect is off
    [ 4.525289] sdf: Mode Sense: 00 3a 00 00
    [ 4.525301] SCSI device sdf: drive cache: write back
    [ 4.525302] sdf: sdf1
    [ 4.532691] sd 3:0:0:0: Attached scsi disk sdf
    [ 4.533010] sd 3:0:0:0: Attached scsi generic sg5 type 0
    [ 4.977669] scsi: <fdomain> Detection failed (no card)
    [ 5.030479] GDT-HA: Storage RAID Controller Driver. Version: 3.05
    [ 5.030635] GDT-HA: Found 0 PCI Storage RAID Controllers
    [ 5.372350] Fusion MPT base driver 3.04.01
    [ 5.372358] Copyright (c) 1999-2005 LSI Logic Corporation
    [ 5.579176] Fusion MPT SPI Host driver 3.04.01
    [ 5.881777] ieee1394: Initialized config rom entry `ip1394'
    [ 6.166745] ieee1394: sbp2: Driver forced to serialize I/O (serialize_io=1)
    [ 6.166748] ieee1394: sbp2: Try serialize_io=0 for better performance
    [ 6.428866] md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
    [ 6.428872] md: bitmap version 4.39
    [ 6.431518] md: raid0 personality registered for level 0
    [ 6.495979] md: raid1 personality registered for level 1
    [ 6.570270] raid5: automatically using best checksumming function: generic_sse
    [ 6.575523] generic_sse: 6608.000 MB/sec
    [ 6.575526] raid5: using function: generic_sse (6608.000 MB/sec)
    [ 6.596226] raid6: int64x1 1835 MB/s
    [ 6.613231] raid6: int64x2 1773 MB/s
    [ 6.630256] raid6: int64x4 1675 MB/s
    [ 6.647296] raid6: int64x8 1027 MB/s
    [ 6.664267] raid6: sse2x1 3578 MB/s
    [ 6.681268] raid6: sse2x2 4207 MB/s
    [ 6.698280] raid6: sse2x4 4625 MB/s
    [ 6.698281] raid6: using algorithm sse2x4 (4625 MB/s)
    [ 6.698285] md: raid6 personality registered for level 6
    [ 6.698286] md: raid5 personality registered for level 5
    [ 6.698288] md: raid4 personality registered for level 4
    [ 6.781090] md: raid10 personality registered for level 10
    [ 7.007043] Intel(R) PRO/1000 Network Driver - version 7.1.9-k4
    [ 7.007046] Copyright (c) 1999-2006 Intel Corporation.
    [ 9.229465] kjournald starting. Commit interval 5 seconds
    [ 9.229476] EXT3-fs: mounted filesystem with ordered data mode.
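
    One way to read the two outputs above side by side, offered as a sketch rather than a diagnosis: df lists only mounted filesystems, while the kernel log records every disk the kernel attached. The Python below compares the two, using shortened samples copied from the post. The function names are hypothetical, and the regular expressions assume the log and df formats look exactly like the excerpt.

```python
import re


def attached_disks(dmesg_text):
    """Disk names the kernel reports as attached (e.g. 'sde')."""
    return set(re.findall(r"Attached scsi disk (sd[a-z]+)", dmesg_text))


def mounted_disks(df_text):
    """Disk names that back at least one mounted /dev/sdXN filesystem."""
    return set(re.findall(r"^/dev/(sd[a-z]+)\d*", df_text, flags=re.MULTILINE))


def attached_but_unmounted(dmesg_text, df_text):
    """Disks present in the kernel log but absent from the df listing."""
    return sorted(attached_disks(dmesg_text) - mounted_disks(df_text))


# Shortened samples taken from the post above.
DMESG_SAMPLE = """\
[ 4.524551] sd 2:0:0:0: Attached scsi disk sde
[ 4.532691] sd 3:0:0:0: Attached scsi disk sdf
"""

DF_SAMPLE = """\
/dev/sda3 886G 4.4G 836G 1% /
/dev/sdb1 917G 200M 871G 1% /home2
/dev/sdc1 917G 200M 871G 1% /home3
/dev/sdd1 917G 200M 871G 1% /home4
"""

print(attached_but_unmounted(DMESG_SAMPLE, DF_SAMPLE))
```

    On the excerpt shown, this reports sde and sdf: the kernel log shows both attached with one partition each (sde1, sdf1), but neither appears in the df listing. Whether they simply have not been mounted is an assumption the post does not confirm.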

    Read the article

  • How can a single disk in a hardware SATA RAID-10 array bring the entire array to a screeching halt?

    - by Stu Thompson
    Prelude: I'm a code-monkey who has increasingly taken on SysAdmin duties for my small company. My code is our product, and increasingly we provide the same app as SaaS. About 18 months ago I moved our servers from a premium, hosting-centric vendor to a barebones rack pusher in a tier IV data center. (Literally across the street.) This meant doing much more ourselves: things like networking, storage and monitoring. As part of the big move, to replace our leased direct attached storage from the hosting company, I built a 9TB two-node NAS based on SuperMicro chassis, 3ware RAID cards, Ubuntu 10.04, two dozen SATA disks, DRBD and . It's all lovingly documented in three blog posts: Building up & testing a new 9TB SATA RAID10 NFSv4 NAS: Part I, Part II and Part III. We also set up a Cacti monitoring system. Recently we've been adding more and more data points, like SMART values. I could not have done all this without the awesome boffins at ServerFault. It's been a fun and educational experience. My boss is happy (we saved bucket loads of $$$), our customers are happy (storage costs are down), I'm happy (fun, fun, fun). Until yesterday.

    Outage & Recovery: Some time after lunch we started getting reports of sluggish performance from our application, an on-demand streaming media CMS. About the same time our Cacti monitoring system sent a blizzard of emails. One of the more telling alerts was a graph of iostat await. Performance became so degraded that Pingdom began sending "server down" notifications. The overall load was moderate; there was no traffic spike. After logging onto the application servers, which are NFS clients of the NAS, I confirmed that just about everything was experiencing highly intermittent and insanely long IO wait times. And once I hopped onto the primary NAS node itself, the same delays were evident when trying to navigate the problem array's file system. Time to fail over, and that went well. Within 20 minutes everything was confirmed to be back up and running perfectly.

    Post-Mortem: After any and all system failures I perform a post-mortem to determine the cause of the failure. The first thing I did was ssh back into the box and start reviewing logs. It was offline, completely. Time for a trip to the data center. Hardware reset, back up and running. In /var/syslog I found this scary looking entry:

    Nov 15 06:49:44 umbilo smartd[2827]: Device: /dev/twa0 [3ware_disk_00], 6 Currently unreadable (pending) sectors
    Nov 15 06:49:44 umbilo smartd[2827]: Device: /dev/twa0 [3ware_disk_07], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 171 to 170
    Nov 15 06:49:45 umbilo smartd[2827]: Device: /dev/twa0 [3ware_disk_10], 16 Currently unreadable (pending) sectors
    Nov 15 06:49:45 umbilo smartd[2827]: Device: /dev/twa0 [3ware_disk_10], 4 Offline uncorrectable sectors
    Nov 15 06:49:45 umbilo smartd[2827]: Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
    Nov 15 06:49:45 umbilo smartd[2827]: # 1 Short offline Completed: read failure 90% 6576 3421766910
    Nov 15 06:49:45 umbilo smartd[2827]: # 2 Short offline Completed: read failure 90% 6087 3421766910
    Nov 15 06:49:45 umbilo smartd[2827]: # 3 Short offline Completed: read failure 10% 5901 656821791
    Nov 15 06:49:45 umbilo smartd[2827]: # 4 Short offline Completed: read failure 90% 5818 651637856
    Nov 15 06:49:45 umbilo smartd[2827]:

    So I went to check the Cacti graphs for the disks in the array. Here we see that, yes, disk 7 is slipping away just like syslog says it is. But we also see that disk 8's SMART Read Errors are fluctuating. There are no messages about disk 8 in syslog. More interesting is that the fluctuating values for disk 8 directly correlate to the high IO wait times! My interpretation is that: Disk 8 is experiencing an odd hardware fault that results in intermittent long operation times. Somehow this fault condition on the disk is locking up the entire array. Maybe there is a more accurate or correct description, but the net result has been that the one disk is impacting the performance of the whole array.

    The Question(s): How can a single disk in a hardware SATA RAID-10 array bring the entire array to a screeching halt? Am I being naïve to think that the RAID card should have dealt with this? How can I prevent a single misbehaving disk from impacting the entire array? Am I missing something?
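
    Since the post mentions feeding SMART values into monitoring, here is a minimal sketch of how the sector counts in the smartd lines above could be pulled out per disk. It is an illustration only: the function name and the regular expression are hypothetical, and the pattern assumes the smartd message format matches the excerpt. It says nothing about why the array stalled.

```python
import re
from collections import defaultdict

# Matches smartd lines such as:
#   ... smartd[2827]: Device: /dev/twa0 [3ware_disk_10], 16 Currently unreadable (pending) sectors
# The pattern is modelled on the excerpt only (an assumption about the format).
SECTOR_RE = re.compile(
    r"smartd\[\d+\]: Device: (?P<dev>\S+) \[(?P<disk>[^\]]+)\], "
    r"(?P<count>\d+) "
    r"(?P<kind>Currently unreadable \(pending\)|Offline uncorrectable) sectors"
)


def bad_sector_counts(log_lines):
    """Map each disk label to its reported pending/uncorrectable sector counts."""
    counts = defaultdict(dict)
    for line in log_lines:
        match = SECTOR_RE.search(line)
        if match:
            counts[match["disk"]][match["kind"]] = int(match["count"])
    return dict(counts)


# Lines copied from the syslog excerpt in the post.
SAMPLE = [
    "Nov 15 06:49:44 umbilo smartd[2827]: Device: /dev/twa0 [3ware_disk_00], 6 Currently unreadable (pending) sectors",
    "Nov 15 06:49:44 umbilo smartd[2827]: Device: /dev/twa0 [3ware_disk_07], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 171 to 170",
    "Nov 15 06:49:45 umbilo smartd[2827]: Device: /dev/twa0 [3ware_disk_10], 16 Currently unreadable (pending) sectors",
    "Nov 15 06:49:45 umbilo smartd[2827]: Device: /dev/twa0 [3ware_disk_10], 4 Offline uncorrectable sectors",
]

for disk, kinds in sorted(bad_sector_counts(SAMPLE).items()):
    print(disk, kinds)
```

    On the sample lines this picks up 3ware_disk_00 and 3ware_disk_10. The 3ware_disk_07 line is an attribute change rather than a sector count, so it is skipped, and a disk that logs nothing (like disk 8 in the post) would not show up at all, which is one limit of relying on syslog alone.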

    Read the article
