ewwhite - Page 2 - Developer IT

CentOS 5.5 remote kickstart installation stalls at "Starting install process." How to debug?

- by ewwhite

Hello, I'm having a difficult time with a remote CentOS 5.5 kickstart installation on an HP ProLiant DL360 G6. This is in an environment where I maintain an internal CentOS yum repository. The kickstart installation and post scripts have been tested and normally work. This hardware is also common in this environment, so I do not believe that it is a factor. Unfortunately, I'm having problems with a specific server install. The system is remote to the yum repository at a distance of 500 miles. They are connected over a private low-latency 100-megabit layer 2 connection (26ms round-trip). I'm mounting the 10mb CentOS 5 netinstall ISO image via an HP ILO remote console. The initial boot parameters are: linux ks=http://yum.abctrading.com/prop.cfg ksdevice=eth0 ip=x.x.x.x dns=x.x.x.x netmask=255.255.255.0 gateway=x.x.x.x I'm using the url --url http://ks.abctrading.com/5.5/os/x86_64/ method of installation. This quickly boots into the anaconda installer, pulls the kickstart config and formats the drives. The process eventually halts at the screen below, reading "Starting install process.". Going to the other virtual consoles give the second image below. The process stalls at this point and cannot proceed with the rest of the installation. Running the same kickstart config locally works just fine. I've tried mounting the boot ISO from the console as well as from the ILO2 command line pointing to a locally-hosted boot ISO via http. How can I debug this? Are there any options I've overlooked?

Read the article

Four disks - RAID 10 or two mirrored pairs?

- by ewwhite

I have this discussion with developers quite often. The context is an application running in Linux that has a medium amount of disk I/O. The servers are HP ProLiant DL3x0 G6 with four disks of equal size @ 15k rpm, backed with a P410 controller and 512MB of battery or flash-based cache. There are two schools of thought here, and I wanted some feedback... 1). I'm of the mind that it makes sense to create an array containing all four disks set up in a RAID 10 (1+0) and partition as necessary. This gives the greatest headroom for growth, has the benefit of leveraging the higher spindle count and better fault-tolerance without degradation. 2). The developers think that it's better to have multiple RAID 1 pairs. One for the OS and one for the application data, citing that the spindle separation would reduce resource contention. However, this limits throughput by halving the number of drives and in this case, the OS doesn't really do much other than regular system logging. Additionally, the fact that we have the battery RAID cache and substantial RAM seems to negate the impact of disk latency... What are your thoughts?

Read the article

Nested RDP and ILO sessions, latency and keystroke repetition.

- by ewwhite

I'm working on a remote server installation entirely through ILO. Due to the software application and environment, my access is restricted to a Windows server that I must access through RDP. Going from that system to the target server is accomplished via HP ILO3. I'm trying to run a CentOS installation in an environment where I can't use a kickstart. I'm doing this via text mode, but the keystrokes are repeating randomly and it's difficult to select the proper installation options. I'm doing this using Microsoft's native RDP client (on Mac and Windows). I've also noticed this before when running installations or doing remote work in nested sessions. Is there a nice fix for this, or it it simply a function of the protocol?

Read the article

Is my Cisco switch port bad?

- by ewwhite

I've been chasing a packet-loss and network stability issue for a handful of end-users on an internal network for the past few days... These issues surfaced last week, however the location was struck by lightning six weeks ago. I was seeing 5-10% packet loss between a stack of four Cisco 2960's and several PC's and phones on the other side of a 77-meter run. The PC's were run inline with the phones over a trunked link (switchport configuration pastebin). We were seeing dropped calls and interruptions in client-server applications and Microsoft Exchange connectivity. I tried the usual troubleshooting steps remotely, having a local technician do the following during breaks in user and production activity: change cables between the wall jack and device. change patch cables between the patch panel and switch port(s). try different switch ports within the 2960 stack. change end-user devices with known-good equipment (new phones, different PC's). clear switch port interface counters and monitor incrementing errors closely. (Pastebin output of sh int) Pored over the device logs and Observium RRD graphs. No link up/down issues from the switch side. change power strips on the end-user side. test cable runs from the Cisco 2960 using test cable-diagnostics tdr int Gi4/0/9 (clean)* test cable runs with a Tripp-Lite cable tester. (clean) run diagnostics on the switch stack members. (clean) In the end, it took three changes of switch ports to find a stable solution. The only logical conclusion is that a few Cisco 2960 switch ports are bad or flaky... Not dead, but not consistent in behavior either. I'm not used to seeing individual ports die in this manner. What else can I test or check to determine if these devices are bad? Is it common for single ports to have problems, rather than a contiguous bank of ports? BTW - show cable-diagnostics tdr int Gi4/0/14 is very cool... Interface Speed Local pair Pair length Remote pair Pair status --------- ----- ---------- ------------------ ----------- -------------------- Gi4/0/14 1000M Pair A 79 +/- 0 meters Pair B Normal Pair B 75 +/- 0 meters Pair A Normal Pair C 77 +/- 0 meters Pair D Normal Pair D 79 +/- 0 meters Pair C Normal

Read the article

Linux - real-world hardware RAID controller tuning (scsi and cciss)

- by ewwhite

Most of the Linux systems I manage feature hardware RAID controllers (mostly HP Smart Array). They're all running RHEL or CentOS. I'm looking for real-world tunables to help optimize performance for setups that incorporate hardware RAID controllers with SAS disks (Smart Array, Perc, LSI, etc.) and battery-backed or flash-backed cache. Assume RAID 1+0 and multiple spindles (4+ disks). I spend a considerable amount of time tuning Linux network settings for low-latency and financial trading applications. But many of those options are well-documented (changing send/receive buffers, modifying TCP window settings, etc.). What are engineers doing on the storage side? Historically, I've made changes to the I/O scheduling elevator, recently opting for the deadline and noop schedulers to improve performance within my applications. As RHEL versions have progressed, I've also noticed that the compiled-in defaults for SCSI and CCISS block devices have changed as well. This has had an impact on the recommended storage subsystem settings over time. However, it's been awhile since I've seen any clear recommendations. And I know that the OS defaults aren't optimal. For example, it seems that the default read-ahead buffer of 128kb is extremely small for a deployment on server-class hardware. The following articles explore the performance impact of changing read-ahead cache and nr_requests values on the block queues. http://zackreed.me/articles/54-hp-smart-array-p410-controller-tuning http://www.overclock.net/t/515068/tuning-a-hp-smart-array-p400-with-linux-why-tuning-really-matters http://yoshinorimatsunobu.blogspot.com/2009/04/linux-io-scheduler-queue-size-and.html For example, these are suggested changes for an HP Smart Array RAID controller: echo "noop" > /sys/block/cciss\!c0d0/queue/scheduler blockdev --setra 65536 /dev/cciss/c0d0 echo 512 > /sys/block/cciss\!c0d0/queue/nr_requests echo 2048 > /sys/block/cciss\!c0d0/queue/read_ahead_kb What else can be reliably tuned to improve storage performance? I'm specifically looking for sysctl and sysfs options in production scenarios.

Read the article

ZFS SAS/SATA controller recommendations

- by ewwhite

I've been working with OpenSolaris and ZFS for 6 months, primarily on a Sun Fire x4540 and standard Dell and HP hardware. One downside to standard Perc and HP Smart Array controllers is that they do not have a true "passthrough" JBOD mode to present individual disks to ZFS. One can configure multiple RAID 0 arrays and get them working in ZFS, but it impacts hotswap capabilities (thus requiring a reboot upon disk failure/replacement). I'm curious as to what SAS/SATA controllers are recommended for home-brewed ZFS JBODs. In addition, how does battery-backed write cache (BBWC) play into the solution?

Read the article

Rename Active Directory domain following Windows 2000 -> 2008 migration.

- by ewwhite

I'm working with a site that needs an internal DNS domain rename. It currently has a DNS name of domain.abc.com and NT name of ABC. I'm trying to get to a DNS name of abctrading.com and NT name of ABCTRADING. Split DNS would be used. The site originally ran from a single Windows 2000 domain controller hosting AD, file, print, DHCP and DNS services. There was no Exchange system in the environment. The 50 client PCs are all Windows XP with a handful of users using roaming profiles. All users are in a single OU and there are no group policy/GPOs. I'm a Linux engineer, but have been trying to guide another group of consultants to reach a more suitable setup. With the help of this group, we were able to move the single Windows 2000 system to a set of Windows 2008 R2 servers separated into domain controller and file/print systems (virtualized). We are also trying to add an Exchange 2010 system to this mix. The Windows 2000 server was demoted and is no longer in the picture. This is the tricky part, as client wants the domain renamed and the consultants aren't quite sure how to get through it without another 32-40 hours of testing/implementation. THey say that there's considerable risk to do the rename without a completely isolated test environment. However, this rename has to be done before installing Exchange. So we're stuck at this point. I'd like to know what's involved in renaming the domain at this point. We're on Windows Server 2008. The AD is healthy now. Coming from a Linux background, it seems as though there should be a reasonable path to this. Also, since the original domain appears to be a child/subdomain, would that be a problem here. I'd appreciate any guidance.

Read the article

ZFS - how to partition SSD for ZIL or L2ARC use?

- by ewwhite

I'm working with a Sun x4540 unit with two pools and newly-installed ZIL (OCZ Vertex 2 Pro) and L2ARC (Intel X25-M) devices. Since I need to keep these two pools in the near-term, I'd like to know how to partition these devices to serve both pools of data. I've tried format, parted and fdisk and can't quite seem to get the right combination to generate recognizable partitions for zpool add. The OS in this case is NexentaStor, but I will also need this for general OpenSolaris solutions.

Read the article

NIC reordering on RHEL5/CentOS 5

- by ewwhite

I have an HP ProLiant DL360 G6 containing two onboard NICs as well as an HP NC375T (NetXen NX3031 chipset) 4-port PCIe card. The system was running with eth0 and eth1 belonging to the onboard NICs and eth2-eth5 on the NetXen card. I recently rebuilt the server and from the kickstart process onward, the NICs were reordered such that the onboard NICs became eth4 and eth5, while the NetXen card took over eth0-eth3. I've had some experiences in the past where I tied NICs to specific interfaces via changes in the ifcfg-ethX config files, but this is the first time I've ever seen an add-in card take over eth0 from the motherboard's interfaces. This impacted my kickstart scripts, so: 1). How can I ensure that the onboard NICs take precedence in the kickstart arrangement. 2). What is the most consistent way to maintain that ordering through repeated reboots, kernel changes (e.g. going from a RHEL mainline kernel to a RHEL MRG realtime kernel), etc. 3). What is the interaction between the /etc/modprobe.conf module/NIC definitions, the /etc/sysconfig/network-scripts/ifcfg-ethX and the /etc/modprobe.d/blacklist functions in this context?

Read the article

Backup Exec backup-to-disk folder creation - Access denied

- by ewwhite

I'm having a difficult time creating a backup-to-disk folder in Symantec Backup Exec 12.5 and Backup Exec 2010. The backend storage is a Nexenta/ZFS-based NAS filer sharing the volume via CIFS. I've also seen the issue on other *nix-based NAS devices. I've attempted mapping the drive, providing the full paths to the folder, etc. I can browse to the share just fine from within Windows, but Backup Exec fails to create the B2D folder with different variants of a Unable to create new backup folder. Access denied error. I've attempted creating service accounts in Backup Exec to handle the authentication, but nothing seems to work. What's the key to making this work?

Read the article

Customer site is out of IP addresses, they want to go from /24 to /12 netmask... Bad idea?

- by ewwhite

One of my client sites called to ask me to change the subnet masks of the Linux servers I manage there while they re-IP/change the netmask of their network based on a 10.0.0.x scheme. "Can you change the server netmasks from 255.255.255.0 to 255.240.0.0?" You mean, 255.255.240.0? "No, 255.240.0.0." Are you sure you need that many IP addresses? "Yeah, we never want to run out of IP addresses." A quick check against the Subnet Cheat Sheet shows: a 255.255.255.0 netmask, a /24 provides 256 hosts. It's clear to see that an organization can exhaust that number of IP addresses. a 255.240.0.0 netmask, a /12 provides 1,048,576 hosts. This is a small < 200-user site. I doubt that they'd allocate more than 400 IP addresses. I suggested something that provides fewer hosts, like a /22 or /21 (1024 and 2048 hosts, respectively), but was unable to give a specific reason against using the /12 subnet. Is there anything this customer should be concerned about? Are there any specific reasons they shouldn't use such an incredibly large mask in their environment?

Read the article

Consulting: Organizing site/environment documentation for customers?

- by ewwhite

Over time, I've taken on consulting and contract engineering work for various clients. More recently, customers are asking for certain types of documentation. These are small businesses and typically do not have dedicated technical staff. Within a single company, Wiki/Confluence/Sharepoint, etc. all make sense as a central repository for documentation and environment information. I struggle with finding a consistent method to deliver the following information to discrete customers. I'm shooting for a process that's more portable, secure and elegant than a simple spreadsheet or the dreaded binder full of outdated information. Important IP addresses, DHCP scope, etc. Network diagram (if needed). Administrative usernames and passwords and management URLs. Software license keys. Support contracts and warranty information. Vendor support contacts and instructions. I know there are other consultants here. Any suggestions or tips on maintaining documentation across multiple environments in a customer-friendly format? How do you do it?

Read the article

vSphere education - What are the downsides of configuring virtual machines with too much RAM?

- by ewwhite

VMware memory management seems to be a tricky balancing act. With cluster RAM, Resource Pools, VMware's management techniques (TPS, ballooning, host swapping), in-guest RAM utilization, swapping, reservations, shares and limits, there are a lot of variables. I'm in a situation where clients are using dedicated vSphere cluster resources. However, they are configuring the virtual machines as though they were on physical hardware. In turn, this means a standard VM build may have 4 vCPUs and 16GB or more of RAM. I come from the school of starting small (1 vCPU, minimal RAM), checking real-world use and adjusting up as necessary. Some examples from a "problem" cluster. Resource pool summary - Looks almost 4:1 overcommitted. Note the high amount of ballooned RAM. Resource allocation - The Worst Case Allocation column shows that these VMs would have access to less than 50% of their configured RAM under constrained conditions. The real-time memory utilization graph of the top VM in the listing above. 4 vCPU and 64GB RAM allocated. It averages under 9GB use. Summary of the same VM What are the downsides of overcommitting and overconfiguring resources (specifically RAM) in vSphere environments? Assuming that the VMs can run in less RAM, is it fair to say that there's overhead to configuring virtual machines with more RAM than they need? What is the counter-argument to: "if a VM has 16GB of RAM allocated, but only uses 4GB, what's the problem??"? E.g. do customers need to be educated? What specific metric should be used to meter RAM usage. Tracking the peaks of "Active" versus time?

Read the article

Suggestions for making sysfs parameters persist across reboots

- by ewwhite

I'm experimenting with large changes to Linux system runtime parameters exposed through the sysfs virtual file system. What is the most efficient way to maintain these parameters so that they persist across reboots on a RHEL/CentOS-style system? Is it simply dumping commands into /etc/rc.local? Is there an init script that's well-suited for this? I'm also thinking about standardization from a configuration management perspective. Is there a clean sysfs equivalent to sysctl?

Read the article

Experiences with eXdupe?

- by ewwhite

I noticed that the eXdupe compression/archiving/deduplication utility was recently mentioned in another post here. It boasts some interesting features, and I've been playing with it for the past day. It's basically a cross-platform, highly multithreaded archival tool. http://exdupe.com/index.html I'm curious if anyone here uses it in production or has any tips on how to leverage the tool in their environment. I'm looking for suggestions.

Read the article

Quick method to determine SSD drive health?

- by ewwhite

I have an Intel X-25M drive that was marked "failed" twice in a ZFS storage array, as noted here. However, after removing the drive, it seems to to mount, read and write in other computers (Mac, PC, USB enclosure, etc.) Is there a good way to determine the drive's present health? I feel that the previous failure in the ZFS solution was the convergence of bugs, bad error reporting and hardware. It seems like this drive may have some life in it, though.

Search Results

Search found 41 results on 2 pages for 'ewwhite'.

Page 2/2 | < Previous Page | 1 2

- by ewwhite

- by ewwhite

- by ewwhite

- by ewwhite

- by ewwhite

- by ewwhite

- by ewwhite

- by ewwhite

- by ewwhite

- by ewwhite

- by ewwhite

- by ewwhite

- by ewwhite

- by ewwhite

- by ewwhite

- by ewwhite

< Previous Page | 1 2