Search Results

Search found 9244 results on 370 pages for 'thinking sphinx'.


  • Intermittently, IIS7 requests get stuck in WindowsAuthenticationModule

    - by Richard Beier
    We're running an IIS7 server hosting several dozen websites. Several of these websites are all part of the same legacy app we've developed. These sites all run the same code and run in the same app pool. Roughly once a month over the past few months, we've found that all requests for this app pool start hanging indefinitely. When this happens, we receive an alert and we recycle the app pool. After that, the sites start working again. This only ever affects this one app pool - never any others on the same server.

    A couple of times, before recycling the pool, I've looked at the currently-executing requests in the worker process. They all show up as executing inside the WindowsAuthenticationModule. Which is strange, because the vast majority of the application does not require authentication. There is a small admin section which uses Windows auth... but all the other requests should be anonymous. Does anyone have any idea as to what might be causing this?

    There are several unusual things about the way these sites are set up. As I mentioned, they all run the same code - multiple sites point at the same physical directory. The only difference is the host header bindings. I'm not sure why there isn't just one site with all the host headers, but that's how it works. In several of these sites, the same physical directory is mapped at two levels - as the root of the site and again as an application within the site. So if a user goes to http://oursite.com/index.aspx, that maps to c:\files\oursite\index.aspx. If a user goes to http://oursite.com/foo/index.aspx, that also maps to c:\files\oursite\index.aspx. I think there is code which looks at the request URL and handles the two requests differently. This is strange because the same web.config ends up being interpreted as a site config file, and also as an application config file within the site. I don't know if this might be related to the authentication problem.

    If we can't find the cause, we're thinking of a few workarounds we could try:

    - Move the admin section into a separate site, and give the client a new admin URL. Run that separate site in its own app pool. Then in the web.config shared by all the other sites, remove the WindowsAuthenticationModule (see the sketch below). That way there should be no possibility of a hang within the WindowsAuthenticationModule.
    - Try running all these sites in the classic pipeline instead of the integrated pipeline. They were working fine on our old IIS6 server...
    - (If we get desperate) Set up a watchdog script which monitors the sites and auto-recycles the app pool when it detects that requests are getting stuck.

    What do you think? Thanks for your help,
    Richard
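    A minimal sketch of that web.config change, assuming the sites stay in the integrated pipeline (module names differ in classic mode); this is illustrative, not a tested config:

        <configuration>
          <system.webServer>
            <modules>
              <!-- Skip Windows auth entirely for the anonymous sites -->
              <remove name="WindowsAuthentication" />
            </modules>
          </system.webServer>
        </configuration>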


  • change owner/uid of mount point upon mount

    - by Shiplu
    The scenario is like this: Bob has a computer. It crashed, and now he only has the hard drive, which is in ext3 format. He goes to his office and asks the sysadmin, John, to mount this drive and put the mount point in his home directory. John used the following fstab entries:

        # Bobs harddisk
        /media/TAPE4/Bobs-hdd.img /home/bob/myhdd/windows ntfs ro,loop,offset=32256 0 0
        /media/TAPE4/Bobs-hdd.img /home/bob/myhdd/linux ext3 ro,loop,offset=14048810496 0 0
        /media/TAPE4/Bobs-hdd.img /home/bob/myhdd/extra ntfs ro,loop,offset=28015335936 0 0

    Bob was happy. He could access his old extra and windows partitions. The Documents and Settings folder in windows was especially helpful for him. But he found a problem: he is a web developer, and all his websites are in linux/home/bob/public_html. When he tried to access that public_html directory he got permission denied. He executed ls -lh and saw this:

        drwxr-xr-x 2 john john 4.0K Nov  9  2011 Desktop
        drwxr-xr-x 3 john john 4.0K Aug 12  2011 Documents
        drwxr-xr-x 3 john john 4.0K Aug 21  2011 public_html

    He contacted John, thinking John might have done this by mistake. But John couldn't see why this had happened. Then one thing came to mind: file systems hardly ever store usernames; they store uids. So he executed ls -ln:

        drwxr-xr-x 2 1000 1000 4096 Nov  9  2011 Desktop
        drwxr-xr-x 3 1000 1000 4096 Aug 12  2011 Documents
        drwxr-xr-x 3 1000 1000 4096 Aug 21  2011 public_html

    John thinks 1000 is the first uid on a Linux system. As the admin of the current system, he created his account first, so John's uid was 1000. Bob also set up his private system and created his account first, so Bob's uid was 1000 too. So that's expected behavior. But the problem remains: how can Bob access those websites in public_html?
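    One way Bob could get at those files without touching the read-only image is to re-present the mount with remapped ownership - a sketch using bindfs, assuming it is installed (the extra mount point is made up):

        sudo mkdir -p /home/bob/myhdd/linux-as-bob
        # show every file on the loop-mounted ext3 partition as owned by bob
        sudo bindfs --force-user=bob --force-group=bob \
            /home/bob/myhdd/linux /home/bob/myhdd/linux-as-bob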


  • Computer experiencing slowdowns and lockups despite low CPU usage

    - by user157145
    My setup: i5-2300, nvidia gtx 550 ti, 6 gigs ram, 600 W ocz modular psu. Recently reformatted and already experiencing drastic slowdown as soon as windows comes up, including repeated lockups with multiple programs reporting that they are not responsive, then recovering after 10-30 seconds. I've checked memory and hard drive, both of which come out fine. Despite my plethora of worthless antiviral software, I'm forced to assume that my illicit downloading practices have led me into some computer trouble that I can't seem to pin down. I have used ccleaner, Spybot Search & Destroy and Malwarebytes, all of which have found nothing to indicate what is causing this massive slowdown. In addition, according to my resource manager, the computer is operating at a load of only 30-50 percent CPU usage and 60 percent RAM usage, but it takes 5-10 seconds to load files and open folders, with repeated lockups of multiple programs - especially firefox, which seems to go unresponsive every 2-3 minutes. Any help would be appreciated. I used a program called OTL by OldTimer, but I can't make any sense of the results I was given. Thank you for taking the time to read this.

    I have avast, but it didn't find anything when I had it do a full system scan, so I'm thinking it's clueless (likewise nortons, avg, and ad-aware). I also have mse, but it has yet to complete a full scan, it takes so long (I left it on last night, but when I woke up my computer had a problem and had to restart). My hard drive has 300 gigs out of 1tb open, and I already used hd tune pro, which said my hard drive was fine; it's not a ssd. Also, I'm a noob at computers and only have the hd that is currently inside the computer. I'm also not sure if stuttering is the issue I'm suffering: while typing these responses, firefox has gone "not responsive" at least 5 times, each for about 5-10 seconds. When I tried ctrl-alt-delete to bring up windows task manager, it took 20 seconds. Essentially the computer goes super slow at bringing up anything, or taking any action whatsoever that opens a program or file, and it repeatedly locks up so I can't even click on whatever I'm trying to do. The confusing thing about these incidents is that they happen right after restarting, when there are minimal programs running and the computer and memory load is light.


  • Encoding multiple video streams with a single avconv invocation

    - by automatthias
    I played with avconv on Ubuntu and I'm now able to e.g. record the desktop with sound from a soundcard. One thing I wanted to do was record two video inputs at the same time, for instance the desktop and the webcam. I thought about doing something like this:

        avconv \
            -f alsa \
            -i default \
            -acodec flac \
            -f video4linux2 \
            -r 6 \
            -i /dev/video0 \
            -f x11grab \
            -i :0.0 \
            out.mkv

    My thinking was that if you define multiple video inputs, and the .mkv format can handle multiple video streams, avconv will encode 2 video streams and 1 audio stream into one file. But this isn't what happens:

        avconv version 0.8.4-6:0.8.4-0ubuntu0.12.10.1, Copyright (c) 2000-2012 the Libav developers
          built on Nov  6 2012 16:51:11 with gcc 4.7.2
        [alsa @ 0x1091bc0] capture with some ALSA plugins, especially dsnoop, may hang.
        [alsa @ 0x1091bc0] Estimating duration from bitrate, this may be inaccurate
        Input #0, alsa, from 'default':
          Duration: N/A, start: 1354364317.020350, bitrate: N/A
            Stream #0.0: Audio: pcm_s16le, 48000 Hz, 2 channels, s16, 1536 kb/s
        [video4linux2 @ 0x10923e0] Estimating duration from bitrate, this may be inaccurate
        Input #1, video4linux2, from '/dev/video0':
          Duration: N/A, start: 100607.724745, bitrate: 29491 kb/s
            Stream #1.0: Video: rawvideo, yuyv422, 640x480, 29491 kb/s, 6 tbr, 1000k tbn, 6 tbc
        [x11grab @ 0x107b2a0] device: :0.0+83,87 -> display: :0.0 x: 83 y: 87 width: 854 height: 480
        [x11grab @ 0x107b2a0] shared memory extension found
        [x11grab @ 0x107b2a0] Estimating duration from bitrate, this may be inaccurate
        Input #2, x11grab, from ':0.0+83,87':
          Duration: N/A, start: 1354364318.488382, bitrate: 196761 kb/s
            Stream #2.0: Video: rawvideo, bgra, 854x480, 196761 kb/s, 15 tbr, 1000k tbn, 15 tbc
        Incompatible pixel format 'bgra' for codec 'mpeg4', auto-selecting format 'yuv420p'
        [buffer @ 0x107fcc0] w:854 h:480 pixfmt:bgra
        [avsink @ 0x10bdf00] auto-inserting filter 'auto-inserted scaler 0' between the filter 'src' and the filter 'out'
        [scale @ 0x10dc680] w:854 h:480 fmt:bgra -> w:854 h:480 fmt:yuv420p flags:0x4
        Output #0, matroska, to '.../out.mkv':
          Metadata:
            encoder         : Lavf53.21.0
            Stream #0.0: Video: mpeg4, yuv420p, 854x480, q=2-31, 4000 kb/s, 1k tbn, 15 tbc
            Stream #0.1: Audio: libvorbis, 48000 Hz, 2 channels, s16
        Stream mapping:
          Stream #2:0 -> #0:0 (rawvideo -> mpeg4)
          Stream #0:0 -> #0:1 (pcm_s16le -> libvorbis)
        Press ctrl-c to stop encoding
        [mpeg4 @ 0x10bd800] rc buffer underflow
        ^Cframe=  160 fps= 15 q=2.0 Lsize=    3414kB time=10.66 bitrate=2623.0kbits/s
        video:3273kB audio:131kB global headers:4kB muxing overhead 0.165600%
        Received signal 2: terminating.

    I'm not sure if it's a question of mapping (some -map options to add?) or whether avconv just can't encode more than 1 video stream at a time. So is it an actual avconv limitation, or a limitation of the available containers, or me simply not finding the right combination of command line options?
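    For what it's worth, a sketch of the explicit-mapping variant hinted at above - by default avconv keeps only the "best" video stream, while -map selects streams by hand. The stream indices here are assumptions based on the input order in the log:

        avconv \
            -f alsa -i default \
            -f video4linux2 -r 6 -i /dev/video0 \
            -f x11grab -i :0.0 \
            -map 1:0 -map 2:0 -map 0:0 \
            -acodec flac \
            out.mkv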


  • Subversion/Hudson/Sonar/Artifactory - too much for my little server to handle! Help!

    - by Ricket
    I have a little dedicated server. It's at a cheap price and has a simple AMD 1800+ (1.5ghz), 256mb DDR RAM, ...need I continue? And I think I'm overloading it already. It's running CentOS 5.4, and I have installed the following:

    - Webmin
    - Apache
    - MySQL
    - Subversion as an Apache module
    - Hudson (standalone)
    - Sonar (standalone, runs with a standalone Jetty install)
    - Artifactory (standalone)

    That's pretty much it. But I'm having problems; pages are loading quite slowly. Network speed of the server is excellent, but I think I'm just running out of CPU and/or memory. A side-effect of the pages loading slowly is that sometimes Hudson times out, not being able to start Maven or contact Sonar in a certain amount of time.

    I think the next step to speed things up might be to move to an application server and use the WAR versions of Hudson, Sonar and Artifactory together on that server. I don't know that it will help, but it just seems to make sense, especially with Sonar running on its own Jetty install and the other two probably running their own mini application servers as well. Am I correct in thinking this? Is this the right course of action? Any other tips on how to make the server run faster? I can post more data if you'd like, just let me know what else would help you answer my question.

    Oh, also just to cure any suspicions, I don't have any sort of virus or spyware. I protect my SSH access with DenyHosts (which has blocked 300+ brute forcers in the past few months), and I have confirmed that the top four processes in terms of memory and CPU usage are Sonar, Artifactory, Hudson, and MySQL.

    Edit: I just thought of another thing that I'd like you to comment on as well: Apache currently has 8 spawned slave processes, taking 42MB of ram apiece. This is not my web server. Is everything else able to function if I shut down Apache? Can you point me towards a tutorial or something on migrating Subversion from Apache into something that might work along with the other three applications, maybe even make Subversion a WAR file or something?
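    A sketch of that consolidation - all three as WARs in one servlet container, so there is a single JVM to budget for (paths, service names and heap sizes are illustrative, and 256mb of RAM will still be very tight):

        # stop the standalone instances, then deploy side by side in one Tomcat
        cp hudson.war sonar.war artifactory.war /usr/share/tomcat/webapps/
        # one bounded heap shared by all three apps instead of three competing JVMs
        export CATALINA_OPTS="-Xmx160m -XX:MaxPermSize=64m"
        service tomcat restart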


  • Best RAID setup for multimedia fileserver?

    - by Mr. Schwabe
    I'm building a fileserver for my small office. We do film and multimedia design. Only 3 clients connected. The server is primarily for local access to graphic assets and video files. I'm looking for advice on the hardware and software required, particularly for the RAID. I have the following objectives:

    A) Merged capacity. I'd like all other systems to access the data as a single mapped network drive with an initial capacity of 10 TB. So perhaps 5x 2TB drives (plus mirror drives for redundancy).

    B) An easy way to increase capacity. Thinking long term, I'd like to 'easily' add more drives to the array for a potential two- or three-fold increase in capacity. So theoretically it could get up to a 30 TB array consisting of maybe 15x 2TB drives of capacity (plus mirror drives for redundancy).

    C) Maximum fault tolerance. I want at least 1 mirror drive per capacity drive (in layman's terms). So if I start with 10 TB / 5x 2TB of capacity, I suppose I would need another 5x 2TB drives to be mirrors - 10 drives total. But I'd also like the potential for even more redundancy, with up to 2 additional mirrors per 'capacity drive' (and to be able to add them to the array anytime with ease).

    D) An easy way to monitor drive health. I'd like an intuitive interface for managing the RAID and monitoring drive health.

    The other systems accessing this network drive will be running Windows, but also the odd Ubuntu and MacOS system as well. Are these objectives attainable? What type of RAID setup do you recommend? What hardware will be required? Also, what OS do you think this system should be running? Does it really matter? I'm no network admin - just a long-time Windoze user without much Linux experience. That said, I'm not opposed to a Linux solution if it's easy enough and more practical than a Windows OS for this server. Or maybe something such as Openfiler.

    Budget should hit the sweet spot for value and performance (hence my preference for 2TB drives). The biggest focus is storage; aside from that, the system just needs to keep the drives running optimally with perhaps 2 or 3 clients accessing / writing files at any given time. The hardware quote would start with something like 10x 2TB WD Caviar Blacks; about $1900 for the storage + $x for remaining parts.

    http://ncix.com/products/index.php?sku=42775&vpn=WD2001FASS&manufacture=Western%20Digital%20WD

    Your advice is appreciated, thanks!
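    One concrete shape objectives A-D could take under Linux software RAID - a sketch with mdadm (device names are illustrative, and growing a RAID10 array needs a reasonably recent kernel and mdadm, so check versions first):

        # 10 drives as RAID10: ~10 TB usable, every chunk mirrored on a second disk
        mdadm --create /dev/md0 --level=10 --raid-devices=10 /dev/sd[b-k]
        mkfs.ext4 /dev/md0

        # later expansion: add a pair of disks, then reshape
        mdadm --add /dev/md0 /dev/sdl /dev/sdm
        mdadm --grow /dev/md0 --raid-devices=12

        # drive health: mdadm's monitor mode (plus smartd) can mail on failures
        mdadm --monitor --scan --mail=admin@example.com --daemonise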


  • Automatic layout of manual network mapping

    - by Paul
    So I have a small business network mainly consisting of two routed layer-2 domains with a total of ca. 100 devices spread over ca. 2000m² of production and office space. Typical problems to solve using the graph would be: Over what (cable) path is a PC connected to the server? Where should I expect devices connected to a given switch port?

    I want to generate a graph of the physical network topology: nodes are endpoint devices, switch ports, wall outlets, patch panel ports etc.; edges are cable connections. Ideally, edges (or segments) that pass through the same bundle could be grouped. I would also like to augment the graph data with automatically gathered data (monitoring state, MAC address, switch port <-> MAC entries to build up parts of the map). At the moment I use graphviz for this inside a Confluence wiki, like this:

        layout = "neato"
        overlap = scale
        subgraph {
            rankdir = "TB"
            subgraph cluster_r1pf1 {
                r1pf1 [label="{ Rack 1 PF 1 | { <p1>P1 | <p2>P2 | <p3>P3} }", shape=record]
            }
            subgraph cluster_switch1 {
                switch1 [label="{ Rack 1 Switch 1 | { <p1> P1 | <p2> P2 | <p3> P3} }", shape=record]
            }
            r1pf1:p1 -> switch1:p1
        }

    (Obviously there are dozens of entries omitted here.) Problem is: I have a hard time influencing graphviz to generate a bearable layout. Edges overlap so badly that you can't read the diagram anymore.

    The question is: What other tools (be it interactive like Visio or OmniGraffle, or I/O-oriented like graphviz) exist that would allow an easily versionable (as in: operates on a text file) documentation that is both machine and human readable and editable?

    Why not OmniGraffle or Visio? Well, we don't have Macs, and Visio is not available at the moment. To buy it I would need good arguments. Automation would be one of them. But last time I looked, versioning Visio files or even thinking about automatic handling was a nightmare.

    Related:
    - Network Mapping Tools basically asks the same with a focus on generating the complete graph automatically (but without the need to document cabling connections)
    - Recommendations for automatic computer inventory brings up links to "all-in-one" solutions
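    A few graphviz settings that sometimes tame overlapping edges in neato layouts - a sketch to experiment with, not a guaranteed fix:

        layout = "neato"
        overlap = false   // or "prism": removes node overlaps after layout
        splines = true    // route edges around nodes instead of through them
        sep = "+15"       // extra clearance margin around each node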


  • Windows DHCP Server - get notification when a non-AD joined device gets an IP address

    - by TheCleaner
    SCENARIO

    To simplify this down to its easiest example: I have a Windows 2008 R2 standard DC with the DHCP server role. It hands out IPs via various IPv4 scopes, no problem there.

    WHAT I'D LIKE

    I would like a way to create a notification/eventlog entry/similar whenever a device gets a DHCP address lease and that device IS NOT a domain-joined computer in Active Directory. It doesn't matter to me whether it is custom Powershell, etc. Bottom line = I'd like a way to know when non-domain devices are on the network without using 802.1X at the moment. I know this won't account for static IP devices. I do have monitoring software that will scan the network and find devices, but it isn't quite this granular in detail.

    RESEARCH DONE/OPTIONS CONSIDERED

    I don't see any such possibilities with the built-in logging. Yes, I'm aware of 802.1X and have the ability to implement it long-term at this location, but we are some time away from a project like that, and while that would solve network authentication issues, this is still helpful to me outside of 802.1X goals. I've looked around for some script bits, etc. that might prove useful, but the things I'm finding lead me to believe that my google-fu is failing me at the moment. I believe the below logic is sound (assuming there isn't some existing solution):

    1. Device receives DHCP address
    2. Event log entry is recorded (event ID 10 in the DHCP audit log should work, since a new lease is what I'd be most interested in, not renewals: http://technet.microsoft.com/en-us/library/dd759178.aspx). At this point a script of some kind would probably have to take over for the remaining steps below.
    3. Somehow query this DHCP log for these event ID 10's (I would love push, but I'm guessing pull is the only recourse here)
    4. Parse the query for the name of the device being assigned the new lease
    5. Query AD for the device's name
    6. IF not found in AD, send a notification email

    If anyone has any ideas on how to properly do this, I'd really appreciate it. I'm not looking for a "gimme the codez" but would love to know if there are alternatives to the above list or if I'm not thinking clearly and another method exists for gathering this information. If you have code snippets/PS commands you'd like to share to help accomplish this, all the better.
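    A rough PowerShell sketch of steps 3-6, assuming the DHCP audit log is at its default path and the ActiveDirectory module is installed; field positions, addresses and names are illustrative:

        # DHCP audit logs rotate per weekday: DhcpSrvLog-Mon.log, -Tue.log, ...
        $day = (Get-Date).DayOfWeek.ToString().Substring(0,3)
        $log = "C:\Windows\System32\dhcp\DhcpSrvLog-$day.log"
        Import-Module ActiveDirectory

        Get-Content $log | Where-Object { $_ -match '^10,' } | ForEach-Object {
            # Fields: ID,Date,Time,Description,IP Address,Host Name,MAC Address
            $name = (($_ -split ',')[5] -split '\.')[0]
            if (-not (Get-ADComputer -Filter "Name -eq '$name'")) {
                Send-MailMessage -To 'admin@example.com' -From 'dhcp@example.com' `
                    -Subject "Non-domain device got a lease: $name" `
                    -SmtpServer 'mail.example.com'
            }
        }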


  • Missing startup screens and slow bootup/login after using WinClone to expand Bootcamp partition

    - by user26453
    I used WinClone to back up my Bootcamp partition, which was a Windows 7 Ultimate install, on my late 2006 Macbook Pro. I wanted to expand the Bootcamp partition's size. It worked reasonably well, with some hiccups along the way and some remaining issues.

    The first issue I ran into was the Bootcamp Assistant utility - it would not recreate the partition. This was due to a lack of the contiguous space that is required for the Bootcamp partition. As a result I wiped the whole drive and reinstalled Snow Leopard, did the minimum amount of system updates, and created and formatted a new Bootcamp partition. WinClone restored the image without complaint, and the image was automatically resized to the new partition's size.

    The second issue I ran into was after the first boot into Windows. The first thing I noticed was that instead of the newer "slick" startup screen (4 colors wisping around, a Windows 7 title), there was more of an old-school startup screen (a progress bar with block increments, yellow/greenish color, nothing else really). The initial bootup to a login screen was slow, perhaps as Windows dealt with the partition changes. After logging in, the screen goes blank and the computer seems to hang for a minute before completing the login. After subsequent restarts, the slick screen is still missing; boot to login screen is normal, but the time from login to desktop is still very slow.

    As a side note, I've previously only seen this behavior of a long delay from login to a loaded desktop when the computer would try to hibernate and fail (the battery is really bad). On the next startup I would see this behavior, but not subsequently. So, a potential cause: I imaged the partition after hibernating out of Windows. From reading some posts/guides on the subject, this was not recommended, and perhaps shouldn't even have worked? Could the partition be stuck in some weird mode as a result that makes the boot issues appear? I've attempted to disable hibernation and restart, trying to delete the .sys file that hibernation uses. Other fixes I'm thinking of attempting are booting a Win7 disc and repairing the install/partition. I can't shake the nagging feeling that something isn't right, given the modified boot screens and the slow login process.
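    For the record, the supported way to disable hibernation and remove hiberfil.sys on Windows 7 is a single command from an elevated prompt:

        powercfg /hibernate off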


  • Real-time offline folder-to-folder backup application needed (Windows)

    - by niktech
    I recently started using the Intel Matrix Storage RAID solution, which allowed me to use my 5 1TB drives for two RAID volumes: first a 1TB RAID 0 striped across all 5 drives, and second a RAID 5 across the rest of the free space on all drives (around 2.85TB usable space). The RAID 0 I use for OS, applications and games, while the RAID 5 I use as more-permanent storage (photos, etc).

    Now, I do realize that running the OS and applications on RAID 0 across 5 drives is very dangerous, which is what brings up the following question: is there a reliable freeware realtime backup application that can back up a set of folders from one drive to another drive (no online backups needed)? I've already tried a few (Mozy, Yadis, Comodo Backup, GFI Backup, Idoo, Crash Plan) but none meet my requirements:

    - Low CPU and RAM usage.
    - Realtime backups - as soon as a file is modified in the source folder, it is added to the backup queue, which will be processed with the lowest priority when the CPU is idle. This backup queue should persist across computer restarts (ie: the source and destination folders should always have the same set of files, except for the ones waiting in the backup queue).
    - Incremental backups - if only 10 bytes changed in a 1GB file, the app should only copy those 10 new bytes.
    - Ability to back up locked and opened files (some apps, like Yadis, can't back up critical files like browser favorites).
    - Ability to run as a service (no need for any user to log in to have the app started).

    Optional requirements:

    - Compression of the destination into a well-known format (RAR, Zip) that can be directly read without the use of the application.
    - Preset source folders (such as Browser Favorites, Game Saves, Application Settings, etc).

    The idea is to use the RAID 0 array as "semi-persistent RAM-like" storage which, in case of a failure, can be quickly rebuilt by reinstalling the OS, apps and games and copying over the settings, saves and favorites from the RAID 5. I'm also thinking of taking this RAID-0-as-RAM idea to the extreme with SSDs (as soon as we get some nice 6Gb/s SATA III SSDs out there), where a couple of SSDs chained in RAID 0 will work as yet another semi-persistent cache layer sitting between the RAM and the HD. I'm just hoping there already exists an application that satisfies these requirements... otherwise I'll have to write one myself, which I would prefer not to do.
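    While evaluating tools, it may be worth knowing that Windows' built-in robocopy has a crude monitoring mode that covers part of this list (it re-copies whole changed files, so no byte-level increments) - a sketch:

        robocopy C:\Source D:\Backup /E /MON:1 /MOT:1 /R:2 /W:5
        rem /MON:1 = rerun when at least 1 change is seen; /MOT:1 = check every minute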


  • What's the best way to do user profile/folder redirect/home directory archiving?

    - by tpederson
    My company is in dire need of a redesign around how we handle user account administration. I've been tasked with automating the process. The end goal is to have the whole works triggered by the business, and IT only looking in when there's an error reported. The interim phase is going to be semi-manual: that is, a level 2 tech inputs the user's info and supervises the process.

    The current hurdle I'm facing is user profile archiving. Our security team requires us to archive the profile directories for any terminated user for 60 days, in case the legal team requires access to their files. Our AD is as much a mess as everything else, so there are some users with home directories and some with profiles. Anyone who has a profile dir in AD also has a good deal of their profile redirected to our file servers over DFS. To complete the process manually, you find the user in AD, disable them, find their home/profile dir, go there and take ownership, create an archive folder, move all their files over, then delete the old dir. Some users have many, many gigs of nonsense, and this can take quite some time. Even automated, the process would not be a quick one.

    I'm thinking that I need a client-side C# GUI for the quick stuff and some server-side batch script or console app to offload this long-running process. I have a batch script that works decently using takeown and robocopy, but I wonder if a C# console app would do a better job.

    So, my question at long last is: what do you think is the best way to handle this? I can't imagine this is a unique problem; how do other admins get this done? The last place I worked was easily 10x larger than the place I'm in now. If we had been doing this manual crap there, they'd have needed a team of at least 30 full-time workers to keep up. I have decent skills in C#.net and batch scripting, but I am a quick study and I have used most every language once or twice. Thank you for reading this, and I look forward to seeing what imaginative solutions you all can come up with.
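    For comparison, a sketch of the kind of takeown/robocopy pass described above (share names and archive layout are illustrative):

        @echo off
        rem Usage: archive-profile.cmd <username>
        set SRC=\\fileserver\profiles$\%1
        set DST=\\fileserver\archive$\%1

        rem take ownership recursively so redirected folders can be read
        takeown /F "%SRC%" /R /D Y
        icacls "%SRC%" /grant Administrators:F /T

        rem /MOVE deletes source files after a successful copy
        robocopy "%SRC%" "%DST%" /E /MOVE /COPYALL /R:2 /W:5 /LOG+:archive.log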


  • Deploying website content via Subversion

    - by Johann
    We have recently set up a new development infrastructure and process for one of our clients. This involves the strict use of subversion as a central source code repository. The svn repositories contain a separate branch for code on the live system (/branches/live/). The repositories are used for PHP content (mainly Wordpress blogs), but in future they may hold ASP code as well. Bonus points for a solution which works more or less the same way with ASP code on Windows Server 2008 R2.

    We have two servers: one staging system and one live system. The staging system is updated regularly with the code of the trunk. The live system is updated manually. Each webroot on the servers is a working copy of either the trunk (staging system) or the live branch (live system). The current workflow is: develop on the dev's box -> commit into the trunk -> auto-deploy on staging system -> test on the staging system -> merge into /branches/live/ -> manually deploy on live system.

    This works very well for one-way changes; however, we have some trouble with every Wordpress (or plugin) update: the WP update process removes the directories and unpacks the archive of the new version. This removes the svn admin area as well, which produces a lot of errors. We could switch to SVN 1.7 with a single, global admin area, but this would only solve one part of the problem. In the end, we did the update via the WP GUI, restored the svn admin area, added/removed the files and committed the changes to the trunk. After testing, we had to do basically the same thing on the live server (except the commit; we just reverted the changes and merged the new files from the staging system to the live system).

    I'm currently thinking of the following:

    - The htdocs of each website is an svn export
    - Each website has an svn working copy beside the htdocs directory
    - A script "replays" the changes from htdocs onto the working copy after an update in WP (rsync'ing the changed files to the working copy, rsync'ing new files and svn add'ing them, and finally svn delete'ing the deleted files). The script would have to exclude some files (like wp-config.php, uploads/temp directories, etc.); see the sketch below.

    Are there better ways to do this? Unfortunately, a complete CI server is out of scope due to time and budget limitations.
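    A sketch of that replay script, assuming GNU rsync and awk (paths and excludes are illustrative, and paths with spaces would need more careful handling):

        #!/bin/sh
        HTDOCS=/var/www/example/htdocs
        WC=/var/www/example/wc

        # mirror the live tree into the working copy, minus runtime files
        rsync -a --delete \
            --exclude wp-config.php --exclude 'wp-content/uploads/' \
            --exclude .svn \
            "$HTDOCS/" "$WC/"

        cd "$WC" || exit 1
        # schedule new files for addition, vanished files for deletion
        svn status | awk '/^\?/ { print $2 }' | xargs -r svn add
        svn status | awk '/^!/  { print $2 }' | xargs -r svn delete
        svn commit -m "Replay WordPress update from htdocs"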


  • serving mp3s to mobile devices is flooding nginx with partial requests

    - by drumfire
    I am serving mp3s with a minimalistic nginx server. What I see in my log files is that there are a lot of requests, in particular from AppleCoreMedia and sometimes Android user agents, that flood the server with short requests. Sometimes they keep requesting to download the same partial content for a very long time; sometimes more than an hour. For example:

        "GET /somefile.mp3 HTTP/1.1" 206 33041 "AppleCoreMedia/1.0.0.9B206 (iPhone; U; CPU OS 5_1_1 like Mac OS X; en_us)"
        "GET /somefile.mp3 HTTP/1.1" 206 33041 "AppleCoreMedia/1.0.0.9B206 (iPhone; U; CPU OS 5_1_1 like Mac OS X; en_us)"
        "GET /somefile.mp3 HTTP/1.1" 206 33041 "AppleCoreMedia/1.0.0.9B206 (iPhone; U; CPU OS 5_1_1 like Mac OS X; en_us)"
        [...]

    I also get a lot, though not as many, of these:

        "-" 400 0 "-"
        "-" 400 0 "-"

    The IP addresses are always from clients that start downloading shortly after that request; usually they have roughly the same user agent as in the first example. I have enabled server throttling and connection limits in nginx to limit the huge number of log entries from equivalent IPs at least somewhat.

    There was a performance issue when I saw the same behaviour on the previous server, which used Apache. I installed nginx on a better server, then moved the site. When Apache could no longer handle connections from the increasing number of clients effectively, that server was effectively DDoSed. There was no bandwidth issue with already-connected clients, and I don't know if the already-connected clients were using more than one connection at a time.

    Please tell me:

    - Are clients that appear to get stuck on a download a Bad Thing™? I've heard people say their mobile bandwidth use was much higher than they could account for. I'm thinking this type of client behaviour can account for that. And it costs us more bandwidth too.
    - Which up-to-date alternatives exist out there that can handle serving this type of data better than plain HTTP?
    - Useful general insights for someone who just came into this field straight out of the late 90s. :-)
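    For reference, the kind of nginx throttling referred to above - a sketch using the limit_conn/limit_req modules (nginx 1.1.8+ syntax; zone names and numbers are made up and need tuning):

        limit_conn_zone $binary_remote_addr zone=mp3conn:10m;
        limit_req_zone  $binary_remote_addr zone=mp3req:10m rate=1r/s;

        server {
            location ~ \.mp3$ {
                # at most 2 simultaneous connections and ~1 req/s (burst 5) per IP
                limit_conn mp3conn 2;
                limit_req  zone=mp3req burst=5;
            }
        }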


  • Nginx giving a lot of 502 errors

    - by Loki
    A while ago I installed nginx, and everything seemed to be working fine; recently I found out that about 20% of the time users are getting 502 errors. This is also noticeable when Google tries to crawl my site in Webmaster Tools (from 10000 posts, approx. 2000 502 errors). At first I was thinking of disabling nginx, but I'd really like to keep using it. I'm running it on a server with 2GB RAM and 4 reserved CPU cores, with WHM/cPanel installed and Mod_Ruid2 enabled + DSO as a PHP handler with APC caching installed. Is there anything I can change in the config that can fix this?

    I have installed Nginx Admin in WHM, and here is what's in the configuration editor screen:

        user nobody;
        worker_processes 4;
        error_log /var/log/nginx/error.log info;
        worker_rlimit_nofile 20480;
        events {
            worker_connections 10240; # increase for busier servers
            use epoll; # you should use epoll here for Linux kernels 2.6.x
        }
        http {
            server_name_in_redirect off;
            server_names_hash_max_size 10240;
            server_names_hash_bucket_size 1024;
            include mime.types;
            default_type application/octet-stream;
            server_tokens off;
            disable_symlinks if_not_owner;
            sendfile on;
            tcp_nopush on;
            tcp_nodelay on;
            keepalive_timeout 5;
            gzip on;
            gzip_vary on;
            gzip_disable "MSIE [1-6]\.";
            gzip_proxied any;
            gzip_http_version 1.1;
            gzip_min_length 1000;
            gzip_comp_level 6;
            gzip_buffers 16 8k;
            gzip_types text/plain text/xml text/css application/x-javascript application/xml image/png image/x-icon image/gif image/jpeg application/xml+rss text/javascript application/atom+xml;
            ignore_invalid_headers on;
            client_header_timeout 3m;
            client_body_timeout 3m;
            send_timeout 3m;
            reset_timedout_connection on;
            connection_pool_size 256;
            client_header_buffer_size 256k;
            large_client_header_buffers 4 256k;
            client_max_body_size 200M;
            client_body_buffer_size 128k;
            request_pool_size 32k;
            output_buffers 4 32k;
            postpone_output 1460;
            proxy_temp_path /tmp/nginx_proxy/;
            client_body_in_file_only on;
            log_format bytes_log "$msec $bytes_sent .";
            include "/etc/nginx/vhosts/*";
        }

    I hope someone can help me out. Thanks in advance!
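    If the 502s are nginx giving up on the proxied Apache backend, longer upstream timeouts plus retry-on-error are worth an experiment - a sketch for the http block, with values that are guesses to tune rather than recommendations:

        proxy_connect_timeout 30;
        proxy_read_timeout    300;
        proxy_send_timeout    300;
        # try the next upstream attempt instead of surfacing a 502 immediately
        proxy_next_upstream error timeout http_502;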


  • Should I use EXT4 or XFS to be able to 'sync'/backup to S3?

    - by Rafa
    It's my first message here, so bear with me... (I have already checked quite a few of the "Related Questions" suggested by the editor.)

    Here's the setup: a brand new dedicated server (8GB RAM, some 140+ GB disk, RAID 1 via HW controller, 15000 RPM). It's a production web server (with MySQL on it too, not just serving web requests), not a personal desktop computer or similar. It runs Ubuntu Server 64bit 10.04 LTS.

    We have an Amazon EC2+EBS setup with the EBS volume formatted as XFS, for easily taking snapshots to S3 via AWS' console. We are now migrating to the dedicated server, and I want to be able to back up our data to Amazon's S3. The main reason is the possibility of using the latest snapshot from an EC2 instance in case of hardware failure on the dedicated server. There are two approaches I am thinking of:

    1. Do a "simple" file-based backup with rsync, dumping the database and other files, and uploading to Amazon via S3 API commands, or to an EC2 instance, or something.
    2. Do a file-system "freeze" (using XFS) with the usual EBS/EC2 snapshot tool, take a snapshot, and upload it to Amazon.

    Here's my question (or series of questions):

    - Can I safely use XFS for the whole system as the main and only format on the dedicated server? If not, is it safe to use EXT4? Or should I use something else? Would it then be possible to make snapshots of the system to upload to Amazon?
    - Is it possible/feasible/practical to do what I want to do, anyway? Any recommendations?

    When searching around for S3/EBS/XFS, anything relevant to my problem is usually focused on taking snapshots of an XFS system that is already an EBS volume. My intention is to do it on a "real"/metal dedicated server.

    Update: I just saw this on Wikipedia: "XFS does not provide direct support for snapshots, as it expects the snapshot process to be implemented by the volume manager." I had always assumed that I could choose 2 ways of doing snapshots: via LVM or via XFS (without LVM). After reading this, I realize these 2 options are more like it:

    - With XFS: 1) do xfs_freeze; 2) copy the frozen files via, e.g., rsync; 3) unfreeze xfs.
    - With LVM and XFS: 1) do xfs_freeze; 2) make a binary copy of the frozen fs via lvcreate and related commands; 3) unfreeze xfs; 4) somehow back up the LVM snapshot.

    Thanks a lot in advance. Let me know if I need to clarify something.
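    A sketch of the freeze-copy-unfreeze cycle from the XFS-only option, assuming the data sits on its own XFS mount (paths and bucket name are illustrative; a database like MySQL still wants its own dump or lock around this):

        MNT=/srv/data

        # flush and block writes so the copy sees a consistent filesystem
        xfs_freeze -f "$MNT"

        # keep the frozen window short: fast local copy first, slow upload later
        rsync -a "$MNT/" /backup/staging/

        xfs_freeze -u "$MNT"

        # upload after thawing (s3cmd is one option; any S3 client works)
        s3cmd sync /backup/staging/ s3://my-bucket/backups/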


  • Monitoring the status of accounts with IT Service providers (ISP, Domain Registrar etc.)

    - by Sholom
    Hi all,

    Short version: You have software that tells you when your server's power outlet is down. It monitors multiple servers from one management console, alerts you when something is wrong, etc. Does anyone know of software that will let me take the same approach to monitor whether the money outlet (the bill!) is down (not paid) for my IT service providers (ISP, domain registrar, MX backup service etc)? I need a top-down, centrally managed service that is capable of sending out alerts, just like the one that monitors my own Exchange server etc. I don't mind if I have to manually enter every payment.

    Long version: Our very likable but absent-minded bookkeeper keeps neglecting to pay our IT vendors on time. Just this past week our internet service was disconnected. The same could happen to many other mission-critical accounts (domain registrar, backup MX, anti-virus license, HackerSafe (McAfee Secure) service and even an 800 number, to name a few). As the sysadmin, I monitor my servers to make sure they are plugged into the power outlet. I believe I should also monitor my services to make sure they are plugged into their money outlet. To compound the problem, when the power goes out someone else will likely notice and notify me. But if a bill is not paid, no one will ever notice until service is lost. Lost as in losing our domain name, which would cause a lot more damage than the power failing on our server.

    [Solution] = [Doesn't work because]:

    - Retrain the bookkeeper = Wishful thinking.
    - Notify my manager = Already have (via email). Protects me, does not solve the problem.
    - Fire the bookkeeper = What makes you so sure the next one will never forget?

    Bottom line: Humans are humans, and sooner or later something critical will be royally messed up. We need to partner with a machine to help us out here. Anybody have the same problem? What software/solution do you use? I would like software that emails me when a bill is past due, just like I get an email when the power outlet fails. Anyone hear of anything like that?

    Thanks


  • Operating systems on SD cards

    - by HisDudeness
    I was getting some wild ideas over the last few days, like putting some operating systems onto SD cards rather than on my hard drive. I'll go into detail now and explain what led me to consider this probably abominable decision.

    I am on a laptop (that means I have a native SD card reader) which is currently running a cross-distro setup, with a bunch of Linux systems (placed in dedicated ext4 logical partitions inside a huge extended one) regulated by a single GRUB. As of today, my laptop hasn't even seen any Windows system with binoculars. I was thinking about placing all the OS part of my setup onto a Secure Digital card, to save all of my 500 GB hard drive for documents, music, videos and so on, and to be able to just remove the SD and boot my system on another computer too, as well as having the possibility of booting other systems on mine by just plugging in another SD, without having to keep it permanently in my PC. Also, in the remote case that in the near future I want to boot Windows 8 on it: I read it causes major boot incompatibility issues with other systems by needing a digital signature in order for them to start. By having it on a removable drive, I could just get rid of it when I don't need it and swap its card for the Linux one, leaving no obstacles to their boot.

    Now, my questions are:

    1. I know that, unlike traditional rotating disk drives, integrated-circuit ones have a limited lifespan in terms of cluster rewriting. Is that an obstacle to this kind of usage? I mean, some Ultrabooks are using SSDs now; is it the same issue, or are there differences between Solid State Drives and Secure Digital cards in that sense? Does having them store system files which sit in fixed positions (defeating wear-leveling across clusters), constantly being re-read and updated, just wear them out sooner?

    2. Are all motherboards and BIOSes able to boot from SDs just like they are from USB pen drives (provided the card reader is USB-connected)? Or can bootloaders like GRUB not be installed on SDs and work? If they can't, is installing GRUB to the MBR and making a boot option point to the SD a solution? Will it work? Are there any other problems with installing OSes on a Secure Digital?
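    On question 2, GRUB itself has no problem living on an SD card - a sketch of installing it there, assuming a GRUB 2 system where the card appears as /dev/mmcblk0 and its root partition is mounted at /mnt/sd (older GRUB 2 builds spell the flag --root-directory):

        sudo grub-install --boot-directory=/mnt/sd/boot /dev/mmcblk0
        sudo grub-mkconfig -o /mnt/sd/boot/grub/grub.cfg

    Whether the BIOS will boot it depends on how the reader is wired: USB-attached readers look like USB disks and usually work, while PCI/SDIO readers are typically invisible to the BIOS - which is where the fallback of keeping GRUB on the internal MBR and chainloading the SD comes in.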


  • Suggestions for splitting server roles amongst Hyper-V virtual servers / RAID6 or RAID10? / AppAssure

    - by Anon
    We have 2 Hyper-V hosts at present running 1 virtual server that was converted from a physical box running all roles. My plan is to split the roles over various virtual machines, upgrading to the latest software versions as I go, and use the backup server as a standby in case the main server fails. AppAssure backup software has a feature called Virtual Standby, so the VHDs can be ready to be fired up on the backup server if necessary. Off-site backups will be done via external USB drive for now.

    I'm just seeking some input/suggestions into how I'm planning to split the roles out amongst the various virtual servers. Also, I'm curious how to set up the storage on the servers. We do not have any NASs, SANs or any budget for this. What would the best RAID level be to use? I'm thinking either RAID6 (which is currently used), though I'm concerned about the write speeds, or RAID10, though I'm worried that I can only lose 1 drive (from the same mirror) as opposed to any 2 with RAID6. I realise I have a hot swap for this, but what if a further drive fails during a rebuild? Is the write penalty of RAID6 worth the extra reliability over RAID10? Or will it be too slow with all the roles I am planning, making RAID10 my only real option? The reason for the needed redundancy is that I am the only technician and I'm not always on-site.

    Options I've considered:

    1. 5 drives in a RAID6 set, 200GB for host OS, rest for VM storage. 1 drive for hot swap - this is how it is currently set up.
    2. 4 drives in a RAID10 set, 200GB for host OS, rest for VM storage. 2 drives for hot swap.
    3. 4 drives in a RAID10 set for VM storage, 2 drives in a RAID1 set for host OS. No drives for hot swap - while this is probably the best option with the number of drives I have, I don't like the idea of having no hot swap.
    4. 3 drives in a RAID6 set for VM storage, 2 drives in a RAID1 set for host OS. 1 drive for hot swap.

    All options give us enough storage capacity for our files, etc. We don't have any budget for extra drives or extra hot-swap HD chassis for the servers. We have about 70 clients and about 150 users.

    MAIN SERVER
    Intel Xeon 5520 @ 2.27 GHz (2 processors), 16GB RAM
    6 x 1TB Seagate Barracuda ES.2 Enterprise SATA drives
    Intel SRCSATAWB RAID controller

    Virtual machine workload using Hyper-V on Windows Server 2008 R2:
    - DC01 - Active Directory Domain Controller / DNS server / Global catalog - 1GB RAM
    - DC02 - Active Directory Domain Controller / DNS server / Global catalog - 1GB RAM
    - Member Server - DHCP server, file server, print server - 1GB RAM
    - SCCM Member Server - 4GB RAM
    - Third-party software Member Server - A/V server, ticketing software, etc - 4GB RAM
    - Exchange 2007 - 4GB RAM - however we are probably migrating to a hosted solution, therefore freeing up resources

    BACKUP SERVER
    Intel Xeon E5410 @ 2.33GHz (2 processors), 16GB RAM
    6 x 2TB WD RE4 SATA drives
    Intel SRCSASRB RAID controller

    Virtual machine workload using Hyper-V on Windows Server 2008 R2:
    - AppAssure backup software - 8GB RAM


  • Monitoring tools that can handle high rates and high volumes?

    - by Jon Watte
    We're using Cacti with RRDTool to monitor and graph about 100,000 counters spread across about 1,000 Linux-based nodes. However, our current setup generally only gives us 5-minute graphs (with some data being minute-based); we often make changes where seeing feedback in "near real time" would be of value. I'd like approximately a week of 5- or 10-second data, a year of 1-minute data, and 5 years of 10-minute data. I have SSD disks and a dual-hexa-core server to spare.

    I tried setting up a Graphite/carbon/whisper server and had about 15 nodes pipe to it, but it only has "average" as the retention function when promoting to older buckets. This is almost useless - I'd like min, max, average, standard deviation, and perhaps "total sum" and "number of samples" or perhaps "95th percentile" available. The developer claims there's a new back-end "in beta" that allows you to write your own function, but this appears to still only do 1:1 retention (when saving older data, you really want the statistics calculated into many streams from a single input). Also, "in beta" seems a little risky for this installation. If I'm wrong about this assumption, I'd be happy to be shown my error!

    I've heard Zabbix recommended, but it puts data into MySQL or some other SQL database. 100,000 counters on a 5-second interval means 20,000 tps, and while I have an SSD, I don't have an 8-way RAID-6 with battery-backed cache, which I think I'd need for that to work out :-) Again, if that's actually something that's not a problem, I'd be happy to be shown the error of my ways. Also, can Zabbix do the single-data-stream promote-with-statistics thing?

    Finally, Munin claims to have a new 2.0 coming out "in beta" right now, and it boasts custom retention plans. However, again, there's that "in beta" part - has anyone used it for real, and at scale? How did it perform, if so?

    I'm almost thinking about using a graphing front-end (such as Graphite) and rolling my own retention backend with a simple layer on top of mmap() and some stats. That wouldn't be particularly hard, and would probably perform very well, letting the kernel figure out the balance between frequency of flushing to disk and process operations.

    Any other suggestions I should look into? Note: it has to have shown itself able to sustain the kinds of data loads I'm suggesting above; if you can point at the specific implementation you're referencing, so much the better!
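    For context, this is roughly what the Graphite retention and aggregation knobs look like - a sketch of carbon's storage-schemas.conf and storage-aggregation.conf. Note that stock carbon applies one aggregation method per metric name, so the usual workaround is publishing separate .min/.max series, which is exactly the limitation complained about above:

        # storage-schemas.conf
        [counters]
        pattern = .*
        retentions = 10s:7d,1m:1y,10m:5y

        # storage-aggregation.conf
        [min_series]
        pattern = \.min$
        aggregationMethod = min

        [max_series]
        pattern = \.max$
        aggregationMethod = max

        [default]
        pattern = .*
        xFilesFactor = 0.3
        aggregationMethod = average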


  • How to set up a file server in a restricted corporate environment

    - by Emilio M Bumachar
    I work in a big corporation, and the disk space my team gets on the corporate file server is so low that I am considering turning my work PC into a file server. I ask this community for links to tutorials, software suggestions, and advice in general about how to set it up.

    My machine is an Intel Core2Duo E7500 @ 3GHz with 3 GB of RAM, running Windows XP Service Pack 3. Upgrading, formatting or installing another OS is out of the question, but I do have Administrator privileges on the PC, and I can install programs (at least for now). A lot of security software I don't even know about is, and must remain, installed. But I only need communication within the corporate network, which is not restricted.

    People have usernames (logins) on the corporate network, and I need to use them to restrict access. Simply put, I have a list of logins of team members, and only people on the list should access the files. I have about 150 GB of free disk space, and I'm thinking of allocating 100 GB to the team's shared files. I plan monthly backups onto co-workers' machines with the same configuration. Automation of backups is a nice but unnecessary feature: it's totally acceptable for me to manually copy the contents to a different machine once a month. Uptime is important, as everyone would use these files in their daily work.

    I have experience as a Python and C programmer, but no experience whatsoever as a sysadmin, and almost nothing of my programming experience is network programming. I'm a complete beginner at this. Thanks in advance for any help.

    EDIT: I honestly appreciate all the warnings, I really do, but what I plan to make available is mostly stuff that is now solely on DVDs, just for space reasons. It's 'daily work' to read them, but 'daily work write' files will remain on the corporate server. As for the importance of uptime, I think I overstated it: a few outages are OK; it's already an improvement over getting the DVDs. As for policy, my manager is kind of on my side, and I will confirm that before making my move. As for getting more space through the proper channels, well, that was Plan A, and it's still on the table... but I don't have much hope. I'm not as "core business" as I'd like.
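    On XP, the no-extra-software version of this is a plain Windows share backed by NTFS permissions - a sketch with made-up names (one hard limit to know: XP Pro allows at most 10 simultaneous inbound SMB connections, so this can never serve the whole office at once):

        mkdir D:\TeamFiles
        net share TeamFiles=D:\TeamFiles /UNLIMITED
        rem NTFS ACLs do the real enforcement; /E edits, /G grants Change access
        cacls D:\TeamFiles /E /G CORP\TeamMembers:C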


  • What's the best way to achieve an RPO of zero and the lowest possible RTO (less than 15 minutes) with SQL 2008 R2?

    - by Adrian Hope-Bailie
    We are running a payments (EFT transaction processing) application which processes high volumes of transactions 24/7, and we are currently investigating a better way of doing DB replication to our disaster recovery site.

    Our current and previous strategies have included using both DoubleTake and Redgate to replicate data to a warm standby. DoubleTake is the supported solution from the payments software vendor; however, their (DoubleTake's) support in South Africa is very poor. We had a few issues and simply couldn't ever resolve them, so we had to give up on DoubleTake. We have been using Redgate to manually read the data from the primary site (via queries) and write it to the DR site, but this is:

    - A bad solution
    - Getting the software vendor hot and bothered whenever we have support issues, as it has a tendency to interfere with the payment application, which is very DB-intensive.

    We recently upgraded the whole system to run on SQL 2008 R2 Enterprise, which means we should probably be looking at using some of the built-in replication features. The server has 2 fairly large databases with a mixture of tables containing highly volatile transactional data and pretty static configuration data. Replication would be done over a WAN link to a separate physical site and needs to achieve the following objectives:

    - RPO: Zero loss. This is transactional data with financial impact, so we can't lose anything.
    - RTO: Tending to zero. The business depends on our ability to process transactions; every minute we are down we are losing money.

    I have looked at a few of the other questions/answers, but none meet our case exactly:

    - SQL Server 2008 failover strategy - Log shipping or replication?
    - How to achieve the following RTO & RPO with logshipping only using SQL Server?
    - What is the best of two approaches to achieve DB Replication?

    My current thinking is that we should use mirroring, but I am concerned that for an RPO of zero we will need to do delayed commits, and this could impact the performance of the primary DB, which is not an option. Our current DR process is to:

    1. Stop incoming traffic to the primary site and allow all in-flight transactions to complete.
    2. Allow the replication to DR to complete.
    3. Change network routing to route to the DR site.
    4. Start all applications and services on the secondary site (ideally we can change this to a warmer standby whereby the applications are already running but not processing any transactions).

    In other words, the DR database needs to catch up with the primary as quickly as possible and be ready for processing as the new primary. We would then need to be able to reverse this when we are ready to switch back. Is there a better option than mirroring (should we be doing log shipping too), and can anyone suggest other considerations that we should keep in mind?
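    For reference, synchronous (high-safety) database mirroring is the SQL Server 2008 R2 feature aimed at exactly RPO zero: a transaction only commits once it has hardened on both sides, which is also where the commit-latency worry comes from. A bare sketch of the partner setup, with endpoint/certificate plumbing omitted and names made up:

        -- On the mirror server (database restored WITH NORECOVERY first):
        ALTER DATABASE Payments SET PARTNER = 'TCP://primary.example.com:5022';

        -- On the principal:
        ALTER DATABASE Payments SET PARTNER = 'TCP://dr.example.com:5022';

        -- SAFETY FULL = synchronous commit (the RPO-zero part);
        -- add a witness server if automatic failover is wanted for the RTO side.
        ALTER DATABASE Payments SET PARTNER SAFETY FULL;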


  • Tool or script to detect moved or renamed files on Linux prior to a backup

    - by Pharaun
    Basically I am searching to see if there exists a tool or script that can detect moved or renamed files, so that I can get a list of renamed/moved files and apply the same operations on the other end of the network to conserve bandwidth. Basically, disk storage is cheap but bandwidth isn't, and the problem is that the files will often be reorganized or moved around into a better directory structure. Thus when you use rsync to do the backup, rsync won't notice that it's a renamed or moved file and will re-transmit it over the network all over again, despite the same file existing on the other end.

    So I am wondering if there exists a script or tool that can record where all the files are and their names, then, just prior to a backup, rescan and detect moved or renamed files; then I could take that list and re-apply the move/rename operations on the other side.

    Here's a list of the "general" features of the files:

    - Large, unchanging files
    - They can be renamed or moved around

    [Edit:] These are all good answers, and what I ended up doing was looking at all of the answers; I will be writing some code to deal with this. Basically what I am thinking/working on now is:

    1. Using something like AIDE for the "initial" scan, which lets me keep checksums on the files. Because the files are supposed to never change, this would aid in detecting corruption.
    2. Creating an inotify daemon that would monitor these files/directories and record any changes relating to renames and moving the files around to a log file (see the sketch below).
    3. There are some edge cases where inotify might fail to record that something happened to the file system, so there is a final step of using find to search the file system for files that have a change time later than the last backup.

    This has several benefits:

    - Checksums/etc from AIDE make it possible to check/make sure that some media did not get corrupted
    - inotify keeps resource usage low, with no need to re-scan the filesystem over and over
    - No need to patch rsync; if I have to patch things I can, but I would prefer to avoid patching to keep the burden low (i.e. I don't need to re-patch every time there is an update).

    I've used Unison before and it's really nice; however, I could've sworn that Unison keeps copies around on the filesystem and that its "archive" files can grow rather large.
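    A sketch of the inotify piece of that plan, using inotifywait from inotify-tools (log path and format are illustrative). One caveat: inotifywait emits MOVED_FROM and MOVED_TO as separate consecutive lines and does not expose the kernel cookie that links them, so pairing them up is left to whatever consumes the log:

        #!/bin/sh
        inotifywait -m -r \
            -e move -e create -e delete \
            --timefmt '%F %T' --format '%T %e %w%f' \
            /data >> /var/log/file-moves.log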


  • IPv6 routing to another interface

    - by Robert
    I'm trying to get an IPv6-enabled router to forward data from one interface to the other, and I'm having issues. When following this example (http://www.cisco.com/en/US/tech/tk872/technologies_configuration_example09186a0080ba6106.shtml) I am able to get full connectivity between all 3 routers in my simulator. However, when I try to use only 1 router, I can't get connectivity to the other interfaces on the same router.

    My PC is directly attached to FA 1/0 and it can ping the router's interface. However, it can not ping any other interface on the router (which, unless I'm missing something, it should be able to do). The router, on the other hand, can ping everything. I thought static routes might help, but the router already has routes for everything. My thinking is that the packet should come in, the router looks up the destination in its IPv6 routing table, realizes it's for itself, and responds. I thought maybe it couldn't respond directly, so I tried pinging a device like 2001:0000:0000:1000::2, but I don't get a response.

    I'm running on IOS 12.4. I'm missing something (hopefully simple), but I just can't see what it is. With only 1 router, how do I enable my PC to talk to the other subnets?

    Thank you in advance,
    Robert

    Topology:

        R1 FA 0/0:   2001:0000:0000:0000::1/52
        R1 FA 0/1:   2001:0000:0000:1000::1/52
        R1 FA 1/0:   2001:0000:0000:2000::1/52
        Loopback 0:  2001:0000:0000:3000::1/52
        PC:          2001:0000:0000:2000::2/52

    The PC plugs directly into FA 1/0 on the router.

    --- Configuration ---

        ipv6 cef
        ipv6 unicast-routing
        !
        interface Loopback0
         no ip address
         ipv6 address 2001:0000:0000:3000::1/52
         ipv6 enable
        !
        interface FastEthernet0/0
         no ip address
         duplex auto
         speed auto
         ipv6 address 2001:0000:0000::1/52
         ipv6 enable
        !
        interface FastEthernet0/1
         no ip address
         duplex auto
         speed auto
         ipv6 address 2001:0000:0000:1000::1/52
         ipv6 enable
        !
        interface FastEthernet1/0
         no ip address
         duplex auto
         speed auto
         ipv6 address 2001:0000:0000:2000::1/52
         ipv6 enable

    --- Routing table ---

        IPV6Lab#show ipv6 route
        IPv6 Routing Table - 10 entries
        Codes: C - Connected, L - Local, S - Static, R - RIP, B - BGP
               U - Per-user Static route
               I1 - ISIS L1, I2 - ISIS L2, IA - ISIS interarea, IS - ISIS summary
               O - OSPF intra, OI - OSPF inter, OE1 - OSPF ext 1, OE2 - OSPF ext 2
               ON1 - OSPF NSSA ext 1, ON2 - OSPF NSSA ext 2
        C   2001:0000:0000::/52 [0/0]       via ::, FastEthernet0/0
        L   2001:0000:0000::1/128 [0/0]     via ::, FastEthernet0/0
        C   2001:0000:0000:1000::/52 [0/0]  via ::, FastEthernet0/1
        L   2001:0000:0000:1000::1/128 [0/0] via ::, FastEthernet0/1
        C   2001:0000:0000:2000::/52 [0/0]  via ::, FastEthernet1/0
        L   2001:0000:0000:2000::1/128 [0/0] via ::, FastEthernet1/0
        C   2001:0000:0000:3000::/52 [0/0]  via ::, Loopback0
        L   2001:0000:0000:3000::1/128 [0/0] via ::, Loopback0
        L   FE80::/10 [0/0]                 via ::, Null0
        L   FF00::/8 [0/0]                  via ::, Null0
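    One thing the post doesn't show is the PC's own routing table: if the PC has only its on-link 2000::/52 prefix and no default route, it can reach FA 1/0 but nothing past it, which matches these symptoms. A hedged sketch of adding a default route on a Windows PC (interface name is illustrative):

        netsh interface ipv6 add route ::/0 "Local Area Connection" 2001:0000:0000:2000::1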


  • Is there a Distributed SAN/Storage System out there?

    - by Joel Coel
    Like many other places, we ask our users not to save files to their local machines. Instead, we encourage them to be put on a file server so that others (with appropriate permissions) can use them, and so that the files are backed up properly. The result of this is that most users have large hard drives that are sitting mainly empty. It's 2010 now. Surely there is a system out there that lets you turn that empty space into a virtual SAN or document library?

    What I envision is a client program that is pushed out to users' PCs and coordinates with a central server. The server looks to users just like a normal file server, but instead of keeping entire file contents it merely keeps a record of where those files can be found among various user PCs. It then coordinates with the right clients to serve up file requests. The client software would be able to respond to such requests directly, as well as be smart enough to cache recent files locally. For redundancy the server could make sure files are copied to multiple PCs, perhaps allowing you to define groups in different locations so that an instance of the entire repository lives in each group, to protect against a disaster in one building taking down everything else.

    Obviously you wouldn't point your database server here, but for simpler things I see several advantages:

    - Files can often be transferred from a nearer machine.
    - Disk space grows automatically as your company does.
    - It should ultimately be cheaper, as you don't need to keep a separate set of disks.

    I can see a few downsides as well:

    - Occasional degradation of user PC performance, if the machine has to serve or accept a large file transfer during a busy period.
    - Writes have to be propagated around the network several times (though I suspect this isn't really much of a problem, as reading happens in most places more than writing).
    - You still need a way to send a complete copy of the data offsite occasionally, and this would make it very hard to do differentials.

    Think of this like a cloud storage system that lives entirely within your corporate LAN and makes use of your existing user equipment. Our old main file server is due for retirement in about 2 years, and I'm looking into replacing it with a small SAN. I'm thinking something like this would be a better fit. As a school, we have a couple of computer labs I can leave running that would be perfect for adding a little extra redundancy to the system.

    Unfortunately, the closest thing I can find is Dienst, and it's just a paper that dates back to 1994. Am I just using the wrong buzzwords in my searches, or does this really not exist? If not, is there a big downside that I'm missing?


  • joining text files with 600M+ lines

    - by dnkb
    I have two files, huge.txt and small.txt. huge.txt has around 600M rows and is 14 gigs; each line has four space-separated words (tokens) and finally another space-separated column with a number. small.txt has 150K rows, with a size of ~3M: a space-separated word and a number.

    Both files are sorted using the sort command, with no extra options. The words in both files may include apostrophes (') and dashes (-). The desired output would contain all columns from huge.txt, plus the second column (the number) from small.txt, wherever the first word of huge.txt and the first word of small.txt match.

    My attempts below failed miserably with the following error:

        cat huge.txt | join -o 1.1 1.2 1.3 1.4 2.2 - small.txt > output.txt

        join: memory exhausted

    What I suspect is that the sorting order isn't right somehow, even though the files are pre-sorted using:

        sort -k1 huge.unsorted.txt > huge.txt
        sort -k1 small.unsorted.txt > small.txt

    Problems seem to appear around words that have apostrophes (') or dashes (-). I also tried dictionary sorting using the -d option, bumping into the same error at the end.

    I see two ways out of this but don't know how to implement either of them.

    1. Any tips on how to sort the files in a way that the join command considers them to be sorted properly?
    2. I was thinking of calculating MD5 or some other hashes of the strings to get rid of the apostrophes and dashes, doing the sorting and joining with the hashes instead of the strings themselves, then "translating" the hashes back to strings at the end, leaving the numbers intact at the ends of the lines. Since there would be only 150K hashes, it's not that bad. What would be a good way to calculate individual hashes for each of the strings? Some AWK magic?

    See file samples below. Thank you!

    Sample of huge.txt:

        had stirred me to 46
        had stirred my corruption 57
        had stirred old emotions 55
        had stirred something in 69
        had stirred something within 40

    Sample of small.txt:

        caley 114881
        calf 2757974
        calfed 137861
        calfee 71143
        calflora 154624
        calfskin 148347
        calgary 9416465
        calgon's 94846
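    For what it's worth, the classic cause of "join: memory exhausted" on pre-sorted files is a locale mismatch: join compares bytes in a different order than a locale-aware sort produced, and apostrophes and dashes are the usual trigger because many locales skip punctuation when collating. A sketch of forcing one byte-wise collation end to end, keyed strictly on field 1:

        export LC_ALL=C
        sort -k1,1 huge.unsorted.txt  > huge.txt
        sort -k1,1 small.unsorted.txt > small.txt
        join -o 1.1,1.2,1.3,1.4,2.2 huge.txt small.txt > output.txt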

