Search Results

Search found 11409 results on 457 pages for 'large teams'.

Page 19/457 | < Previous Page | 15 16 17 18 19 20 21 22 23 24 25 26  | Next Page >

  • Mysql InnoDB and quickly applying large updates

    - by Tim
    Basically my problem is that I have a large table of about 17,000,000 products that I need to apply a bunch of updates to really quickly. The table has 30 columns with the id set as int(10) AUTO_INCREMENT. I have another table which all of the updates for this table are stored in, these updates have to be pre-calculated as they take a couple of days to calculate. This table is in the format of [ product_id int(10), update_value int(10) ]. The strategy I'm taking to issue these 17 million updates quickly is to load all of these updates into memory in a ruby script and group them in a hash of arrays so that each update_value is a key and each array is a list of sorted product_id's. { 150: => [1,2,3,4,5,6], 160: => [7,8,9,10] } Updates are then issued in the format of UPDATE product SET update_value = 150 WHERE product_id IN (1,2,3,4,5,6); UPDATE product SET update_value = 160 WHERE product_id IN (7,8,9,10); I'm pretty sure I'm doing this correctly in the sense that issuing the updates on sorted batches of product_id's should be the optimal way to do it with mysql / innodb. I'm hitting a weird issue though where when I was testing with updating ~13 million records, this only took around 45 minutes. Now I'm testing with more data, ~17 million records and the updates are taking closer to 120 minutes. I would have expected some sort of speed decrease here but not to the degree that I'm seeing. Any advice on how I can speed this up or what could be slowing me down with this larger record set? As far as server specs go they're pretty good, heaps of memory / cpu, the whole DB should fit into memory with plenty of room to grow.

    Read the article

  • Potential impact of large broadcast domains

    - by john
    I recently switched jobs. By the time I left my last job our network was three years old and had been planned very well (in my opinion). Our address range was split down into a bunch of VLANs with the largest subnet a /22 range. It was textbook. The company I now work for has built up their network over about 20 years. It's quite large, reaches multiple sites, and has an eclectic mix of devices. This organisation only uses VLANs for very specific things. I only know of one usage of VLANs so far and that is the SAN which also crosses a site boundary. I'm not a network engineer, I'm a support technician. But occasionally I have to do some network traces for debugging problems and I'm astounded by the quantity of broadcast traffic I see. The largest network is a straight Class B network, so it uses a /16 mask. Of course if that were filled with devices the network would likely grind to a halt. I think there are probably 2000+ physical and virtual devices currently using that subnet, but it (mostly) seems to work. This practise seems to go against everything I've been taught. My question is: In your opinion and  From my perspective - What measurement of which metric would tell me that there is too much broadcast traffic bouncing about the network? And what are the tell-tale signs that you are perhaps treading on thin ice? The way I see it, there are more and more devices being added and that can only mean more broadcast traffic, so there must be a threshold. Would things just get slower and slower, or would the effects be more subtle than that?

    Read the article

  • Objective-C vs JavaScript loop performance

    - by micadelli
    I have a PhoneGap mobile application that I need to generate an array of match combinations. In JavaScript side, the code hanged pretty soon when the array of which the combinations are generated from got a bit bigger. So, I thought I'll make a plugin to generate the combinations, passing the array of javascript objects to native side and loop it there. To my surprise the following codes executes in 150 ms (JavaScript) whereas in native side (Objective-C) it takes ~1000 ms. Does anyone know any tips for speeding up those executing times? When players exceeds 10, i.e. the length of the array of teams equals 252 it really gets slow. Those execution times mentioned above are for 10 players / 252 teams. Here's the JavaScript code: for (i = 0; i < GAME.teams.length; i += 1) { for (j = i + 1; j < GAME.teams.length; j += 1) { t1 = GAME.teams[i]; t2 = GAME.teams[j]; if ((t1.mask & t2.mask) === 0) { GAME.matches.push({ Team1: t1, Team2: t2 }); } } } ... and here's the native code: NSArray *teams = [[NSArray alloc] initWithArray: [options objectForKey:@"teams"]]; NSMutableArray *t = [[NSMutableArray alloc] init]; int mask_t1; int mask_t2; for (NSInteger i = 0; i < [teams count]; i++) { for (NSInteger j = i + 1; j < [teams count]; j++) { mask_t1 = [[[teams objectAtIndex:i] objectForKey:@"mask"] intValue]; mask_t2 = [[[teams objectAtIndex:j] objectForKey:@"mask"] intValue]; if ((mask_t1 & mask_t2) == 0) { [t insertObject:[teams objectAtIndex:i] atIndex:0]; [t insertObject:[teams objectAtIndex:j] atIndex:1]; /* NSArray *newCombination = [[NSArray alloc] initWithObjects: [teams objectAtIndex:i], [teams objectAtIndex:j], nil]; */ [combinations addObject:t]; } } } ... the array in question (GAME.teams) looks like this: { count = 2; full = 1; list = ( { index = 0; mask = 1; name = A; score = 0; }, { index = 1; mask = 2; name = B; score = 0; } ); mask = 3; name = A; }, { count = 2; full = 1; list = ( { index = 0; mask = 1; name = A; score = 0; }, { index = 2; mask = 4; name = C; score = 0; } ); mask = 5; name = A; },

    Read the article

  • Do teams get more productive by adding more developers? [duplicate]

    - by jgauffin
    This question already has an answer here: Why does adding more resource to a late project make it later? 12 answers Suppose you've got a project that is running late. Is there any proof or argument that teams become much more productive by adding more people? I am looking for answers that can be supported by facts and references if possible. What I'm thinking about is that existing devs have to teach the new ones (thus losing overall development time), and then the new developers have to study the code (and tasks) before they can become fully productive.

    Read the article

  • Bidirectional real-time sync of large file tree between two distant linux servers

    - by dlo
    By large file tree I mean about 200k files, and growing all the time. A relatively small number of files are being changed in any given hour though. By bidirectional I mean that changes may occur on either server and need to be pushed to the other, so rsync doesn't seem appropriate. By distant I mean that the servers are both in data centers, but geographically remote from each other. Currently there are only 2 servers, but that may expand over time. By real-time, it's ok for there to be a little latency between syncing, but running a cron every 1-2 minutes doesn't seem right, since a very small fraction of files may change in any given hour, let alone minute. EDIT: This is running on VPS's so I might be limited on the kinds of kernel-level stuff I can do. Also, the VPS's are not resource-rich, so I'd shy away from solutions that require lots of ram (like Gluster?). What's the best / most "accepted" approach to get this done? This seems like it would be a common need, but I haven't been able to find a generally accepted approach yet, which was surprising. (I'm seeking the safety of the masses. :) I've come across lsyncd to trigger a sync at the filesystem change level. That seems clever though not super common, and I'm a bit confused by the various lsyncd approaches. There's just using lsyncd with rsync, but it seems this could be fragile for bidirectionality since rsync doesn't have a notion of memory (eg- to know whether a deleted file on A should be deleted on B or whether it's a new file on B that should be copied to A). lipsync appears to be just a lsyncd+rsync implementation, right? Then there's using lsyncd with csync2, like this: http://www.axivo.com/community/threads/lightning-fast-synchronization-with-csync2-and-lsyncd.121/ ... I'm leaning towards this approach, but csync2 is a little quirky, though I did do a successful test of it. I'm mostly concerned that I haven't been able to find a lot of community confirmation of this method. People on here seem to like Unison a lot, but it seems that it is no longer under active development and it's not clear that it has an automatic trigger like lsyncd. I've seen Gluster mentioned, but maybe overkill for what I need? UPDATE: fyi- I ended up going with the original solution I mentioned: lsyncd+csync2. It seems to work quite well, and I like the architectural approach of having the servers be very loosely joined, so that each server can operate indefinitely on its own regardless of the link quality between them.

    Read the article

  • iptables management tools for large scale environment

    - by womble
    The environment I'm operating in is a large-scale web hosting operation (several hundred servers under management, almost-all-public addressing, etc -- so anything that talks about managing ADSL links is unlikely to work well), and we're looking for something that will be comfortable managing both the core ruleset (around 12,000 entries in iptables at current count) plus the host-based rulesets we manage for customers. Our core router ruleset changes a few times a day, and the host-based rulesets would change maybe 50 times a month (across all the servers, so maybe one change per five servers per month). We're currently using filtergen (which is balls in general, and super-balls at our scale of operation), and I've used shorewall in the past at other jobs (which would be preferable to filtergen, but I figure there's got to be something out there that's better than that). The "musts" we've come up with for any replacement system are: Must generate a ruleset fairly quickly (a filtergen run on our ruleset takes 15-20 minutes; this is just insane) -- this is related to the next point: Must generate an iptables-restore style file and load that in one hit, not call iptables for every rule insert Must not take down the firewall for an extended period while the ruleset reloads (again, this is a consequence of the above point) Must support IPv6 (we aren't deploying anything new that isn't IPv6 compatible) Must be DFSG-free Must use plain-text configuration files (as we run everything through revision control, and using standard Unix text-manipulation tools are our SOP) Must support both RedHat and Debian (packaged preferred, but at the very least mustn't be overtly hostile to either distro's standards) Must support the ability to run arbitrary iptables commands to support features that aren't part of the system's "native language" Anything that doesn't meet all these criteria will not be considered. The following are our "nice to haves": Should support config file "fragments" (that is, you can drop a pile of files in a directory and say to the firewall "include everything in this directory in the ruleset"; we use configuration management extensively and would like to use this feature to provide service-specific rules automatically) Should support raw tables Should allow you to specify particular ICMP in both incoming packets and REJECT rules Should gracefully support hostnames that resolve to more than one IP address (we've been caught by this one a few times with filtergen; it's a rather royal pain in the butt) The more optional/weird iptables features that the tool supports (either natively or via existing or easily-writable plugins) the better. We use strange features of iptables now and then, and the more of those that "just work", the better for everyone.

    Read the article

  • Windows 2003 Storage Server Hanging on Large File Transfers

    - by user25272
    In one of our offices we have a Dell PowerVault 745N NAS device which acts as the main file server. Its running 32bit Windows 2003 Storage Server SP2 with 3GB RAM. The server holds around 60 users HOME folders, which are mapped via AD. The office clients are a mix of XP SP3, Vista and Windows 7. Occasionally the server will completely hang when transferring large files. When the hang happens the console becomes unresponsive with only the mouse active and blank wallpaper. Sometimes stopping the copy frees the server, sometimes not. The hanging can last around 20 minutes. During this time other servers also become unresponsive with blank wallpaper at the console. If you do manage to get onto another server the taskbar and run commands are unresponsive. This also transcends to the client computers sometimes with explorer crashing. I'm guessing this is due to the HOME folder mapping. Eventually the NAS server with free up and everything will be back to normal. The server is configured as follows: PERC 4/DC DATA 2 - 12 SCSI HDD - RAID5 SHADOWCOPY 2 SCSI HDD - RAID1 CERC SATA DATA 11 4 SATA HDD - RAID5 OS 4 SATA HDD - RAID5 All the drivers and firmware is up to date. I've been through all the diagnostics with Dell and the hardware has come up clean including full HDD tests on the arrays. The server has NOD32 installed as the AV, but the hanging happens when it is uninstalled. There are no errors in the event log when this happens and we don't have any errors logged on any of our ProCurve switches. DNS is fine on the domain and AD from what I can tell is running happily. There are no DFS or NFS shares setup either. All the shares are standard Windows. I've unchecked the allow the computer to turn off this device to save power box under Power Management on the NIC. "Set Link Speed and Duplex to Auto-negotiate 1000 " Increased Receive Descriptors buffer from 256 to 352 (reserves more CPU resource for handling data) I've run network traces using network monitor and have found the following: 417 8.078125 {SMB:192, NbtSS:25, TCP:24, IPv4:23} 192.168.2.244 192.168.5.35 SMB SMB:R; Nt Create Andx - NT Status: System - Error, Code = (52) STATUS_OBJECT_NAME_NOT_FOUND I've tried different cabling; NICs and switch ports all with the same result. Transferring files from other servers on the domain is fine. All I haven't done is run CHKDSK on the drives to look for any file system errors. On the Vista clients I have also run netsh interface tcp set global autotuning=disabled with no result. Could it be that the server has a faulty drive or that the I/O is too much for it to handle? Any ideas why would the hang cause issues with the other servers on the LAN? Many Thanks.

    Read the article

  • Cygwin's RSYNC for large data transfer

    - by Tim Brigham
    I'm using rsync from Cygwin to do a large scale data transfer from an aging HP MSA 1000 to a new DAS attached to a different server. I have a daemon running on the remote server in read only mode and a local copy writing the files to disk. One of my servers is an image repository with over a million files spread across about 300 directories. Each file averages only a couple hundred kilobytes. More so than any other box this one is proving problematic. The rsync process will work for a while - some times 20 minutes, some times an hour - and then it simply quits and sits idle at a given file name. I have verified that the file isn't corrupt on the remote server and that the file is successfully created on the local drive. I ran the rsync client in -vv mode, which returns nothing. I checked out the logs created by the daemon. I looked at the network utilization on the interface, which is sitting idle. I looked at the AV settings to see if anything could pose a problem there. I even updated to the latest release of Cygwin. What do I need to in order to keep this connection up? EDIT: The client system is using the command rsync.exe server::Drives/f/Repo/ /cygdrive/T/Repo --archive -P -vv The server is using the command rsync.exe --daemon --no-detach --config "rsyncd.conf" The contents of rsyncd.conf: use chroot = false strict modes = false hosts allow = 192.168.100.9 log file = c:/rsyncd.log uid=0 gid=0 [Drives] path = /cygdrive read only = yes EDIT: The file server is 2003, the disk type on the array is GPT and the size is of the array is about 4 TB. EDIT: Stranger.. It looks like the process is reliably erroring out at about 175,000 files. Rsync runs fine when I pick the same directory it has problems with one at a time. EDIT: rsync version 3.0.9 protocol version 30 Copyright (C) 1996-2011 by Andrew Tridgell, Wayne Davison, and others. Web site: http://rsync.samba.org/ Capabilities: 64-bit files, 64-bit inums, 32-bit timestamps, 64-bit long ints, no socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace, append, ACLs, xattrs, iconv, symtimes A similar failure occurred when going from the same set of files with Cygwin to a Linux install. It didn't happen until several hours later than normal however.

    Read the article

  • OS X server large scale storage and backup

    - by user135217
    I really hope this question doesn't come across as trolling or asking for buying advice. It's not intended. I've just started working for a small ad agency (40 employees). I actually quit being a system administrator a few years ago (too stressful!), but the company we're currently outsourcing our IT stuff to is doing such a bad job that I've felt compelled to get involved and do what I can to improve things. At the moment, all the company's data is stored on an 8TB external firewire drive attached to a Mac Mini running OS X Server 10.6, which provides filesharing (using AFP) for the whole company. There is a single backup drive, which is actually a caddy containing two 3TB hard drives arranged in RAID 0 (arrggghhhh!), which someone brings in as and when and copies over all the data using Carbon Copy Cloner. That's the entirety of the infrastructure, and the whole backup and restore strategy. I've been having sleepless nights. I've just started augmenting the backup process with FreeBSD, ZFS, sparse bundles and snapshot sends to get everything offsite. I think this is a workable behind the scenes solution, but for people's day to day use I'm struggling. Given the quantity and importance of the data, I think we should really be looking towards enterprise level storage solutions, high availability and so on, but the whole company is all Mac all the time, and I cannot find equipment that will do what we need. No more Xserve; no rack storage; no large scale storage at all apart from that Pegasus R6 that doesn't seem all that great; the Mac Pro has fibre channel, but it's not a real server and it's ludicrously expensive; Xsan looks like it's on the way out; things like heartbeatd and failoverd have apparently been removed from Lion Server; the new Mac Mini only has thunderbolt which severely limits our choices; the list goes on and on. I'm really, really not trying to troll here. I love Macs, but I just genuinely don't know where I'm supposed to look for server stuff. I have considered Linux or FreeBSD and netatalk for serving files with all the server-y goodness those OSes bring, but some the things I've read make me wonder if it's really the way to go. Also, in my own (admittedly quite cursory) experiments with it, I've struggled to get decent transfer speeds. I guess there's also the possibility of switching everyone off AFP and making them use SMB or NFS, but I understand that this can cause big problems with resource forks and file locks. I figure there must be plenty of all Mac companies out there. If you're the sysadmin at one, what do you use? Any suggestions very gratefully received.

    Read the article

  • Hyper-V Ubuntu Networking Problems Copying Large Amounts of Data

    - by Anonymous
    I am trying to copy a large amount (about 50 GB) of data over my network from a Hyper-V-hosted virtual machine running Ubuntu 11.04 (Natty Narwhal) to another (non-virtual) Ubuntu host that I plan to use for testing upgrades to one of our web applications. The problem I am having is with the virtual machine, which I shall refer to in what follows as "source.host". This machine is running 64-bit Ubuntu Server with the 2.6.38-8-server kernel and the Microsoft Linux Integration Components for Hyper-V kernel modules (hv_utils, hv_timesource, hv_netvsc, hv_blkvsc, hv_storvsc, and hv_vmbus) loaded. It uses a Hyper-V "synthetic network adapter" for its networking interface. To do the copy, I log on to the machine with the data and run the following commands (Call the remote machine "destination.host".): $ cd /path/to/data $ tar -cvf - datafolder/ | ssh [email protected] "cat > ~/data.tar" This runs for a while and then suddenly stops after transferring somewhere from 2-6 GB. The terminal on the source.host machine displays a Write failed: broken pipe error. The odd part is this: after this occurs, the "source.host" machine is no longer able to talk to the rest of the network. I cannot ping any other hosts on the network from the "source.host" machine, and I cannot ping the "source.host" machine from any other host on the network. I am equally unable to access the any of the web services hosted on "source.host". Running ifconfig on "source.host" shows the network adapter to be up and running as usual with the correct IP address and everything. I tried restarting the networking service with $ /etc/init.d/networking restart but the problem does not go away. Restarting the machine makes it capable of talking to the network again -- it can ping and be pinged by other hosts, and the web services are also accessible and usable as normal -- but attempting the copy operation again results in the same failure, requiring another restart. As an experiment, I tried replacing the tar -- ssh pipeline above with a straight scp: $ scp -r datafolder/ [email protected]:~ but to no avail Thinking that the issue might have to do with the kernel packet-send buffers filling up, I tried increasing the buffer size to 12 MB (up from the 128 KB default) with # echo 12582911 > /proc/sys/net/core/wmem_max but this also had no effect. I'm guessing at this point that it might be a problem with the Microsoft synthetic network driver, but I don't really know. Does anyone have any suggestions? Thank you very much in advance!

    Read the article

  • Find the "largest" dense sub matrix in a large sparse matrix

    - by BCS
    Given a large sparse matrix (say 10k+ by 1M+) I need to find a subset, not necessarily continuous, of the rows and columns that form a dense matrix (all non-zero elements). I want this sub matrix to be as large as possible (not the largest sum, but the largest number of elements) within some aspect ratio constraints. Are there any known exact or aproxamate solutions to this problem? A quick scan on Google seems to give a lot of close-but-not-exactly results. What terms should I be looking for? edit: Just to clarify; the sub matrix need not be continuous. In fact the row and column order is completely arbitrary so adjacency is completely irrelevant. A thought based on Chad Okere's idea Order the rows from largest count to smallest count (not necessary but might help perf) Select two rows that have a "large" overlap Add all other rows that won't reduce the overlap Record that set Add whatever row reduces the overlap by the least Repeat at #3 until the result gets to small Start over at #2 with a different starting pair Continue until you decide the result is good enough

    Read the article

  • maintaining a large list in python

    - by Oren
    I need to maintain a large list of python pickleable objects that will quickly execute the following algorithm: The list will have 4 pointers and can: Read\Write each of the 4 pointed nodes Insert new nodes before the pointed nodes Increase a pointer by 1 Pop nodes from the beginning of the list (without overlapping the pointers) Add nodes at the end of the list The list can be very large (300-400 MB), so it shouldnt be contained in the RAM. The nodes are small (100-200 bytes) but can contain various types of data. The most efficient way it can be done, in my opinion, is to implement some paging-mechanism. On the harddrive there will be some database of a linked-list of pages where in every moment up to 5 of them are loaded in the memory. On insertion, if some max-page-size was reached, the page will be splitted to 2 small pages, and on pointer increment, if a pointer is increased beyond its loaded page, the following page will be loaded instead. This is a good solution, but I don't have the time to write it, especially if I want to make it generic and implement all the python-list features. Using a SQL tables is not good either, since I'll need to implement a complex index-key mechanism. I tried ZODB and zc.blist, which implement a BTree based list that can be stored in a ZODB database file, but I don't know how to configure it so the above features will run in reasonable time. I don't need all the multi-threading\transactioning features. No one else will touch the database-file except for my single-thread program. Can anyone explain me how to configure the ZODB\zc.blist so the above features will run fast, or show me a different large-list implementation?

    Read the article

  • Google Code Jam 2010 Large DataSets Take Too Long to Submit

    - by Travis
    Hey Guys, I'm participating in the 2010 code jam and I solved two of the problems for the small data sets, but I'm not even close to solving the large data sets in the 8 minute time frame. I'm wondering if anyone out there has solved the large data set: What hardware were you running on? What language were you running on? What performance tuning techniques did you do on your code to run as fast as possible? I'm writing the solutions in Ruby, which is not my day to day language, and executing them on my Macbook Pro. My solutions for problem A and problem C are on github at http://github.com/tjboudreaux/codejam2010. I'd appreciate any suggestions that you may have. FWIW, I have alot of experience in C++ from college, my primary language is PHP, and my "sandbox" language is Ruby. Was I just a bit ambitious by taking a shot at this in Ruby, not knowing where the language struggles for performance, or does anyone see anything that's a redflag as to why I can't complete the large dataset in time to submit.

    Read the article

  • WCF, IIS6.0 (413) Request Entity Too Large.

    - by Andrew Kalashnikov
    Hello, guys. I've got annoyed problem. I've got WCF service(basicHttpBinding with Transport security Https). This service implements contract which consists 2 methods. LoadData. GetData. GetData works OK!. My client received pachage ~2Mb size without problems. All work correctly. But when I try load data by bool LoadData(Stream data); - signature of method I'll get (413) Request Entity Too Large. Stack Trace: Server stack trace: ? ServiceModel.Channels.HttpChannelUtilities.ValidateRequestReplyResponse(HttpWebRequest request, HttpWebResponse response, HttpChannelFactory factory, WebException responseException, ChannelBinding channelBinding) System.ServiceModel.Channels.HttpChannelFactory.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan timeout) System.ServiceModel.Channels.RequestChannel.Request(Message message, TimeSpan timeout) System.ServiceModel.Dispatcher.RequestChannelBinder.Request(Message message, TimeSpan timeout) I try this http://blogs.msdn.com/jiruss/archive/2007/04/13/http-413-request-entity-too-large-can-t-upload-large-files-using-iis6.aspx. But it doesn't work! My server is 2003 with IIS6.0. Please help.

    Read the article

  • Copy Small Bitmaps on to Large Bitmap with Transparency Blend: What is faster than graphics.DrawImag

    - by Glenn
    I have identified this call as a bottleneck in a high pressure function. graphics.DrawImage(smallBitmap, x , y); Is there a faster way to blend small semi transparent bitmaps into a larger semi transparent one? Example Usage: XY[] locations = GetLocs(); Bitmap[] bitmaps = GetBmps(); //small images sizes vary approx 30px x 30px using (Bitmap large = new Bitmap(500, 500, PixelFormat.Format32bppPArgb)) using (Graphics largeGraphics = Graphics.FromImage(large)) { for(var i=0; i < largeNumber; i++) { //this is the bottleneck largeGraphics.DrawImage(bitmaps[i], locations[i].x , locations[i].y); } } var done = new MemoryStream(); large.Save(done, ImageFormat.Png); done.Position = 0; return (done); The DrawImage calls take a small 32bppPArgb bitmaps and copies them into a larger bitmap at locations that vary and the small bitmaps might only partially overlap the larger bitmaps visible area. Both images have semi transparent contents that get blended by DrawImage in a way that is important to the output. I've done some testing with BitBlt but not seen significant speed improvement and the alpha blending didn't come out the same in my tests. I'm open to just about any method including a better call to bitblt or unsafe c# code.

    Read the article

  • Sorting a very large text file in Java

    - by Alice
    Hi, I have a large text file I need to sort in Java. The format is: word [tab] frequency [new line] The algorithm for sorting is: Read some of the file, filtering for purlely alphabetic words. Once you have X number of alphabetic words, call Collections.sort and write the result to a file. Repeat until you have finished reading the file. Start reading two sorted files, comparing line by line for the word with higher frequency, and writing at the same time to a new file as to not load much into your memory Repeat until all files are merged into one large file Right now I've divided the large file into smaller ones (sorted by descending frequency) with 10,000 lines each. I know I need to somehow merge these files back together, but I'm not sure how to go about this. I've created a LinkedList to keep track of all the files created. The algorithm says to compare each line in the two files, except I've tried a case where , say file1 = 8,6,5,3,1 and file2 = 9,8,8,8,8. Then if I compare them line by line I would get file3 = 9,8,8,6,8,5,8,3,8,1 which is incorrectly sorted (they should be in decreasing order). I think I'm misunderstanding some part of the algorithm. If someone could point out what I should do instead, I'd greatly appreciate it. Thanks. edit: Yes this is an assignment. We aren't allowed to increase memory unfortunately :(

    Read the article

  • What is the best way to work with large databases in Java depending on context?

    - by user19000
    We are trying to figure out the best practice for working with very large DBs in Java. What we do is a kind of BI (business Intelligence), i.e analyzing very large DBs, and using them to create intermediate DBs that represent intelligent knowledge of the DBs. We are currently using JDBC, and just preforming queries using a ResultSet. As more and more data is being created, we are wondering whether more appropriate ways exist for parsing and manipulating these large DBs: We need to support 'chunk' manipulation and not an entire DB at once(e.g. limit in JDBC, very poor performance) We do not need to be constantly connected since we are just pulling results and creating new tables of our own. We want to understand JDBC alternatives, with respect to advantages and disadvantages. Whether you think JDBC is the way to go or not, what are the best practices to go by depending on context (e.g. for large DBs queried in chunks)?

    Read the article

  • Large Object Heap Fragmentation

    - by Paul Ruane
    The C#/.NET application I am working on is suffering from a slow memory leak. I have used CDB with SOS to try to determine what is happening but the data does not seem to make any sense so I was hoping one of you may have experienced this before. The application is running on the 64 bit framework. It is continuously calculating and serialising data to a remote host and is hitting the Large Object Heap (LOH) a fair bit. However, most of the LOH objects I expect to be transient: once the calculation is complete and has been sent to the remote host, the memory should be freed. What I am seeing, however, is a large number of (live) object arrays interleaved with free blocks of memory, e.g., taking a random segment from the LOH: 0:000> !DumpHeap 000000005b5b1000 000000006351da10 Address MT Size ... 000000005d4f92e0 0000064280c7c970 16147872 000000005e45f880 00000000001661d0 1901752 Free 000000005e62fd38 00000642788d8ba8 1056 <-- 000000005e630158 00000000001661d0 5988848 Free 000000005ebe6348 00000642788d8ba8 1056 000000005ebe6768 00000000001661d0 6481336 Free 000000005f214d20 00000642788d8ba8 1056 000000005f215140 00000000001661d0 7346016 Free 000000005f9168a0 00000642788d8ba8 1056 000000005f916cc0 00000000001661d0 7611648 Free 00000000600591c0 00000642788d8ba8 1056 00000000600595e0 00000000001661d0 264808 Free ... Obviously I would expect this to be the case if my application were creating long-lived, large objects during each calculation. (It does do this and I accept there will be a degree of LOH fragmentation but that is not the problem here.) The problem is the very small (1056 byte) object arrays you can see in the above dump which I cannot see in code being created and which are remaining rooted somehow. Also note that CDB is not reporting the type when the heap segment is dumped: I am not sure if this is related or not. If I dump the marked (<--) object, CDB/SOS reports it fine: 0:015> !DumpObj 000000005e62fd38 Name: System.Object[] MethodTable: 00000642788d8ba8 EEClass: 00000642789d7660 Size: 1056(0x420) bytes Array: Rank 1, Number of elements 128, Type CLASS Element Type: System.Object Fields: None The elements of the object array are all strings and the strings are recognisable as from our application code. Also, I am unable to find their GC roots as the !GCRoot command hangs and never comes back (I have even tried leaving it overnight). So, I would very much appreciate it if anyone could shed any light as to why these small (<85k) object arrays are ending up on the LOH: what situations will .NET put a small object array in there? Also, does anyone happen to know of an alternative way of ascertaining the roots of these objects? Thanks in advance. Update 1 Another theory I came up with late yesterday is that these object arrays started out large but have been shrunk leaving the blocks of free memory that are evident in the memory dumps. What makes me suspicious is that the object arrays always appear to be 1056 bytes long (128 elements), 128 * 8 for the references and 32 bytes of overhead. The idea is that perhaps some unsafe code in a library or in the CLR is corrupting the number of elements field in the array header. Bit of a long shot I know... Update 2 Thanks to Brian Rasmussen (see accepted answer) the problem has been identified as fragmentation of the LOH caused by the string intern table! I wrote a quick test application to confirm this: static void Main() { const int ITERATIONS = 100000; for (int index = 0; index < ITERATIONS; ++index) { string str = "NonInterned" + index; Console.Out.WriteLine(str); } Console.Out.WriteLine("Continue."); Console.In.ReadLine(); for (int index = 0; index < ITERATIONS; ++index) { string str = string.Intern("Interned" + index); Console.Out.WriteLine(str); } Console.Out.WriteLine("Continue?"); Console.In.ReadLine(); } The application first creates and dereferences unique strings in a loop. This is just to prove that the memory does not leak in this scenario. Obviously it should not and it does not. In the second loop, unique strings are created and interned. This action roots them in the intern table. What I did not realise is how the intern table is represented. It appears it consists of a set of pages -- object arrays of 128 string elements -- that are created in the LOH. This is more evident in CDB/SOS: 0:000> .loadby sos mscorwks 0:000> !EEHeap -gc Number of GC Heaps: 1 generation 0 starts at 0x00f7a9b0 generation 1 starts at 0x00e79c3c generation 2 starts at 0x00b21000 ephemeral segment allocation context: none segment begin allocated size 00b20000 00b21000 010029bc 0x004e19bc(5118396) Large object heap starts at 0x01b21000 segment begin allocated size 01b20000 01b21000 01b8ade0 0x00069de0(433632) Total Size 0x54b79c(5552028) ------------------------------ GC Heap Size 0x54b79c(5552028) Taking a dump of the LOH segment reveals the pattern I saw in the leaking application: 0:000> !DumpHeap 01b21000 01b8ade0 ... 01b8a120 793040bc 528 01b8a330 00175e88 16 Free 01b8a340 793040bc 528 01b8a550 00175e88 16 Free 01b8a560 793040bc 528 01b8a770 00175e88 16 Free 01b8a780 793040bc 528 01b8a990 00175e88 16 Free 01b8a9a0 793040bc 528 01b8abb0 00175e88 16 Free 01b8abc0 793040bc 528 01b8add0 00175e88 16 Free total 1568 objects Statistics: MT Count TotalSize Class Name 00175e88 784 12544 Free 793040bc 784 421088 System.Object[] Total 1568 objects Note that the object array size is 528 (rather than 1056) because my workstation is 32 bit and the application server is 64 bit. The object arrays are still 128 elements long. So the moral to this story is to be very careful interning. If the string you are interning is not known to be a member of a finite set then your application will leak due to fragmentation of the LOH, at least in version 2 of the CLR. In our application's case, there is general code in the deserialisation code path that interns entity identifiers during unmarshalling: I now strongly suspect this is the culprit. However, the developer's intentions were obviously good as they wanted to make sure that if the same entity is deserialised multiple times then only one instance of the identifier string will be maintained in memory.

    Read the article

  • Mercurial client error 255 and HTTP error 404 when attempting to push large files to server

    - by coderunner
    Problem: When attempting to push a changeset that contains 6 large files (.exe, .dmg, etc) to my remote server my client (MacHG) is reporting the error: "Error During Push. Mercurial reported error number 255: abort: HTTP Error 404: Not Found" What does the error even mean?! The only thing unique (that I can tell) about this commit is the size, type, and filenames of the files. How can I determine which exact file within the changeset is failing? How can I delete the corrupt changeset from the repository? Someone reported using "mq" extensions, but it looks overly complicated for what I'm trying to achieve. Background: I can push and pull the following: source files, directories, .class files and a .jar file to and from the server, using both MacHG and toirtoise HG. I successfully committed to my local repository the addition for the first time the 6 large .exe, .dmg etc installer files (about 130Mb total). In the following commit to my local repository, I removed ("untracked" / forget) the 6 files causing the problem, however the previous (failing) changeset is still queued to be pushed to the server (i.e. my local host is trying to push the "add" and then the "remove" to the remote server - and keep aligned with the "keep everything in history" philosophy of the source control system). I can commit .txt .java files etc using TortoiseHG from Windows PCs. I haven't actually testing committing or pushing the same large files using TortoiseHG. Please help! Setup: Client applications = MacHG v0.9.7 (SCM 1.5.4), and TortoiseHG v1.0.4 (SCM 1.5.4) Server = HTTPS, IIS7.5, Mercurial 1.5.4, Python 2.6.5, setup using these instructions: http://www.jeremyskinner.co.uk/mercurial-on-iis7/ In IIS7.5 the CGI handler is configured to handle ALL verbs (not just GET, POST and HEAD). My hgweb.cgi file on the server is as follows: #!/usr/bin/env python # # An example hgweb CGI script, edit as necessary # Path to repo or hgweb config to serve (see 'hg help hgweb') #config = "/path/to/repo/or/config" # Uncomment and adjust if Mercurial is not installed system-wide: #import sys; sys.path.insert(0, "/path/to/python/lib") # Uncomment to send python tracebacks to the browser if an error occurs: #import cgitb; cgitb.enable() from mercurial import demandimport; demandimport.enable() from mercurial.hgweb import hgweb, wsgicgi application = hgweb('C:\inetpub\wwwroot\hg\hgweb.config') wsgicgi.launch(application) My hgweb.config file on the server is as follows: [collections] C:\Mercurial Repositories = C:\Mercurial Repositories [web] baseurl = /hg allow_push = usernamea allow_push = usernameb

    Read the article

  • Interpolating Large Datasets On the Fly

    - by Karl
    Interpolating Large Datasets I have a large data set of about 0.5million records representing the exchange rate between the USD / GBP over the course of a given day. I have an application that wants to be able to graph this data or maybe a subset. For obvious reasons I do not want to plot 0.5 million points on my graph. What I need is a smaller data set (100 points or so) which accurately (as possible) represents the given data. Does anyone know of any interesting and performant ways this data can be achieved? Cheers, Karl

    Read the article

  • Optimizing performance of large ASP.NET applications

    - by NLV
    Hello, I'm building a asp.net web application with lots and lots of controls and huge volumes of data. My application is very slow and it is taking a large amount of time to load the data into the .net controls like grid, tree view etc. I also have some ajaxified pages and controls in my application. I want to reduce the page load time in each postbacks. What are the standards/best practices to be followed while developing large asp.net applications? Thank you. NLV

    Read the article

  • 2 large databases - worth merging into 1?

    - by Ardman
    I have 2 large databases that were sharded before. I now have removed the sharding and have created a new database with all of the data except for the tables that were originally sharded. Is it worth importing this data into the new database, or keeping them as seperate entities that I can just scan through? We are talking around 60million records in each sharded table, of which there are 2 tables. Also, whilst I have an empty table, should I be adding indexes which weren't thought of when the database was originally constructed and now too large to add them?

    Read the article

  • Large number array compression

    - by gatapia
    Hi All, I've got a javascript application that sends a large amount of numerical data down the wire. This data is then stored in a database. I am having size issues (too much bandwidth, database getting too big). I am now ready to sacrifice some performance for compression. I was thinking of implementing a base 62 number.toString(62) and parseInt(compressed, 62). This would certainly reduce the size of the data but before I go ahead and do this I thought I would put it to the folks here as I know there must be some outside the box solution I have not considered. The basic specs are: - Compress large number arrays into strings for JSONP transfer (So I think UTF is out) - Be relatively fast, look I'm not expecting same performance as I have now but I also don't want gzip compression either. Any ideas would be greatly appreciated. Thanks Guido Tapia

    Read the article

  • Displaying large file in JTextArea.

    - by Sathish Gopal
    Hi All, I'm currently working in Swing UI Assignment. This work involves showing large file content in JTextArea. The file size can be as large as 2 GB. My initial idea is to lazily load content from the file, say 1 MB of content will be shown to the user. As the user scrolls i will retrieve the next 1 MB of content to be shown. All these operation will be happening in background thread (Swing Worker). I looked at the JTextArea API, the method insert takes String and int(position of the insert) as the parameter. This will suffice, but i'm worried about performance, because the content (1 MB at a time) retrieved will have to be converted to String object. Is there any other work around or any other alternative/better solution for this.

    Read the article

< Previous Page | 15 16 17 18 19 20 21 22 23 24 25 26  | Next Page >