large scale project - Page 113

jquery wait till large document is loaded

- by Martijn

In my web application I load a document that can be huge. This document is loaded into an iframe. I have a title, buttons and text that all depend on this document. The text comes from the large document and is displayed in the iframe.

I'd like to show an animated GIF while the document is loading, in three places (1: document title, 2: document buttons, 3: document text, the iframe). I've tried the onload event on the iframe, but this doesn't give me the desired effect. Here's my code that loads the document:

    function loadDocument(id, doc) {
        $("#DocumentContent").show();
        $("#ButtonBox").show();

        // Clear dynamic menu items
        $("#DynamicMenuContent").html("");
        $("#PageContent").html("");

        // Load document in frame
        $("#iframeDocument").attr("src", 'ViewDoc.aspx?id=' + id + '&doc=' + doc + '');
        // $("#iframeDocument").attr("src", "Graphics/loader.gif");

        // Load menu items
        $.ajax({
            url: "ShowButtons.aspx?id=" + id + "&doc=" + doc,
            success: function(data) { $("#DynamicMenuContent").html(data) },
            error: function(xhr, err, e) { alert("error: " + err) }
        });

        // Set document title
        $("#documentTitle").load("GetDocumentInfo.aspx?p=title");
    }

My question: how can I display a loader GIF while the document is loading, and remove the GIF when the document is ready?


Using ServletOutputStream to write very large files in a Java servlet without memory issues

- by Martin

I am using IBM WebSphere Application Server v6 and Java 1.4 and am trying to write large CSV files to the ServletOutputStream for a user to download. Files currently range from 50 to 750 MB. The smaller files aren't causing too much of a problem, but with the larger files it appears that the data is being written into the heap, which then causes an OutOfMemory error and brings down the entire server.

These files can only be served to authenticated users over HTTPS, which is why I am serving them through a servlet instead of just sticking them in Apache. The code I am using is (some fluff removed around this):

    resp.setHeader("Content-length", "" + fileLength);
    resp.setContentType("application/vnd.ms-excel");
    resp.setHeader("Content-Disposition","attachment; filename=\"export.csv\"");

    FileInputStream inputStream = null;
    try {
        inputStream = new FileInputStream(path);
        byte[] buffer = new byte[1024];
        int bytesRead = 0;
        do {
            bytesRead = inputStream.read(buffer, offset, buffer.length);
            resp.getOutputStream().write(buffer, 0, bytesRead);
        } while (bytesRead == buffer.length);
        resp.getOutputStream().flush();
    } finally {
        if(inputStream != null)
            inputStream.close();
    }

The FileInputStream doesn't seem to be the problem: if I write to another file, or just remove the write completely, memory usage is fine. What I am thinking is that whatever is passed to resp.getOutputStream().write is being held in memory until the data can be sent through to the client. So the entire file might be read and stored in the resp.getOutputStream(), causing my memory issues and the crash.

I have tried buffering these streams and also tried using Channels from java.nio, none of which makes any difference to my memory issues. I have also flushed the output stream once per iteration of the loop and after the loop, which didn't help.
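For comparison, a conventional bounded-memory copy loop reads until the source reports end-of-stream and writes only the bytes actually read. The sketch below shows that pattern in Python rather than Java; `copy_stream`, the chunk size and the in-memory streams standing in for the file and the servlet output are all illustrative assumptions. It says nothing about whether the container buffers the response on its side.

```python
import io


def copy_stream(src, dst, chunk_size=8192):
    """Copy src to dst in fixed-size chunks; returns the number of bytes copied.

    Only one chunk is held in memory at a time. The loop ends when read()
    returns an empty result, not when a short read happens.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:          # empty bytes object means end of stream
            break
        dst.write(chunk)       # write exactly what was read
        total += len(chunk)
    return total


# Hypothetical stand-ins for the file on disk and the response stream.
source = io.BytesIO(b"a,b,c\n" * 5000)
sink = io.BytesIO()
copied = copy_stream(source, sink, chunk_size=1024)
print(copied)
```

The point of the pattern is that memory use on the copying side is bounded by `chunk_size`, so any remaining growth has to come from buffering further downstream.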


How to delete a large cookie that causes Apache to 400

- by jakemcgraw

I've come across an issue where a web application has managed to create a cookie on the client, which, when submitted by the client to Apache, causes Apache to return the following:

    HTTP/1.1 400 Bad Request
    Date: Mon, 08 Mar 2010 21:21:21 GMT
    Server: Apache/2.2.3 (Red Hat)
    Content-Length: 7274
    Connection: close
    Content-Type: text/html; charset=iso-8859-1

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <html><head>
    <title>400 Bad Request</title>
    </head><body>
    <h1>Bad Request</h1>
    <p>Your browser sent a request that this server could not understand.<br />
    Size of a request header field exceeds server limit.<br />
    <pre>
    Cookie: ::: A REALLY LONG COOKIE :::
    </pre>
    </p>
    <hr>
    <address>Apache/2.2.3 (Red Hat) Server at www.foobar.com Port 80</address>
    </body></html>

After looking into the issue, it would appear that the web application has managed to create a really long cookie, over 7000 characters. Now, don't ask me how the web application was able to do this; I was under the impression browsers were supposed to prevent this from happening. I've managed to come up with a solution to prevent the cookies from growing out of control again.

The issue I'm trying to tackle is: how do I reset the large cookie on the client if every time the client submits a request to Apache, Apache returns a 400 client error? I've tried using the ErrorDocument directive, but it appears that Apache bails on the request before reaching any custom error handling.


NSKeyedArchiver on NSArray has large size overhead

- by redguy

I'm using NSKeyedArchiver in a Mac OS X program that generates data for an iPhone application. I found out that, by default, the resulting archives are much bigger than I expected. Example:

    NSMutableArray * ar = [NSMutableArray arrayWithCapacity:10];
    for (int i = 0; i < 100000; i++) {
        NSString * s = [NSString stringWithFormat:@"item%06d", i];
        [ar addObject:s];
    }
    [NSKeyedArchiver archiveRootObject:ar toFile: @"NSKeyedArchiver.test"];

This stores 10 * 100000 = 1M bytes of useful data, yet the size of the resulting file is almost three megabytes. The overhead seems to grow with the number of items in the array. For 1000 items, the file was about 22k. "file" reports that it is an "Apple binary property list" (not the XML format).

Is there a simple way to prevent this huge overhead? I wanted to use NSKeyedArchiver for the simplicity it provides. I can write the data to my own, non-generic, binary format, but that's not very elegant. Aggregating the data into large chunks and feeding these to NSKeyedArchiver should also work, but again, that kind of defeats the point of using a simple, easy, ready-to-use archiver. Am I missing some method call or usage pattern that would reduce this overhead?


database design to speed up hibernate querying of large dataset

- by paddydub

I currently have the tables below, representing a bus network, mapped in Hibernate and accessed from a Spring MVC based bus route planner. I'm trying to make my route planner application perform faster. I load all of the tables below into Lists to perform the route planner logic. I would appreciate any ideas on how to speed up performance, or suggestions for another way to approach the problem of handling a large set of data.

Coordinate Connections Table (INT, INT, INT) (containing 50,000 coordinate connections)

    ID  FROMCOORDID  TOCOORDID
    1   1            2
    2   1            17
    3   1            63
    4   1            64
    5   1            65
    6   1            95

Coordinate Table (INT, DECIMAL, DECIMAL) (containing 4700 coordinates)

    ID  LAT        LNG
    0   59.352669  -7.264341
    1   59.352669  -7.264341
    2   59.350012  -7.260653
    3   59.337585  -7.189798
    4   59.339221  -7.193582
    5   59.341408  -7.205888

Bus Stop Table (INT, INT, INT) (containing 15000 stops)

    StopID      RouteID  COORDINATEID
    1000100001  100      17
    1000100002  100      18
    1000100003  100      19
    1000100004  100      20
    1000100005  100      21
    1000100006  100      22
    1000100007  100      23

This is how long it takes to load all the data from each table:

    stop.findAll = 148ms, stops.size: 15670
    Hibernate: select coordinate0_.COORDINATEID as COORDINA1_2_, coordinate0_.LAT as LAT2_, coordinate0_.LNG as LNG2_ from COORDINATES coordinate0_
    coord.findAll = 51ms , coordinates.size: 4704
    Hibernate: select coordconne0_.COORDCONNECTIONID as COORDCON1_3_, coordconne0_.DISTANCE as DISTANCE3_, coordconne0_.FROMCOORDID as FROMCOOR3_3_, coordconne0_.TOCOORDID as TOCOORDID3_ from COORDCONNECTIONS coordconne0_
    coordinateConnectionDao.findAll = 238ms ; coordConnectioninates.size:48132

Hibernate Annotations

    @Entity
    @Table(name = "STOPS")
    public class Stop implements Serializable {
        @Id
        @GeneratedValue
        @Column(name = "COORDINATEID")
        private Integer CoordinateID;

        @Column(name = "LAT")
        private double latitude;

        @Column(name = "LNG")
        private double longitude;
    }

    @Table(name = "COORDINATES")
    public class Coordinate {
        @Id
        @GeneratedValue
        @Column(name = "COORDINATEID")
        private Integer CoordinateID;

        @Column(name = "LAT")
        private double latitude;

        @Column(name = "LNG")
        private double longitude;
    }

    @Entity
    @Table(name = "COORDCONNECTIONS")
    public class CoordConnection {
        @Id
        @GeneratedValue
        @Column(name = "COORDCONNECTIONID")
        private Integer CoordinateID;

        /**
         * From Coordinate_id value
         */
        @Column(name = "FROMCOORDID", nullable = false)
        private int fromCoordID;

        /**
         * To Coordinate_id value
         */
        @Column(name = "TOCOORDID", nullable = false)
        private int toCoordID;
        //private Coordinate toCoordID;
    }
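Since the measured load times are already well under a second, one thing that may matter more is how the route logic looks up connections once they are in memory. The sketch below, in Python rather than Java, indexes the connection rows once into an adjacency map keyed by FROMCOORDID, so finding the neighbours of a coordinate does not require scanning the whole list. `build_adjacency` and the tuple layout are assumptions for illustration, and the rows are the sample rows from the table above.

```python
from collections import defaultdict


def build_adjacency(connections):
    """Map each from-coordinate ID to the list of coordinate IDs it connects to.

    `connections` is an iterable of (id, from_coord_id, to_coord_id) rows.
    """
    adjacency = defaultdict(list)
    for _conn_id, from_id, to_id in connections:
        adjacency[from_id].append(to_id)
    return adjacency


# Sample rows copied from the Coordinate Connections table in the question.
connections = [
    (1, 1, 2),
    (2, 1, 17),
    (3, 1, 63),
    (4, 1, 64),
    (5, 1, 65),
    (6, 1, 95),
]
adjacency = build_adjacency(connections)
print(adjacency[1])
```

Building the map is a single pass over the roughly 50,000 rows, after which each neighbour lookup is a dictionary access.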


PostgreSQL - Why are some queries on large datasets so incredibly slow

- by Brad Mathews

Hello, I have two types of queries I run often on two large datasets. They run much slower than I would expect them to.

The first type is a sequential scan updating all records:

    Update rcra_sites
    Set street = regexp_replace(street,'/','','i')

rcra_sites has 700,000 records. It takes 22 minutes from pgAdmin! I wrote a vb.net function that loops through each record and sends an update query for each record (yes, 700,000 update queries!) and it runs in less than half the time. Hmmm....

The second type is a simple update with a relation and then a sequential scan:

    Update rcra_sites as sites
    Set violations='No'
    From narcra_monitoring as v
    Where sites.agencyid=v.agencyid and v.found_violation_flag='N'

narcra_monitoring has 1,700,000 records. This takes 8 minutes. The query planner refuses to use my indexes. The query runs much faster if I start with a set enable_seqscan = false;. I would prefer it if the query planner would do its job.

I have appropriate indexes, and I have vacuumed and analyzed. I tuned my shared_buffers and effective_cache_size as best I know how, to use more memory, since I have 4GB. My hardware is pretty darn good. I am running v8.4 on Windows 7.

Is PostgreSQL just this slow? Or am I still missing something?

Thanks! Brad


Map large integer to a phrase

- by Alexander Gladysh

I have a large and "unique" integer (actually a SHA1 hash). I want (for no other reason than to have fun) to find an algorithm to convert that SHA1 hash to a (pseudo-)English phrase. The conversion should be reversible (i.e., knowing the algorithm, one must be able to convert the phrase back to the SHA1 hash).

A possible use of the generated phrase: a human-readable version of a Git commit ID, like a motto for a given program version (which is built from that commit). (As I said, this is "for fun". I don't claim that this is very practical, or that it would be much more readable than the SHA1 itself.)

A better algorithm would produce shorter, more natural-looking, more unique phrases. The phrase need not make sense. I would even settle for a whole paragraph of nonsense. (Though the quality, the "Englishness", of a paragraph should probably be better than for a mere phrase.)

A variation: it is OK if I am only able to work with part of the hash. Say, the first six digits is OK.

Possible approach: in the past I've attempted to build a probability table (of words) and generate phrases as Markov chains, seeding the generator (picking branches from the probability tree) according to the bits I read from the SHA. This was not very successful; the resulting phrases were too long and ugly. I'm not sure if this was a bug or a general flaw in the algorithm, since I had to abandon it early on.

Now I'm thinking about attempting to solve the problem once again. Any advice on how to approach this? Do you think the Markov chain approach can work here? Something else?
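As a baseline for the reversibility requirement, the simplest scheme treats the hash as a number and writes it in base N, where N is the size of a fixed word list. The sketch below is not the Markov approach described above, and it makes no attempt at natural-sounding output. The 16-word list and the names `encode`, `decode` and `words_needed` are made up for illustration; a real list would be much larger, which shortens the phrase.

```python
import hashlib

# Hypothetical word list; any list of distinct words works.
WORDS = ["apple", "brave", "cloud", "dance", "eagle", "flame", "grape", "house",
         "ivory", "jolly", "koala", "lemon", "mango", "noble", "ocean", "piano"]
INDEX = {word: i for i, word in enumerate(WORDS)}


def words_needed(hex_len, base):
    """Smallest word count whose combinations cover every hex_len-digit value."""
    count, capacity = 0, 1
    while capacity < 16 ** hex_len:
        capacity *= base
        count += 1
    return count


def encode(hex_digest):
    """Turn a hex string into a fixed-length phrase (most significant word first)."""
    value = int(hex_digest, 16)
    base = len(WORDS)
    picked = []
    for _ in range(words_needed(len(hex_digest), base)):
        value, digit = divmod(value, base)
        picked.append(WORDS[digit])
    return " ".join(reversed(picked))


def decode(phrase, hex_len=40):
    """Invert encode(); hex_len restores leading zeros (40 for a full SHA1)."""
    value = 0
    for word in phrase.split():
        value = value * len(WORDS) + INDEX[word]
    return format(value, "0{}x".format(hex_len))


digest = hashlib.sha1(b"hello").hexdigest()
assert decode(encode(digest)) == digest
print(encode(digest[:6]))  # phrase for the first six hex digits
```

With N words, each word carries log2(N) bits, so a 160-bit hash needs 40 words from a 16-word list but only 10 from a list of 65,536. Grammar can be layered on top by drawing alternating positions from separate noun, verb and adjective lists.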


Unable to upload large files on FTP using Apache commons-net-3.1

- by Nitin

I am trying to upload one large file (more than 8 MB) using the storeFile(remote, local) method of FTPClient, but it returns false. The file gets uploaded with some extra bytes. Following is the code, with its output:

    public class Main {
        public static void main(String[] args) {
            FTPClient client = new FTPClient();
            FileInputStream fis = null;
            try {
                client.connect("208.106.181.143");
                client.setFileTransferMode(client.BINARY_FILE_TYPE);
                client.login("abc", "java");
                int reply = client.getReplyCode();
                System.out.println("Received Reply from FTP Connection:" + reply);
                if(FTPReply.isPositiveCompletion(reply)){
                    System.out.println("Connected Success");
                }
                client.changeWorkingDirectory("/"+"Everbest"+"/");
                client.makeDirectory("ETPSupplyChain5.3-EvbstSP3");
                client.changeWorkingDirectory("/"+"Everbest"+"/"+"ETPSupplyChain5.3-EvbstSP3"+"/");
                FTPFile[] names = client.listFiles();
                String filename = "E:\\Nitin\\D-Drive\\Installer.rar";
                fis = new FileInputStream(filename);
                boolean result = client.storeFile("Installer.rar", fis);
                int replyAfterupload = client.getReplyCode();
                System.out.println("Received Reply from FTP Connection replyAfterupload:" + replyAfterupload);
                System.out.println("result:"+result);
                for (FTPFile name : names) {
                    System.out.println("Name = " + name);
                }
                client.logout();
                fis.close();
                client.disconnect();
            } catch (SocketException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
    }

Output:

    Received Reply from FTP Connection:230
    Connected Success
    32
    /Everbest/ETPSupplyChain5.3-EvbstSP3
    Received Reply from FTP Connection replyAfterupload:150
    result:false


Keeping the UI responsive while parsing a very large logfile

- by Carlos

I'm writing an app that parses a very large logfile, so that the user can see the contents in a treeview format. I've used a BackgroundWorker to read the file, and as it parses each message, I use a BeginInvoke to get the GUI thread to add a node to my treeview. Unfortunately, there are two issues:

- The treeview is unresponsive to clicks or scrolls while the file is being parsed. I would like users to be able to examine (i.e. expand) nodes while the file is parsing, so that they don't have to wait for the whole file to finish parsing.
- The treeview flickers each time a new node is added.

Here's the code inside the form:

    private void btnChangeDir_Click(object sender, EventArgs e)
    {
        OpenFileDialog browser = new OpenFileDialog();
        if (browser.ShowDialog() == DialogResult.OK)
        {
            tbSearchDir.Text = browser.FileName;
            BackgroundWorker bgw = new BackgroundWorker();
            bgw.DoWork += (ob, evArgs) => ParseFile(tbSearchDir.Text);
            bgw.RunWorkerAsync();
        }
    }

    private void ParseFile(string inputfile)
    {
        FileStream logFileStream = new FileStream(inputfile, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
        StreamReader LogsFile = new StreamReader(logFileStream);
        while (!LogsFile.EndOfStream)
        {
            string Msgtxt = LogsFile.ReadLine();
            Message msg = new Message(Msgtxt.Substring(26)); //Reads the text into a class with appropriate members
            AddTreeViewNode(msg);
        }
    }

    private void AddTreeViewNode(Message msg)
    {
        TreeNode newNode = new TreeNode(msg.SeqNum);
        BeginInvoke(new Action(() =>
        {
            treeView1.BeginUpdate();
            treeView1.Nodes.Add(newNode);
            treeView1.EndUpdate();
            Refresh();
        }
        ));
    }

What needs to be changed?
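One direction that may be worth trying is to hand parsed messages to the GUI thread in batches, so the UI handles one update per few hundred lines instead of one update (and one repaint) per line. The sketch below shows only the batching step, in Python rather than C#. `in_batches` and the batch size are illustrative assumptions, and the per-batch UI call is left as a comment because it depends on the toolkit.

```python
def in_batches(items, batch_size):
    """Yield lists of up to batch_size items, preserving order."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:            # final partial batch
        yield batch


# Hypothetical parsed log lines standing in for Message objects.
parsed = ("message %d" % n for n in range(1050))
for batch in in_batches(parsed, 500):
    # In the real app this is where a single UI-thread call would add
    # every node in `batch` inside one begin/end update pair.
    print(len(batch))
```

Because the generator consumes its input lazily, the parser can keep reading the file while earlier batches are being displayed.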


Valueurl Binding On Large Arrays Causes Sluggish User Interface

- by Hooligancat

I have a large data set (some 3500 objects) that is returned from a remote server via HTTP. Currently the data is being presented in an NSCollectionView. One aspect of the data is a path back to the server for a small image that represents the data (think thumbnail, for simplicity). Bindings work fantastically for the data that is already returned, and binding the image via a valueurl binding is easy to do.

However, the user interface is very sluggish when scrolling through the data set, which makes me think that the NSCollectionView is retrieving all the image data instead of just the image data needed to display the currently visible images. I was under the impression that Cocoa controls were smart enough to only retrieve data for the information that is actually being output to the user interface, through lazy loading. This certainly seems to be the case with NSTableView, but I could be misguided on this thought.

Should valueurl binding act lazily and, moreover, should it act lazily in an NSCollectionView? I could create a caching mechanism (in fact I already have such a thing in place for another application; see my post here if you are interested: http://stackoverflow.com/questions/1740209/populating-nsimage-with-data-from-an-asynchronous-nsurlconnection), but I really don't want to go this route if I don't have to for this specific implementation, as the user could potentially change data sets often and may only want small subsets of the data.

Any suggested approaches? Thanks!


Creating/Maintaining a large project-agnostic code library

- by bufferz

In order to reduce repetition and streamline testing/debugging, I'm trying to find the best way to develop a group of libraries that many projects can utilize. I'd like to keep individual executables relatively small, and have shared libraries for math, database, collections, graphics, etc. that were previously scattered among several projects and in many cases duplicated (bad!). This library is to be in an SVN repo, and several programmers will be working on it. It will be in constant development along with the executables that utilize it.

For example, I want a code file in ProjectA to look something like the following:

    using MyCompany.Math.2D; //static 2D math methods
    using MyCompany.Math.3D; //static 3D math methods
    using MyCompany.Comms.SQL; //static methods for doing simple SQLDB I/O
    using MyCompany.Graphics.BitmapOperations; //static methods that play with bitmaps

So in my ProjectA solution file in Visual Studio, in order to develop/debug the MyCompany library, I have to add several projects (Math, Comms, Graphics). Things get pretty cluttered, and solution files get out of date quickly between programmers' SVN commits.

I'm just looking for a high-level approach to maintaining a large, shared code base in an SVN repository. I am fully willing to radically redesign my approach. I'm looking for that warm fuzzy feeling you get when your design approach is spot on and development is fluid and natural. Any ideas? Thanks!!


Evidence-Based-Scheduling - are estimations only as accurate as the work-plan they're based on?

- by Assaf Lavie

I've been using FogBugz's Evidence Based Scheduling (for the uninitiated, Joel explains) for a while now, and there's an inherent problem I can't seem to work around. The system is good at telling me the probability that a given project will be delivered by some date, given the detailed list of tasks that comprise the project. However, it does not take into account the fact that during development additional tasks always pop up.

Now, there's the garbage-can approach of creating a generic task/scheduled item for "last minute hacks" or "integration tasks", or what have you, but that clearly goes against the idea of aggregating the estimates of many small cases.

It's often the case that during the development stage of a project you realize that there's a whole area your planning didn't cover, because, well, that's the nature of developing stuff that hasn't been developed before. So now your ~3 month project may very well turn into a 6 month project, but not because your estimates were off (you could be the best estimator in the world for those tasks that comprised your initial work plan); rather because you ended up adding a whole bunch of new tasks that weren't there to begin with.

EBS doesn't help you with that. It could, theoretically (I guess). It could, perhaps, measure the amount of work you add to a project over time and take that into consideration when estimating the time remaining on a given project. Just a thought.

In other words, EBS works on a task basis, but not on a project/release basis, and the latter is what's important. It's what your boss typically cares about: the delivery date, not the time it takes to finish each task along the way, and not the time it would have taken if your planning had been perfect.

So the question is (yes, there's a question here, don't close it): what's your methodology when it comes to using EBS in FogBugz, and how do you solve the problem above, which seems to be a main cause of schedule delays and mispredictions?

Edit: Some more thoughts after reading a few answers:

If it comes down to having to choose which delivery date you're comfortable presenting to your higher-ups by squinting at the delivery-probability graph and choosing 80%, or 95%, or 60% (based on what, exactly?), then we've resorted to plain old buffering/factoring of our estimates. In which case, couldn't we have skipped the meticulous case-by-case, hour-sized estimation step? By forcing ourselves to break down tasks that take more than a day into smaller chunks of work, haven't we just deluded ourselves into thinking our planning is as tight and thorough as it could be?

People may be consistently bad estimators who do not even learn from their past mistakes. In that respect, having an EBS system is certainly better than not having one. But what can we do about the fact that we're not that good at planning either? I'm not sure it's a problem that can be solved by a similar system.

Our estimates are wrong because of tendencies to be overly optimistic/pessimistic about certain tasks, and because of failure to account for systematic delays (e.g. sick days, a major bug crisis), and usually not because we lack knowledge about the work that needs to be done. Our planning, on the other hand, is often incomplete because we simply don't have enough knowledge at this early stage, and I don't see how an EBS-like system could fill that gap.

So we're back to methodology. We need to find a way to accommodate bad or incomplete work plans that's better than voodoo multiplication.


linux new/delete, malloc/free large memory blocks

- by brian_mk

Hi folks,

We have a Linux system (Kubuntu 7.10) that runs a number of CORBA server processes. The server software uses the glibc libraries for memory allocation. The Linux PC has 4 GB of physical memory. Swap is disabled for speed reasons.

Upon receiving a request to process data, one of the server processes allocates a large data buffer (using the standard C++ operator 'new'). The buffer size varies depending on a number of parameters but is typically around 1.2 GB. It can be up to about 1.9 GB. When the request has completed, the buffer is released using 'delete'.

This works fine for several consecutive requests that allocate buffers of the same size, or if the request allocates a smaller size than the previous one. The memory appears to be freed OK; otherwise buffer allocation attempts would eventually fail after just a couple of requests. In any case, we can see the buffer memory being allocated and freed for each request using tools such as KSysGuard.

The problem arises when a request requires a buffer larger than the previous one. In this case, operator 'new' throws an exception. It's as if the memory that has been freed from the first allocation cannot be re-allocated, even though there is sufficient free physical memory available. If I kill and restart the server process after the first operation, then the second request for a larger buffer size succeeds, i.e. killing the process appears to fully release the freed memory back to the system.

Can anyone offer an explanation as to what might be going on here? Could it be some kind of fragmentation or mapping table size issue? I am thinking of replacing new/delete with malloc/free and using mallopt to tune the way the memory is released to the system.

BTW, I'm not sure if it's relevant to our problem, but the server uses Pthreads that get created and destroyed on each processing request.

Cheers, Brian.


Selecting a good SQL Server 2008 spatial index with large polygons

- by andynormancx

I'm having some fun trying to pick a decent SQL Server 2008 spatial index setup for a data set I am dealing with. The dataset is polygons, representing contours over the whole globe. There are 106,000 rows in the table, and the polygons are stored in a geometry field.

The issue I have is that many of the polygons cover a large portion of the globe. This seems to make it very hard to get a spatial index that will eliminate many rows in the primary filter. For example, look at the following query:

    SELECT "ID","CODE","geom".STAsBinary() as "geom"
    FROM "dbo"."ContA"
    WHERE "geom".Filter(
        geometry::STGeomFromText('POLYGON ((-142.03193662573682 59.53396984952896, -142.03193662573682 59.88928136451884, -141.32743833481925 59.88928136451884, -141.32743833481925 59.53396984952896, -142.03193662573682 59.53396984952896))', 4326)
    ) = 1

This is querying an area which intersects with only two of the polygons in the table. No matter what combination of spatial index settings I choose, that Filter() always returns around 60,000 rows. Replacing Filter() with STIntersects() returns just the two polygons I want, but of course takes much longer (Filter() is 6 seconds, STIntersects() is 12 seconds).

Can anyone give me any hints on whether there is a spatial index setup that is likely to improve on 60,000 rows, or is my dataset just not a good match for SQL Server's spatial indexing?


Delete Duplicate records from large csv file C# .Net

- by Sandhurst

I have created a solution which reads a large CSV file, currently 20-30 MB in size. I have tried to delete the duplicate rows based on certain column values that the user chooses at run time, using the usual technique of finding duplicate rows, but it's so slow that it seems the program is not working at all. What other technique can be applied to remove duplicate records from a CSV file?

Here's the code; I am definitely doing something wrong:

    DataTable dtCSV = ReadCsv(file, columns); //columns is a list of string List column
    DataTable dt=RemoveDuplicateRecords(dtCSV, columns);

    private DataTable RemoveDuplicateRecords(DataTable dtCSV, List<string> columns)
    {
        DataView dv = dtCSV.DefaultView;
        string RowFilter=string.Empty;
        if(dt==null)
            dt = dv.ToTable().Clone();
        DataRow row = dtCSV.Rows[0];
        foreach (DataRow row in dtCSV.Rows)
        {
            try
            {
                RowFilter = string.Empty;
                foreach (string column in columns)
                {
                    string col = column;
                    RowFilter += "[" + col + "]" + "='" + row[col].ToString().Replace("'","''") + "' and ";
                }
                RowFilter = RowFilter.Substring(0, RowFilter.Length - 4);
                dv.RowFilter = RowFilter;
                DataRow dr = dt.NewRow();
                bool result = RowExists(dt, RowFilter);
                if (!result)
                {
                    dr.ItemArray = dv.ToTable().Rows[0].ItemArray;
                    dt.Rows.Add(dr);
                }
            }
            catch (Exception ex)
            {
            }
        }
        return dt;
    }
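The loop above re-filters the whole table for every row, so the work grows roughly with the square of the row count. A common alternative is a single pass that remembers the key of every row already seen in a hash set. A minimal sketch of that idea follows, in Python rather than C#; `remove_duplicates`, the sample data and the column names are made up for illustration.

```python
import csv
import io


def remove_duplicates(rows, key_columns):
    """Keep the first row for each distinct combination of key_columns.

    `rows` is an iterable of dicts (e.g. from csv.DictReader). Each row costs
    one set lookup, so the whole pass is linear in the number of rows.
    """
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical CSV content; in practice this would be an open file.
sample = io.StringIO(
    "name,city,amount\n"
    "Ann,Oslo,10\n"
    "Bob,Rome,20\n"
    "Ann,Oslo,30\n"
)
deduped = remove_duplicates(csv.DictReader(sample), ["name", "city"])
print([row["amount"] for row in deduped])
```

The same shape carries over to C# with a `HashSet<string>` keyed on the joined column values, provided the separator cannot appear in the data.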


Returning large collections from WCF Service

- by Nate Bross

I'm trying to determine the best approach for building a WCF service, and the area I'm struggling with most is returning lists of objects. The built-in maxMessageSize of 64k seems pretty high, and I really don't want to bump it up (quick googling finds hundreds of places bumping the maxMessageSize up to the multi-gigabyte range, which seems foolish). But when I'm returning a collection of objects (~150 items) I am exceeding the default 64k.

I'm almost to the point of returning my own class which implements IEnumerable and has properties for hasNext, hasPrevious and PageSize so that I can implement paging on the client side, but this seems like a lot of code. The other option is to jack up the maxMessageSize and hope for the best, but that feels wrong. All other aspects of my service are working great; it's just returning large collections where I'm having issues.

For background, there are two types of consumers of this service: UI applications, which will be primarily web and/or WPF applications, and data processing applications, i.e. .NET console apps and maybe some other non-UI apps. For the UI applications, I would like to keep them responsive and keep the message size low. For the console apps it doesn't matter as much, as they are just pulling data down to do processing and pushing it back up to the service.
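The paging wrapper described above need not be much code. The sketch below shows the shape of one page of results with the hasNext, hasPrevious and PageSize properties the question mentions, written in Python rather than C#. `get_page`, the zero-based page numbering and the dictionary layout are assumptions for illustration, not a WCF API.

```python
def get_page(items, page_number, page_size):
    """Return one page of `items` plus the flags a client needs to navigate.

    page_number is zero-based.
    """
    start = page_number * page_size
    end = start + page_size
    return {
        "items": items[start:end],
        "page_size": page_size,
        "has_previous": page_number > 0,
        "has_next": end < len(items),
    }


# Hypothetical collection of about 150 objects, as in the question.
records = ["record-%d" % n for n in range(150)]
page = get_page(records, 0, 50)
print(len(page["items"]), page["has_previous"], page["has_next"])
```

Each response then stays at a predictable size regardless of how large the underlying collection grows, and the console consumers can simply request pages in a loop until `has_next` is false.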


Mysql Database Question about Large Columns

- by murat

Hi, I have a table that has 100,000 rows, and soon it will double. The size of the database is currently 5 GB, and most of that goes to one particular column, which is a text column holding the text of PDF files. We expect to have a 20-30 GB or maybe 50 GB database after a couple of months, and this system will be used frequently. I have a couple of questions regarding this setup.

1) We are using InnoDB on every table, including the users table etc. Is it better to use MyISAM on this table, where we store the text version of the PDF files (from a memory usage/performance perspective)?

2) We use Sphinx for searching; however, the data must be retrieved for highlighting. Highlighting is done via the Sphinx API, but we still need to retrieve 10 rows in order to send them to Sphinx again. These 10 rows may take up 50 MB of memory, which is quite large. So I am planning to split these PDF files into chunks of 5 pages in the database. These 100,000 rows will then become around 3-4 million rows, and a couple of months later, instead of having 300,000-350,000 rows, we'll have 10 million rows to store the text version of these PDF files. However, we will retrieve fewer pages: instead of retrieving 400 pages to send to Sphinx for highlighting, we can retrieve 5 pages, and it will have a big impact on performance.

Currently, when we search for a term and retrieve PDF files that have more than 100 pages, the execution time is 0.3-0.35 seconds; if we retrieve PDF files that have fewer than 5 pages, the execution time drops to 0.06 seconds, and it also uses less memory.

Do you think this is a good trade-off? We will have millions of rows instead of 100k-200k rows, but it will save memory and improve performance. Is this a good approach to solving the problem, and do you have any ideas on how to overcome it? The text version of the data is used only for indexing and highlighting, so we are very flexible.

Thanks,
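To make the proposed split concrete, the sketch below groups the per-page text of one document into 5-page chunks, each of which would become one row. It is written in Python, and `chunk_pages` and the (first page, text) row shape are assumptions for illustration; the real schema would also carry a document ID.

```python
def chunk_pages(pages, pages_per_chunk=5):
    """Group a document's page texts into consecutive chunks.

    Returns a list of (first_page_number, chunk_text) tuples, with page
    numbers starting at 1. The last chunk may hold fewer pages.
    """
    chunks = []
    for start in range(0, len(pages), pages_per_chunk):
        text = "\n".join(pages[start:start + pages_per_chunk])
        chunks.append((start + 1, text))
    return chunks


# Hypothetical 12-page document.
pages = ["text of page %d" % n for n in range(1, 13)]
for first_page, text in chunk_pages(pages):
    print(first_page, len(text))
```

Storing the first page number with each chunk lets a search hit be mapped back to a location in the original PDF, and only the matching chunk has to be fetched for highlighting.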


Large Y-axis tickInterval in high charts does not work

- by ckovacs

I have a chart at this JSFiddle to demonstrate a problem where our charts are not respecting the y-axis tick interval for large values: http://jsfiddle.net/z2cDu/1/ var plots = {"usBytePlots":[[1362009600000,143663192997],[1362096000000,110184848742],[1362182400000,97694974247],[1362268800000,90764690805],[1362355200000,112436517747],[1362441600000,113563368701],[1362528000000,139579327454],[1362614400000,118406594506],[1362700800000,125366899935],[1362787200000,134189435596],[1362873600000,132873135854],[1362960000000,121002328604],[1363046400000,123138222001],[1363132800000,115667785553],[1363219200000,103746172138],[1363305600000,108602633473],[1363392000000,89133998142],[1363478400000,92170701458],[1363564800000,86696922873],[1363651200000,80980159054],[1363737600000,97604615694],[1363824000000,108011666339],[1363910400000,124419138381],[1363996800000,121704988344],[1364083200000,124337959109],[1364169600000,137495512348],[1364256000000,136017103319],[1364342400000,60867510427]],"dsBytePlots":[[1362009600000,1734982247336],[1362096000000,1471928923201],[1362182400000,1453869593201],[1362268800000,1411787942581],[1362355200000,1460252447519],[1362441600000,1595590020177],[1362528000000,1658007074783],[1362614400000,1411941908699],[1362700800000,1447659369450],[1362787200000,1643008799861],[1362873600000,1792357973023],[1362960000000,1575173242169],[1363046400000,1565139003978],[1363132800000,1549211975554],[1363219200000,1438411448469],[1363305600000,1380445413578],[1363392000000,1298319283929],[1363478400000,1194578344720],[1363564800000,1211409679299],[1363651200000,1142416351471],[1363737600000,1223822672626],[1363824000000,1267692136487],[1363910400000,1384335759541],[1363996800000,1577205919828],[1364083200000,1675715948928],[1364169600000,1517593781592],[1364256000000,1562183018457],[1364342400000,681007264598]],"aggregatedTotalBytes":43476367948896,"aggregatedUsBytes":3150320403841,"aggregatedDsBytes":40326047545055,"maxTotalBytes":328186292129,"maxTotalBi
tsPerSecond":30387619.641574074} ; $('#container').highcharts({ yAxis: { tickInterval: 53687091200 // 500 gigabytes. Maximum y-axis value is approx 1.8TB }, series : [ { color: 'rgba(80, 180, 77, 0.7)', type: 'areaspline', name : 'Downstream', data : plots.dsBytePlots, total: plots.aggregatedDsBytes }, { color: 'rgba(33, 143, 197, 0.7)', type: 'areaspline', name : 'Upstream', data : plots.usBytePlots, total: plots.aggregatedUsBytes }] }); In this example we are charting bandwidth utilization in bytes. The chart has a maximum value of about 1.8TB. We set the y-axis tick interval to exactly 500GB but the rendered y-axis ticks don't make any sense for the given interval.
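
A quick check of the arithmetic in the snippet above: the literal 53687091200 equals 50 × 1024³, that is 50 GiB, not the 500 gigabytes stated in the code comment. Whether this alone explains the rendered ticks depends on how Highcharts handles the resulting tick count, which is not verified here. A minimal sketch in plain JavaScript, where `GIB` and `gibToBytes` are hypothetical helper names and not part of Highcharts:

```javascript
// Bytes in one binary gigabyte (GiB): 1024^3.
const GIB = 1024 ** 3;

// Hypothetical helper: convert a size in GiB to bytes.
function gibToBytes(gib) {
  return gib * GIB;
}

// The tickInterval literal used in the question.
const configuredInterval = 53687091200;

// How many GiB the configured interval actually represents.
const configuredGib = configuredInterval / GIB;

// The byte value that 500 GiB corresponds to.
const fiveHundredGib = gibToBytes(500);

console.log(configuredGib);  // 50
console.log(fiveHundredGib); // 536870912000
```

If 500 GiB per tick is the intent, `gibToBytes(500)` would be the value to try for `tickInterval`.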

Read the article

Need a tool to search large structured text documents for words, phrases and related phrases

- by pitosalas

I have to keep up with structured documents such as requests for proposals, government program reports, threat models and the like. They are written in what I would call techno-legalese: highly structured, with section numbering and 3, 4 and 5 levels of nesting, all in English. I need a more efficient way to locate the paragraphs and nuggets that matter to me. What I’d like is a kind of local document index/repository that would let me keep some standing queries and easily locate the sections of documents that match them. Here’s an example: I’d like to load in 10 large PDF files of, say, 100 pages each. Each PDF contains English text, formatted very nicely into paragraphs and sections. I’d like to specify that I am interested in “blogging platforms”, “weaknesses in Ruby”, and “localization and internationalization”. Ideally I would then look at a list showing the section of text, the name of the document, and other information that seemed to be related to and/or include the words and phrases I specified. I am sure something like this exists. I would call it something like document indexing, document comprehension or structured searching.

Read the article

High Runtime for Dictionary.Add for a large amount of items

- by aaginor

Hi folks, I have a C# application that stores data from a text file in a Dictionary object. The amount of data to be stored can be rather large, so inserting the entries takes a lot of time. With many items in the Dictionary it gets even worse, because of the resizing of the internal array that stores the Dictionary's data. So I initialized the Dictionary with the number of items that will be added, but this has no impact on speed. Here is my function: private Dictionary<IdPair, Edge> AddEdgesToExistingNodes(HashSet<NodeConnection> connections) { Dictionary<IdPair, Edge> resultSet = new Dictionary<IdPair, Edge>(connections.Count); foreach (NodeConnection con in connections) { ... resultSet.Add(nodeIdPair, newEdge); } return resultSet; } In my tests, I insert ~300k items. I checked the running time with ANTS Performance Profiler and found that the average time for resultSet.Add(...) doesn't change when I initialize the Dictionary with the needed size. It is the same as when I initialize the Dictionary with new Dictionary(); (about 0.256 ms on average for each Add). This is definitely caused by the amount of data in the Dictionary (even though I initialized it with the desired size): for the first 20k items, the average time for Add is 0.03 ms per item. Any idea how to make the add operation faster? Thanks in advance, Frank

Read the article

Hibernate design to speed up querying of large dataset

- by paddydub

I currently have the tables below, representing a bus network, mapped in Hibernate and accessed from a Spring MVC based bus route planner. I'm trying to make the route planner perform faster. At the moment I load all of these tables into Lists to perform the route planner logic. I would appreciate any ideas on how to speed up performance, or suggestions for another way to handle a large set of data.

Coordinate Connections Table (INT, INT, INT, DOUBLE), containing 50,000 coordinate connections:

ID  FROMCOORDID  TOCOORDID  DISTANCE
1   1            2          0.383657
2   1            17         0.173201
3   1            63         0.258781
4   1            64         0.013726
5   1            65         0.459829
6   1            95         0.458769

Coordinate Table (INT, DECIMAL, DECIMAL), containing 4,700 coordinates:

ID  LAT        LNG
0   59.352669  -7.264341
1   59.352669  -7.264341
2   59.350012  -7.260653
3   59.337585  -7.189798
4   59.339221  -7.193582
5   59.341408  -7.205888

Bus Stop Table (INT, INT, INT), containing 15,000 stops:

StopID      RouteID  COORDINATEID
1000100001  100      17
1000100002  100      18
1000100003  100      19
1000100004  100      20
1000100005  100      21
1000100006  100      22
1000100007  100      23

This is how long it takes to load all the data from each table:

stop.findAll = 148ms, stops.size: 15670
Hibernate: select coordinate0_.COORDINATEID as COORDINA1_2_, coordinate0_.LAT as LAT2_, coordinate0_.LNG as LNG2_ from COORDINATES coordinate0_
coord.findAll = 51ms, coordinates.size: 4704
Hibernate: select coordconne0_.COORDCONNECTIONID as COORDCON1_3_, coordconne0_.DISTANCE as DISTANCE3_, coordconne0_.FROMCOORDID as FROMCOOR3_3_, coordconne0_.TOCOORDID as TOCOORDID3_ from COORDCONNECTIONS coordconne0_
coordinateConnectionDao.findAll = 238ms; coordConnectioninates.size:48132

Hibernate Annotations: @Entity @Table(name = "STOPS") public class Stop implements Serializable { @Id @GeneratedValue(strategy = GenerationType.AUTO) @Column(name = "STOPID") private int stopID; @Column(name = "ROUTEID", nullable = false) private int routeID; @ManyToOne(fetch = FetchType.LAZY) @JoinColumn(name = "COORDINATEID", nullable = false) private Coordinate coordinate; } @Entity @Table(name = "COORDINATES") public class Coordinate { @Id @GeneratedValue @Column(name = "COORDINATEID") private int CoordinateID; @Column(name = "LAT") private double latitude; @Column(name = "LNG") private double longitude; } @Entity @Table(name = "COORDCONNECTIONS") public class CoordConnection { @Id @GeneratedValue @Column(name = "COORDCONNECTIONID") private int CoordinateID; @ManyToOne(fetch = FetchType.LAZY) @JoinColumn(name = "FROMCOORDID", nullable = false) private Coordinate fromCoordID; @ManyToOne(fetch = FetchType.LAZY) @JoinColumn(name = "TOCOORDID", nullable = false) private Coordinate toCoordID; @Column(name = "DISTANCE", nullable = false) private double distance; }

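Since the question describes loading every table into Lists and running the route logic over them, one possible direction (an assumption, not something the post confirms) is to index the loaded connections once by their starting coordinate, so that each lookup avoids scanning the full list. The sketch below is plain JavaScript rather than Hibernate or Java code; `buildAdjacency` and `nearestNeighbour` are hypothetical names, and the rows are the six sample rows from the Coordinate Connections table.

```javascript
// Sample rows copied from the Coordinate Connections table in the question.
const connections = [
  { id: 1, from: 1, to: 2, distance: 0.383657 },
  { id: 2, from: 1, to: 17, distance: 0.173201 },
  { id: 3, from: 1, to: 63, distance: 0.258781 },
  { id: 4, from: 1, to: 64, distance: 0.013726 },
  { id: 5, from: 1, to: 65, distance: 0.459829 },
  { id: 6, from: 1, to: 95, distance: 0.458769 },
];

// Hypothetical helper: group connections by their starting coordinate id,
// so all edges leaving a coordinate can be fetched with one Map lookup.
function buildAdjacency(rows) {
  const adjacency = new Map();
  for (const row of rows) {
    if (!adjacency.has(row.from)) {
      adjacency.set(row.from, []);
    }
    adjacency.get(row.from).push({ to: row.to, distance: row.distance });
  }
  return adjacency;
}

// Hypothetical helper: the closest directly connected coordinate, or null.
function nearestNeighbour(adjacency, fromId) {
  const edges = adjacency.get(fromId) || [];
  let best = null;
  for (const edge of edges) {
    if (best === null || edge.distance < best.distance) {
      best = edge;
    }
  }
  return best;
}

const adjacency = buildAdjacency(connections);
const closest = nearestNeighbour(adjacency, 1);
console.log(closest); // { to: 64, distance: 0.013726 }
```

The index is built once after the tables are loaded; whether this helps depends on where the route planner actually spends its time, which the post does not show.
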
Read the article

Large flags enumerations in C#

- by LorenVS

Hey everyone, got a quick question that I can't seem to find anything about... I'm working on a project that requires flag enumerations with a large number of flags (up to 40-ish), and I don't really feel like typing in the exact mask for each enumeration value: public enum MyEnumeration : ulong { Flag1 = 1, Flag2 = 2, Flag3 = 4, Flag4 = 8, Flag5 = 16, // ... Flag17 = 65536, Flag18 = 65536 * 2, Flag19 = 65536 * 4, Flag20 = 65536 * 8, // ... Flag33 = 65536UL * 65536, Flag34 = 65536UL * 65536 * 2 // right about here I start to get really pissed off } Moreover, I'm also hoping that there is an easy(ier) way for me to control the actual arrangement of bits on different endian machines, since these values will eventually be serialized over a network: public enum MyEnumeration : uint { Flag1 = 1, // BIG: 0x00000001, LITTLE:0x01000000 Flag2 = 2, // BIG: 0x00000002, LITTLE:0x02000000 Flag3 = 4, // BIG: 0x00000004, LITTLE:0x04000000 // ... Flag9 = 256, // BIG: 0x00000100, LITTLE:0x00010000 Flag10 = 512, // BIG: 0x00000200, LITTLE:0x00020000 Flag11 = 1024 // BIG: 0x00000400, LITTLE:0x00040000 } So, I'm kind of wondering if there is some cool way I can set my enumerations up like: public enum MyEnumeration : uint { Flag1 = flag(1), // BOTH: 0x80000000 Flag2 = flag(2), // BOTH: 0x40000000 Flag3 = flag(3), // BOTH: 0x20000000 // ...
Flag9 = flag(9), // BOTH: 0x00800000 } What I've Tried: // this won't work because Math.Pow returns double // and because C# requires constants for enum values public enum MyEnumeration : uint { Flag1 = Math.Pow(2, 0), Flag2 = Math.Pow(2, 1) } // this won't work because C# requires constants for enum values public enum MyEnumeration : uint { Flag1 = Masks.MyCustomerBitmaskGeneratingFunction(0) } // this is my best solution so far, but is definitely // quite clunky public struct EnumWrapper<TEnum> where TEnum : struct { private BitVector32 vector; public bool this[TEnum index] { // returns whether the index-th bit is set in vector } // all sorts of overriding using TEnum as args } Just wondering if anyone has any cool ideas, thanks!
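
The masks described above can be derived with a shift instead of being typed out, and a fixed byte order can be chosen at serialization time rather than in the enum values. The sketch below illustrates this in JavaScript, not C#; `flag`, `msbFlag` and `toNetworkBytes` are hypothetical names. C# also accepts constant shift expressions such as `1UL << 39` as enum values, though that is a separate point from this sketch.

```javascript
// Hypothetical helper: mask for the n-th flag (1-based), least significant
// bit first. BigInt is used so that flags beyond bit 32 stay exact.
function flag(n) {
  return 1n << BigInt(n - 1);
}

// Hypothetical helper: 32-bit mask counted from the most significant bit,
// matching the question's flag(1) = 0x80000000 convention.
function msbFlag(n) {
  return (0x80000000 >>> (n - 1)) >>> 0;
}

// Serialize a 32-bit value in network byte order (big-endian), so the
// bytes on the wire do not depend on the machine's native endianness.
function toNetworkBytes(value) {
  const buffer = new ArrayBuffer(4);
  new DataView(buffer).setUint32(0, value, false); // false = big-endian
  return Array.from(new Uint8Array(buffer));
}

console.log(flag(1));                    // 1n
console.log(flag(33));                   // 4294967296n
console.log(msbFlag(1).toString(16));    // 80000000
console.log(msbFlag(9).toString(16));    // 800000
console.log(toNetworkBytes(msbFlag(9))); // [ 0, 128, 0, 0 ]
```

Writing the bytes in an explicit order at the serialization boundary keeps the enum definition itself independent of endianness.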

Read the article

Good PHP / MYSQL hashing solution for large number of text values

- by Dave

Short description: I need a hashing solution in PHP for a large number of text values. Long description: PRODUCT_OWNER_TABLE: serial_number (auto_inc), product_name, owner_id. OWNER_TABLE: owner_id (auto_inc), owner_name. I need to maintain a database of 200000 unique products and their owners (AND all subsequent changes to ownership). Each product has one owner, but an owner may have MANY different products. Owner names are "Adam Smith", "John Reeves", etc., just text values (quite likely to be Unicode as well). I want to optimize the database design, so what I was thinking was this: every week when I run this script, it fetches the owner of a product, then checks against a table, I suppose similar to PRODUCT_OWNER_TABLE, fetching the owner_id. It then looks up owner_id in OWNER_TABLE. If the name matches, the owner is unchanged, so it moves on. The problem is when it's different... To optimize the database, I think I should check against the other "owner_name" entries in OWNER_TABLE to see if that value exists there. If it does, then I should use that owner_id. If it doesn't, then I should add another entry. Note that there is nothing special about the "name". As long as I maintain the correct linkages AND make the OWNER_TABLE a "read-only, append-new" type of table, I should be able to create a historical archive of ownership. I need to do this check for 200000 entries, with I don't know how many unique owner names (~50000?). I think I need a hashing solution: the OWNER_TABLE won't be sorted, so search algorithms won't be optimal. The programming language is PHP. The database is MySQL.
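
The "look up the name, reuse its id if present, otherwise append a new entry" step described above can be sketched with an in-memory hash map. This is a JavaScript illustration rather than PHP or MySQL, and `OwnerRegistry`, `getOrCreate` and the NFC normalization of Unicode names are assumptions made for the example. In the database itself, a unique index on owner_name might serve the same purpose, though that is not tested here.

```javascript
// Append-only registry mapping owner names to ids, mirroring the
// "read-only, append-new" OWNER_TABLE idea from the question.
class OwnerRegistry {
  constructor() {
    this.idsByName = new Map(); // owner_name -> owner_id
    this.nextId = 1;            // mimics an auto-increment column
  }

  // Assumption: names are compared after Unicode NFC normalization and
  // trimming, so visually identical names map to the same key.
  static key(name) {
    return name.normalize("NFC").trim();
  }

  // Return the existing id for a name, or append a new entry.
  getOrCreate(name) {
    const key = OwnerRegistry.key(name);
    let id = this.idsByName.get(key);
    if (id === undefined) {
      id = this.nextId++;
      this.idsByName.set(key, id);
    }
    return id;
  }
}

const owners = new OwnerRegistry();
const adam = owners.getOrCreate("Adam Smith");      // new entry
const john = owners.getOrCreate("John Reeves");     // new entry
const adamAgain = owners.getOrCreate("Adam Smith"); // reuses the first id

console.log(adam, john, adamAgain); // 1 2 1
```

Each lookup is a single hash-map access, so checking 200000 products against roughly 50000 names does not require the names to be sorted.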

Read the article

Filter large amounts of data in a table w/ jQuery

- by Bry4n

I work for a transit agency and I have large amounts of data (mostly times), and I need a way to filter the data using two textboxes (To and From). I found jQuery quick search, but it seems to only work with one textbox. If anyone has any ideas via jQuery or some other client-side library, that would be fantastic. Ideal example: To: [Textbox] From: [Textbox] <table> <tr> <td>69th street</td><td>5:00pm</td><td>5:06pm</td><td>5:10pm</td><td>5:20pm</td> </tr> <tr> <td>Millbourne</td><td>5:09pm</td><td>5:15pm</td><td>5:20pm</td><td>5:25pm</td> </tr> <tr> <td>Spring Garden</td><td>6:00pm</td><td>6:15pm</td><td>6:20pm</td><td>6:25pm</td> </tr> </table> So if I start typing one of the stations in the To: textbox, it either displays results dynamically like the quick search or I have to press a button (either is fine), and then the same for the From: textbox. Finally, it shows me the To: station and all its times on the left and the From: station and all its times on the right.
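
The two-textbox filtering described above can be separated from the page itself: a pure function takes the two query strings and returns the matching rows for each side, and the jQuery event handlers (not shown) would only call it and render the result. A minimal sketch, assuming the table rows have already been read into an array; `matchStations` and `filterTimetable` are hypothetical names, and the data is the sample table from the question.

```javascript
// The sample timetable from the question, one entry per table row.
const timetable = [
  { station: "69th street", times: ["5:00pm", "5:06pm", "5:10pm", "5:20pm"] },
  { station: "Millbourne", times: ["5:09pm", "5:15pm", "5:20pm", "5:25pm"] },
  { station: "Spring Garden", times: ["6:00pm", "6:15pm", "6:20pm", "6:25pm"] },
];

// Case-insensitive substring match on the station name.
// An empty query matches nothing, so an untouched textbox shows no rows.
function matchStations(rows, query) {
  const needle = query.trim().toLowerCase();
  if (needle === "") {
    return [];
  }
  return rows.filter((row) => row.station.toLowerCase().includes(needle));
}

// Apply both textboxes independently and return one result list per side.
function filterTimetable(rows, toQuery, fromQuery) {
  return {
    to: matchStations(rows, toQuery),
    from: matchStations(rows, fromQuery),
  };
}

const result = filterTimetable(timetable, "mill", "spring");
console.log(result.to[0].station);   // Millbourne
console.log(result.from[0].station); // Spring Garden
```

Calling `filterTimetable` on every keystroke gives the dynamic behaviour, while calling it from a button handler gives the press-to-search variant.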

Read the article

table design for storing large number of rows

- by hyperboreean

I am trying to store in a PostgreSQL database some unique identifiers along with the sites they have been seen on. I can't really decide which of the following 3 options to choose in order to be fast and easily maintainable. The table would have to provide the following information: the unique identifier, which unfortunately is text, and the sites on which that unique identifier has been seen. The amount of data it would have to hold is rather large: there are around 22 million unique identifiers that I know of. So I thought about the following table designs: (1) id - integer, identifier - text, seen_on_site - an integer, foreign key to a sites table. This approach would require around 22 million rows multiplied by the number of sites. (2) id - integer, identifier - text, seen_on_site_1 - boolean, seen_on_site_2 - boolean, ..., seen_on_site_n - boolean. Hopefully the number of sites won't go past 10. This would require only as many rows as there are unique identifiers that I know of, that is around 20 million, but it would make it hard to work with from an ORM perspective. (3) One table that would store only unique identifiers: id - integer, unique_identifier - text; one table that would store only sites: id - integer, site - text; and one many-to-many relation: id - integer, unique_id - integer (fk to the table storing identifiers), site_id - integer (fk to the sites table). Another approach would be to have a table that stores unique identifiers for each site. So, which one seems like the better approach to take in the long run?
