Search Results

Search found 65999 results on 2640 pages for 'large data volumes'.

Page 142/2640 | < Previous Page | 138 139 140 141 142 143 144 145 146 147 148 149 | Next Page >

Mysql Database Question about Large Columns

- by murat

Hi, I have a table that has 100.000 rows, and soon it will be doubled. The size of the database is currently 5 gb and most of them goes to one particular column, which is a text column for PDF files. We expect to have 20-30 GB or maybe 50 gb database after couple of month and this system will be used frequently. I have couple of questions regarding with this setup 1-) We are using innodb on every table, including users table etc. Is it better to use myisam on this table, where we store text version of the PDF files? (from memory usage /performance perspective) 2-) We use Sphinx for searching, however the data must be retrieved for highlighting. Highlighting is done via sphinx API but still we need to retrieve 10 rows in order to send it to Sphinx again. This 10 rows may allocate 50 mb memory, which is quite large. So I am planning to split these PDF files into chunks of 5 pages in the database, so these 100.000 rows will be around 3-4 million rows and couple of month later, instead of having 300.000-350.000 rows, we'll have 10 million rows to store text version of these PDF files. However, we will retrieve less pages, so again instead of retrieving 400 pages to send Sphinx for highlighting, we can retrieve 5 pages and it will have a big impact on the performance. Currently, when we search a term and retrieve PDF files that have more than 100 pages, the execution time is 0.3-0.35 seconds, however if we retrieve PDF files that have less than 5 pages, the execution time reduces to 0.06 seconds, and it also uses less memory. Do you think, this is a good trade-off? We will have million of rows instead of having 100k-200k rows but it will save memory and improve the performance. Is it a good approach to solve this problem and do you have any ideas how to overcome this problem? The text version of the data is used only for indexing and highlighting. So, we are very flexible. Thanks,

Read the article
Large Y-axis tickInterval in high charts does not work

- by ckovacs

I have a chart at this JSFiddle to demonstrate a problem where our charts are not respecting the y-axis tick interval for large values: http://jsfiddle.net/z2cDu/1/ var plots = {"usBytePlots":[[1362009600000,143663192997],[1362096000000,110184848742],[1362182400000,97694974247],[1362268800000,90764690805],[1362355200000,112436517747],[1362441600000,113563368701],[1362528000000,139579327454],[1362614400000,118406594506],[1362700800000,125366899935],[1362787200000,134189435596],[1362873600000,132873135854],[1362960000000,121002328604],[1363046400000,123138222001],[1363132800000,115667785553],[1363219200000,103746172138],[1363305600000,108602633473],[1363392000000,89133998142],[1363478400000,92170701458],[1363564800000,86696922873],[1363651200000,80980159054],[1363737600000,97604615694],[1363824000000,108011666339],[1363910400000,124419138381],[1363996800000,121704988344],[1364083200000,124337959109],[1364169600000,137495512348],[1364256000000,136017103319],[1364342400000,60867510427]],"dsBytePlots":[[1362009600000,1734982247336],[1362096000000,1471928923201],[1362182400000,1453869593201],[1362268800000,1411787942581],[1362355200000,1460252447519],[1362441600000,1595590020177],[1362528000000,1658007074783],[1362614400000,1411941908699],[1362700800000,1447659369450],[1362787200000,1643008799861],[1362873600000,1792357973023],[1362960000000,1575173242169],[1363046400000,1565139003978],[1363132800000,1549211975554],[1363219200000,1438411448469],[1363305600000,1380445413578],[1363392000000,1298319283929],[1363478400000,1194578344720],[1363564800000,1211409679299],[1363651200000,1142416351471],[1363737600000,1223822672626],[1363824000000,1267692136487],[1363910400000,1384335759541],[1363996800000,1577205919828],[1364083200000,1675715948928],[1364169600000,1517593781592],[1364256000000,1562183018457],[1364342400000,681007264598]],"aggregatedTotalBytes":43476367948896,"aggregatedUsBytes":3150320403841,"aggregatedDsBytes":40326047545055,"maxTotalBytes":328186292129,"maxTotalBitsPerSecond":30387619.641574074} ; $('#container').highcharts({ yAxis: { tickInterval: 53687091200 // 500 gigabytes. Maximum y-axis value is approx 1.8TB }, series : [ { color: 'rgba(80, 180, 77, 0.7)', type: 'areaspline', name : 'Downstream', data : plots.dsBytePlots, total: plots.aggregatedDsBytes }, { color: 'rgba(33, 143, 197, 0.7)', type: 'areaspline', name : 'Upstream', data : plots.usBytePlots, total: plots.aggregatedUsBytes }] }); In this example we are charting bandwidth utilization in bytes. The chart has a maximum value of about 1.8TB. We set the y-axis tick interval to exactly 500GB but the rendered y-axis ticks don't make any sense for the given interval.

Read the article
Need a tool to search large structure text documents for words, phrases and related phrases

- by pitosalas

I have to keep up with structured documents containing things such as requests for proposals, government program reports, threat models and all kinds of things like that. They are in techno-legalese as I would call them: highly structured, with section numbering and 3, 4 and 5 levels of nesting. All in English I need a more efficient way to locate those paragraphs of nuggets that matter to me. So what I’d like is kind of a local document index/repository, that would allow me to have some standing queries and easily locate sections in documents that talk about my queries. Here’s an example: I’d like to load in 10 large PDF files, each of say 100 pages. Each PDF contains English text, formatted very nicely into paragraphs and sections. I’d like to specify that I am interested in “blogging platforms”, “weaknesses in Ruby”, “localization and internationalization” Ideally then look at a list that showed the section of text, the name of the document, and other information that seemed to be related to and/or include the words and phrases I specified. I am sure something like this exists. I would call it something like document indexing, document comprehension or structured searching.

Read the article
How do I populate a MEF plugin with data that is not hard coded into the assembly?

- by hjoelr

I am working on a program that will communicate with various pieces of hardware. Because of the varied nature of the items it communicates and controls, I need to have a different "driver" for each different piece of hardware. This made me think that MEF would be a great way to make those drivers as plugins that can be added even after the product has been released. I've looked at a lot of examples of how to use MEF, but the question that I haven't been able to find an answer to is how to populate a MEF plugin with external data (eg. from a database). All the examples I can find have the "data" hard-coded into the assembly, like the following example: [Export( typeof( IRule ) )] public class RuleInstance : IRule { public void DoIt() {} public string Name { get { return "Rule Instance 3"; } } public string Version { get { return "1.1.0.0"; } } public string Description { get { return "Some Rule Instance"; } } } What if I want Name, Version and Description to come from a database? How would I tell MEF where to get that information? I may be missing something very obvious, but I don't know what it is. Thanks! Joel

Read the article
Hibernate design to speed up querying of large dataset

- by paddydub

I currently have the below tables representing a bus network mapped in hibernate, accessed from a Spring MVC based bus route planner I'm trying to make my route planner application perform faster, I load all the above tables into Lists to perform the route planner logic. I would appreciate if anyone has any ideas of how to speed my performace Or any suggestions of another method to approach this problem of handling a large set of data Coordinate Connections Table (INT,INT,INT, DOUBLE)( Containing 50,000 Coordinate Connections) ID, FROMCOORDID, TOCOORDID, DISTANCE 1 1 2 0.383657 2 1 17 0.173201 3 1 63 0.258781 4 1 64 0.013726 5 1 65 0.459829 6 1 95 0.458769 Coordinate Table (INT,DECIMAL, DECIMAL) (Containing 4700 Coordinates) ID , LAT, LNG 0 59.352669 -7.264341 1 59.352669 -7.264341 2 59.350012 -7.260653 3 59.337585 -7.189798 4 59.339221 -7.193582 5 59.341408 -7.205888 Bus Stop Table (INT, INT, INT)(Containing 15000 Stops) StopID RouteID COORDINATEID 1000100001 100 17 1000100002 100 18 1000100003 100 19 1000100004 100 20 1000100005 100 21 1000100006 100 22 1000100007 100 23 This is how long it takes to load all the data from each table: stop.findAll = 148ms, stops.size: 15670 Hibernate: select coordinate0_.COORDINATEID as COORDINA1_2_, coordinate0_.LAT as LAT2_, coordinate0_.LNG as LNG2_ from COORDINATES coordinate0_ coord.findAll = 51ms , coordinates.size: 4704 Hibernate: select coordconne0_.COORDCONNECTIONID as COORDCON1_3_, coordconne0_.DISTANCE as DISTANCE3_, coordconne0_.FROMCOORDID as FROMCOOR3_3_, coordconne0_.TOCOORDID as TOCOORDID3_ from COORDCONNECTIONS coordconne0_ coordinateConnectionDao.findAll = 238ms ; coordConnectioninates.size:48132 Hibernate Annotations @Entity @Table(name = "STOPS") public class Stop implements Serializable { @Id @GeneratedValue(strategy = GenerationType.AUTO) @Column(name = "STOPID") private int stopID; @Column(name = "ROUTEID", nullable = false) private int routeID; @ManyToOne(fetch = FetchType.LAZY) @JoinColumn(name = "COORDINATEID", nullable = false) private Coordinate coordinate; } @Table(name = "COORDINATES") public class Coordinate { @Id @GeneratedValue @Column(name = "COORDINATEID") private int CoordinateID; @Column(name = "LAT") private double latitude; @Column(name = "LNG") private double longitude; } @Entity @Table(name = "COORDCONNECTIONS") public class CoordConnection { @Id @GeneratedValue @Column(name = "COORDCONNECTIONID") private int CoordinateID; @ManyToOne(fetch = FetchType.LAZY) @JoinColumn(name = "FROMCOORDID", nullable = false) private Coordinate fromCoordID; @ManyToOne(fetch = FetchType.LAZY) @JoinColumn(name = "TOCOORDID", nullable = false) private Coordinate toCoordID; @Column(name = "DISTANCE", nullable = false) private double distance; }

Read the article
Large flags enumerations in C#

- by LorenVS

Hey everyone, got a quick question that I can't seem to find anything about... I'm working on a project that requires flag enumerations with a large number of flags (up to 40-ish), and I don't really feel like typing in the exact mask for each enumeration value: public enum MyEnumeration : ulong { Flag1 = 1, Flag2 = 2, Flag3 = 4, Flag4 = 8, Flag5 = 16, // ... Flag16 = 65536, Flag17 = 65536 * 2, Flag18 = 65536 * 4, Flag19 = 65536 * 8, // ... Flag32 = 65536 * 65536, Flag33 = 65536 * 65536 * 2 // right about here I start to get really pissed off } Moreover, I'm also hoping that there is an easy(ier) way for me to control the actual arrangement of bits on different endian machines, since these values will eventually be serialized over a network: public enum MyEnumeration : uint { Flag1 = 1, // BIG: 0x00000001, LITTLE:0x01000000 Flag2 = 2, // BIG: 0x00000002, LITTLE:0x02000000 Flag3 = 4, // BIG: 0x00000004, LITTLE:0x03000000 // ... Flag9 = 256, // BIG: 0x00000010, LITTLE:0x10000000 Flag10 = 512, // BIG: 0x00000011, LITTLE:0x11000000 Flag11 = 1024 // BIG: 0x00000012, LITTLE:0x12000000 } So, I'm kind of wondering if there is some cool way I can set my enumerations up like: public enum MyEnumeration : uint { Flag1 = flag(1), // BOTH: 0x80000000 Flag2 = flag(2), // BOTH: 0x40000000 Flag3 = flag(3), // BOTH: 0x20000000 // ... Flag9 = flag(9), // BOTH: 0x00800000 } What I've Tried: // this won't work because Math.Pow returns double // and because C# requires constants for enum values public enum MyEnumeration : uint { Flag1 = Math.Pow(2, 0), Flag2 = Math.Pow(2, 1) } // this won't work because C# requires constants for enum values public enum MyEnumeration : uint { Flag1 = Masks.MyCustomerBitmaskGeneratingFunction(0) } // this is my best solution so far, but is definitely // quite clunkie public struct EnumWrapper<TEnum> where TEnum { private BitVector32 vector; public bool this[TEnum index] { // returns whether the index-th bit is set in vector } // all sorts of overriding using TEnum as args } Just wondering if anyone has any cool ideas, thanks!

Read the article
Good PHP / MYSQL hashing solution for large number of text values

- by Dave

Short descriptio: Need hashing algorithm solution in php for large number of text values. Long description. PRODUCT_OWNER_TABLE serial_number (auto_inc), product_name, owner_id OWNER_TABLE owner_id (auto_inc), owener_name I need to maintain a database of 200000 unique products and their owners (AND all subsequent changes to ownership). Each product has one owner, but an owner may have MANY different products. Owner names are "Adam Smith", "John Reeves", etc, just text values (quite likely to be unicode as well). I want to optimize the database design, so what i was thinking was, every week when i run this script, it fetchs the owner of a proudct, then checks against a table i suppose similar to PRODUCT_OWNER_TABLE, fetching the owner_id. It then looks up owner_id in OWNER_TABLE. If it matches, then its the same, so it moves on. The problem is when its different... To optimize the database, i think i should be checking against the other "owner_name" entries in OWNER_TABLE to see if that value exists there. If it does, then i should use that owner_id. If it doesnt, then i should add another entry. Note that there is nothing special about the "name". as long as i maintain the correct linkagaes AND make the OWNER_TABLE "read-only, append-new" type table - I should be able create a historical archive of ownership. I need to do this check for 200000 entries, with i dont know how many unique owner names (~50000?). I think i need a hashing solution - the OWNER_TABLE wont be sorted, so search algos wont be optimal. programming language is PHP. database is MYSQL.

Read the article
Why is Core Data not persisting these changes to disk?

- by scott

I added a new entity to my model and it loads fine but no changes made in memory get persisted to disk. My values set on the car object work fine in memory but aren't getting persisted to disk on hitting the home button (in simulator). I am using almost exactly the same code on another entity in my application and its values persist to disk fine (core data - sqlite3); Does anyone have a clue what I'm overlooking here? Car is the managed object, cars in an NSMutableArray of car objects and Car is the entity and Visible is the attribute on the entity which I am trying to set. Thanks for you assistance. Scott - (void)viewDidLoad { myAppDelegate* appDelegate = (myAppDelegate*)[[UIApplication sharedApplication] delegate]; NSManagedObjectContext* managedObjectContex = appDelegate.managedObjectContext; NSFetchRequest* request = [[NSFetchRequest alloc] init]; NSEntityDescription* entity = [NSEntityDescription entityForName:@"Car" inManagedObjectContext:managedObjectContex]; [request setEntity:entity]; NSSortDescriptor* sortDescriptor = [[NSSortDescriptor alloc] initWithKey:@"Name" ascending:YES]; NSArray* sortDescriptors = [[NSArray alloc] initWithObjects:sortDescriptor, nil]; [request setSortDescriptors:sortDescriptors]; [sortDescriptors release]; [sortDescriptor release]; NSError* error = nil; cars = [[managedObjectContex executeFetchRequest:request error:&error] mutableCopy]; if (cars == nil) { NSLog(@"Can't load the Cars data! Error: %@, %@", error, [error userInfo]); } [request release]; } - (void)tableView:(UITableView *)tableView didSelectRowAtIndexPath:(NSIndexPath*)indexPath { Car* car = [cars objectAtIndex:indexPath.row]; if (car.Visible == [NSNumber numberWithBool:YES]) { car.Visible = [NSNumber numberWithBool:NO]; [tableView cellForRowAtIndexPath:indexPath].accessoryType = UITableViewCellAccessoryNone; } else { car.Visible = [NSNumber numberWithBool:YES]; [tableView cellForRowAtIndexPath:indexPath].accessoryType = UITableViewCellAccessoryCheckmark; } }

Read the article
table design for storing large number of rows

- by hyperboreean

I am trying to store in a postgresql database some unique identifiers along with the site they have been seen on. I can't really decide which of the following 3 option to choose in order to be faster and easy maintainable. The table would have to provide the following information: the unique identifier which unfortunately it's text the sites on which that unique identifier has been seen The amount of data that would have to hold is rather large: there are around 22 millions unique identifiers that I know of. So I thought about the following designs of the table: id - integer identifier - text seen_on_site - an integer, foreign key to a sites table This approach would require around 22 mil multiplied by the number of sites. id - integer identifier - text seen_on_site_1 - boolean seen_on_site_2 - boolean ............ seen_on_site_n - boolean Hopefully the number of sites won't go past 10. This would require only the number of unique identifiers that I know of, that is around 20 millions, but it would make it hard to work with it from an ORM perspective. one table that would store only unique identifiers, like in: id - integer unique_identifier - text, one table that would store only sites, like in: id - integer site - text and one many to many relation, like: id - integer, unique_id - integer (fk to the table storing identifiers) site_id - integer (fk to sites table) another approach would be to have a table that stores unique identifiers for each site So, which one seems like a better approach to take on the long run?

Read the article
Write contents of custom View to large Image file on SD card

- by JFortney

I have a class that extends View. I override the onDraw method and allow the user to draw on the screen. I am at the point where I want to save this view as an image. I Can use buildDrawingCache and getDrawingCache to create a bitmap that I can write to the SD card. However, the image is not good quality at a large size, it has jagged edges. Since I have a View and I use Paths I can transform all by drawing to a bigger size. I just don't know how to make the Canvas bigger so when I call getDrawingCache it doesn't crop all the paths I am just transformed. What is happening is I transform all my paths but when I write the Bitmap to file I am only getting the "viewport" of the actual screen size. I want something much bigger. Any help in the right direction would be greatly appreciated. I have been reading the docs and books and am at a loss. Thanks Jon

Read the article
How do you make life easier for yourself when developing a really large database

- by Hannes de Jager

I am busy developing 2 web based systems with MySql databases and the amount of tables/views/stored routines is really becoming a lot and it is more and more challenging to handle the complexity. Now in programming languages we have namespacing e.g. Java packages, C++ namespaces to partition the software, grouping it together to make things more understandable. Databases on the other hand have more of a flat structure (MySql at least) e.g. tables and stored procedures are on the same level. So one have to be more creative, creating naming conventions, perhaps use more than one database or using tools to visualize things. What methods do you use to ease the pain? To be effective while developing your databases? To not get lost in a sea of tables and fields and stored procs? Feel free to mention tools you use also, but try to restrict it to open source and preferably Linux solutions if thats OK. b.t.w How many tables would a database have to be considered large in terms of design?

Read the article
Delphi fast large bitmap creation (without clearing)

- by Ritsaert Hornstra

When using the TBitmap wrapper for a GDI bitmap from the unit Graphics I noticed it will always clear out the bitmap (using a PatBlt call) when setting up a bitmap with SetSize( w, h ). When I copy in the bits later on (see routine below) it seems ScanLine is the fastest possibility and not SetDIBits. function ToBitmap: TBitmap; var i, N, x: Integer; S, D: PAnsiChar; begin Result := TBitmap.Create(); Result.PixelFormat := pf32bit; Result.SetSize( width, height ); S := Src; D := Result.ScanLine[ 0 ]; x := Integer( Result.ScanLine[ 1 ] ) - Integer( D ); N := width * sizeof( longword ); for i := 0 to height - 1 do begin Move( S^, D^, N ); Inc( S, N ); Inc( D, x ); end; end; The bitmaps I need to work with are quite large (150MB of RGB memory). With these iomages it takes 150ms to simply create an empty bitmap and a further 140ms to overwrite it's contents. Is there a way of initializing a TBitmap with the correct size WITHOUT initializing the pixels itself and leaving the memory of the pixels uninitialized (eg dirty)? Or is there another way to do such a thing. I know we could work on the pixels in place but this still leaves the 150ms of unnessesary initializtion of the pixels.

Read the article
How do I use Core Data with the Cocoa Text Input system?

- by the Joel

Hobbyist Cocoa programmer here. Have been looking around all the usual places, but this seems relatively under-explained: I am writing something a little out of the ordinary. It is much simpler than, but similar to, a desktop publishing app. I want editable text boxes on a canvas, arbitrarily placed. This is document-based and I’d really like to use Core Data. Now, The cocoa text-handling system seems to deal with a four-class structure: NSTextStorage, NSLayoutManager, NSTextContainer and finally NSTextView. I have looked into these and know how to use them, sort of. Have been making some prototypes and it works for simple apps. The problem arrives when I get into persistency. I don't know how to, by way of Cocoa Bindings or something else, store the contents of NSTextStorage (= the actual text) in my managed object context. I have considered overriding methods pairs like -words, -setWords: in these objects. This would let me link the words to a String, which I know how to store in Core Data. However, I’d have to override any method that affects the text - and that seems a little much. Thankful for any insights.

Read the article
Update graph in real time from server

- by user1869421

I'm trying to update a graph with received data, so that the height of the bars increase as more data is received from the server via a websocket. But my code doesn't render a graph in the browser and plot the data points. I cannot see anything wrong with the code. I really need some help here please. ws = new WebSocket("ws://localhost:8888/dh"); var useData = [] //var chart; var chart = d3.select("body") .append("svg:svg") .attr("class", "chart") .attr("width", 420) .attr("height", 200); ws.onmessage = function(evt) { var distances = JSON.parse(evt.data); data = distances.miles; console.log(data); if(useData.length <= 10){ useData.push(data) } else { var draw = function(data){ // Set the width relative to max data value var x = d3.scale.linear() .domain([0, d3.max(useData)]) .range([0, 420]); var y = d3.scale.ordinal() .domain(useData) .rangeBands([0, 120]); var rect = chart.selectAll("rect") .data(useData) // enter rect rect.enter().append("svg:rect") .attr("y", y) .attr("width", x) .attr("height", y.rangeBand()); // update rect rect .attr("y", y) .attr("width", x) .attr("height", y.rangeBand()); var text = chart.selectAll("text") .data(useData) // enter text text.enter().append("svg:text") .attr("x", x) .attr("y", function (d) { return y(d) + y.rangeBand() / 2; }) .attr("dx", -3) // padding-right .attr("dy", ".35em") // vertical-align: middle .attr("text-anchor", "end") // text-align: right .text(String); // update text text .data(useData) .attr("x", x) .text(String); } useData.length = 0; } } Thanks

Read the article
handle large Parcelable ArrayList in Android

- by Gal Ben-Haim

I'm developing an Android app that is a client to a JSON webservice API. I have classes of resource objects (some are nested) and I pass results from an IntentService that access the webserive using the Parcelable interface for all the resource classes. the webservice returns arrays or results that can be potentially large (because of the nesting, for example, a post object also contains comments array, each comment also contains a user object). currently I'm either inserting the results into a SQlite database or displaying them in a ListView. (my relevant methods are accepting ArrayList<resourceClass> as arguments). (some data need to be persistent stored and some should not). since I don't know what size of lists I can handle this way without reaching the memory limits, is this a good practice ? is it a better idea to save the parsed JSON to a local file immediately and pass the file path to the ResultReceiver, then either insert to database from that file or display the data ? is there a better way to handle this ? btw - I'm parsing the JSON as a stream with Gson's Reader so there shouldn't be memory issues at that stage.

Read the article
What database strategy to choose for a large web application

- by Snoopy

I have to rewrite a large database application, running on 32 servers. The hardware is up to date, each machine has two quad core Xeon and 32 GByte RAM. The database is multi-tenant, each customer has his own file, around 5 to 10 GByte each. I run around 50 databases on this hardware. The app is open to the web, so I have no control on the load. There are no really complex queries, so SQL is not required if there is a better solution. The databases get updated via FTP every day at midnight. The database is read-only. C# is my favourite language and I want to use ASP.NET MVC. I thought about the following options: Use two big SQL servers running SQL Server 2012 to serve the 32 servers with data. On the 32 servers running IIS hosting providing REST services. Denormalize the database and use Redis on each webserver. Use booksleeve as a Redis client. Use a combination of SQL Server and Redis Use SQL Server 2012 together with Hadoop Use Hadoop without SQL Server What is the best way for a read-only database, to get the best performance without loosing maintainability? Does Map-Reduce make sense at all in such a scenario? The reason for the rewrite is, the old app written in C++ with ISAM technology is too slow, the interfaces are old fashioned and not nice to use from an website, especially when using ajax. The app uses a relational datamodel with many tables, but it is possible to write one accerlerator table where all queries can be performed on, and all other information from the other tables are possible by a simple key lookup.

Read the article
How To Deal With Exceptions In Large Code Bases

- by peter

Hi All, I have a large C# code base. It seems quite buggy at times, and I was wondering if there is a quick way to improve finding and diagnosing issues that are occuring on client PCs. The most pressing issue is that exceptions occur in the software, are caught, and even reported through to me. The problem is that by the time they are caught the original cause of the exception is lost. I.e. If an exception was caught in a specific method, but that method calls 20 other methods, and those methods each call 20 other methods. You get the picture, a null reference exception is impossible to figure out, especially if it occured on a client machine. I have currently found some places where I think errors are more likely to occur and wrapped these directly in their own try catch blocks. Is that the only solution? I could be here a long time. I don't care that the exception will bring down the current process (it is just a thread anyway - not the main application), but I care that the exceptions come back and don't help with troubleshooting. Any ideas? I am well aware that I am probably asking a question which sounds silly, and may not have a straightforward answer. All the same some discussion would be good.

Read the article
JavaScript - Efficiently find all elements containing one of a large set of strings

- by noah

I have a set of strings and I need to find all all of the occurrences in an HTML document. Where the string occurs is important because I need to handle each case differently: String is all or part of an attribute. e.g., the string is foo: <input value="foo"> - Add class ATTR to the element. String is the full text of an element. e.g., <button>foo</button> - Add class TEXT to the element. String is inline in the text of an element. e.g., <p>I love foo</p> - Wrap the text in a span tag with class TEXT. Also, I need to match the longest string first. e.g., if I have foo and foobar, then <p>I love foobar</p> should become <p>I love <span class="TEXT">foobar</span></p>, not <p>I love <span class="TEXT">foo</span>bar</p>. The inline text is easy enough: Sort the strings descending by length and find and replace each in document.body.innerHTML with <span class="TEXT">$1</span>, although I'm not sure if that is the most efficient way to go. For the attributes, I can do something like this: sortedStrings.each(function(it) { document.body.innerHTML.replace(new RegExp('(\S+?)="[^"]*'+escapeRegExChars(it)+'[^"]*"','g'),function(s,attr) { $('[+attr+'*='+it+']').addClass('ATTR'); }); }); Again, that seems inefficient. Lastly, for the full text elements, a depth first search of the document that compares the innerHTML to each string will work, but for a large number of strings, it seems very inefficient. Any answer that offers performance improvements gets an upvote :)

Read the article
Most elegant way to break CSV columns into separate data structures using Python?

- by Nick L

I'm trying to pick up Python. As part of the learning process I'm porting a project I wrote in Java to Python. I'm at a section now where I have a list of CSV headers of the form: headers = [a, b, c, d, e, .....] and separate lists of groups that these headers should be broken up into, e.g.: headers_for_list_a = [b, c, e, ...] headers_for_list_b = [a, d, k, ...] . . . I want to take the CSV data and turn it into dict's based on these groups, e.g.: list_a = [ {b:val_1b, c:val_1c, e:val_1e, ... }, {b:val_2b, c:val_2c, e:val_2e, ... }, {b:val_3b, c:val_3c, e:val_3e, ... }, . . . ] where for example, val_1b is the first row of the 'b' column, val_3c is the third row of the 'c' column, etc. My first "Java instinct" is to do something like: for row in data: for col_num, val in enumerate(row): col_name = headers[col_num] if col_name in group_a: dict_a[col_name] = val elif headers[col_cum] in group_b: dict_b[col_name] = val ... list_a.append(dict_a) list_b.append(dict_b) ... However, this method seems inefficient/unwieldy and doesn't posses the elegance that Python programmers are constantly talking about. Is there a more "Zen-like" way I should try- keeping with the philosophy of Python?

Read the article
How can i fetch the large image from url

- by Kutbi

i used below code to fetch the image from url.but its not working for large image.. i missing something to add for that type of image to fetch. imgView = (ImageView)findViewById(R.id.ImageView01); imgView.setImageBitmap(loadBitmap("http://www.360technosoft.com/mx4.jpg")); //imgView.setImageBitmap(loadBitmap("http://sugardaddydiaries.com/wp-content/uploads/2010/12/how_do_i_get_sugar_daddy.jpg")); //setImageDrawable("http://sugardaddydiaries.com/wp-content/uploads/2010/12/holding-money-copy.jpg"); //Drawable drawable = LoadImageFromWebOperations("http://www.androidpeople.com/wp-content/uploads/2010/03/android.png"); //imgView.setImageDrawable(drawable); /* try { ImageView i = (ImageView)findViewById(R.id.ImageView01); Bitmap bitmap = BitmapFactory.decodeStream((InputStream)new URL("http://sugardaddydiaries.com/wp-content/uploads/2010/12/holding-money-copy.jpg").getContent()); i.setImageBitmap(bitmap); } catch (MalformedURLException e) { System.out.println("hello"); } catch (IOException e) { System.out.println("hello"); }*/ } protected Drawable ImageOperations(Context context, String string, String string2) { // TODO Auto-generated method stub try { InputStream is = (InputStream) this.fetch(string); Drawable d = Drawable.createFromStream(is, "src"); return d; } catch (MalformedURLException e) { e.printStackTrace(); return null; } catch (IOException e) { e.printStackTrace(); return null; } }

Read the article
"Thread was being aborted" 0n large dataset

- by Donaldinio

I am trying to process 114,000 rows in a dataset (populated from an oracle database). I am hitting an error at around the 600 mark - "Thread was being aborted". All I am doing is reading the dataset, and I still hit the issue. Is this too much data for a dataset? It seems to load into the dataset ok though. I welcome any better ways to process this amount of data. rootTermsTable = entKw.GetRootKeywordsByCategory(catID); for (int k = 0; k < rootTermsTable.Rows.Count; k++) { string keywordID = rootTermsTable.Rows[k]["IK_DBKEY"].ToString(); ... } public DataTable GetKeywordsByCategory(string categoryID) { DbProviderFactory provider = DbProviderFactories.GetFactory(connectionProvider); DbConnection con = provider.CreateConnection(); con.ConnectionString = connectionString; DbCommand com = provider.CreateCommand(); com.Connection = con; com.CommandText = string.Format("Select * From icm_keyword WHERE (IK_IC_DBKEY = {0})",categoryID); com.CommandType = CommandType.Text; DataSet ds = new DataSet(); DbDataAdapter ad = provider.CreateDataAdapter(); ad.SelectCommand = com; con.Open(); ad.Fill(ds); con.Close(); DataTable dt = new DataTable(); dt = ds.Tables[0]; return dt; //return ds.Tables[0].DefaultView; }

Read the article
Large ListView containing images in Android

- by Marco W.

For various Android applications, I need large ListViews, i.e. such views with 100-300 entries. All entries must be loaded in bulk when the application is started, as some sorting and processing is necessary and the application cannot know which items to display first, otherwise. So far, I've been loading the images for all items in bulk as well, which are then saved in an ArrayList<CustomType> together with the rest of the data for each entry. But of course, this is not a good practice, as you're very likely to have an OutOfMemoryException then: The references to all images in the ArrayList prevent the garbage collector from working. So the best solution is, obviously, to load only the text data in bulk whereas the images are then loaded as needed, right? The Google Play application does this, for example: You can see that images are loaded as you scroll to them, i.e. they are probably loaded in the adapter's getView() method. But with Google Play, this is a different problem, anyway, as the images must be loaded from the Internet, which is not the case for me. My problem is not that loading the images takes too long, but storing them requires too much memory. So what should I do with the images? Load in getView(), when they are really needed? Would make scrolling sluggish. So calling an AsyncTask then? Or just a normal Thread? Parametrize it? I could save the images that are already loaded into a HashMap<String,Bitmap>, so that they don't need to be loaded again in getView(). But if this is done, you have the memory problem again: The HashMap stores references to all images, so in the end, you could have the OutOfMemoryException again. I know that there are already lots of questions here that discuss "Lazy loading" of images. But they mainly cover the problem of slow loading, not too much memory consumption.

Read the article
Python Speeding Up Retrieving data from extremely large string

- by Burninghelix123

I have a list I converted to a very very long string as I am trying to edit it, as you can gather it's called tempString. It works as of now it just takes way to long to operate, probably because it is several different regex subs. They are as follow: tempString = ','.join(str(n) for n in coords) tempString = re.sub(',{2,6}', '_', tempString) tempString = re.sub("[^0-9\-\.\_]", ",", tempString) tempString = re.sub(',+', ',', tempString) clean1 = re.findall(('[-+]?[0-9]*\.?[0-9]+,[-+]?[0-9]*\.?[0-9]+,' '[-+]?[0-9]*\.?[0-9]+'), tempString) tempString = '_'.join(str(n) for n in clean1) tempString = re.sub(',', ' ', tempString) Basically it's a long string containing commas and about 1-5 million sets of 4 floats/ints (mixture of both possible),: -5.65500020981,6.88999986649,-0.454999923706,1,,,-5.65500020981,6.95499992371,-0.454999923706,1,,, The 4th number in each set I don't need/want, i'm essentially just trying to split the string into a list with 3 floats in each separated by a space. The above code works flawlessly but as you can imagine is quite time consuming on large strings. I have done a lot of research on here for a solution but they all seem geared towards words, i.e. swapping out one word for another. EDIT: Ok so this is the solution i'm currently using: def getValues(s): output = [] while s: # get the three values you want, discard the 3 commas, and the # remainder of the string v1, v2, v3, _, _, _, s = s.split(',', 6) output.append("%s %s %s" % (v1.strip(), v2.strip(), v3.strip())) return output coords = getValues(tempString) Anyone have any advice to speed this up even farther? After running some tests It still takes much longer than i'm hoping for. I've been glancing at numPy, but I honestly have absolutely no idea how to the above with it, I understand that after the above has been done and the values are cleaned up i could use them more efficiently with numPy, but not sure how NumPy could apply to the above. The above to clean through 50k sets takes around 20 minutes, I cant imagine how long it would be on my full string of 1 million sets. I'ts just surprising that the program that originally exported the data took only around 30 secs for the 1 million sets

Read the article
How to optimize paging for large in memory database

- by snakefoot

I have an application where the entire database is implemented in memory using a stl-map for each table in the database. Each item in the stl-map is a complex object with references to other items in the other stl-maps. The application works with a large amount of data, so it uses more than 500 MByte RAM. Clients are able to contact the application and get a filtered version of the entire database. This is done by running through the entire database, and finding items relevant for the client. When the application have been running for an hour or so, then Windows 2003 SP2 starts to page out parts of the RAM for the application (Eventhough there is 16 GByte RAM on the machine). After the application have been partly paged out then a client logon takes a long time (10 mins) because it now generates a page fault for each pointer lookup in the stl-map. I can see it is possible to tell Windows to lock memory in RAM, but this is generally only recommended for device drivers, and only for "small" amounts of memory. I guess a poor mans solution could be to loop through the entire memory database, and thus tell Windows we are still interested in keeping the datamodel in RAM. I guess another poor mans solution could be to disable the pagefile completely on Windows. I guess the expensive solution would be a SQL database, and then rewrite the entire application to use a database layer. Then hopefully the database system will have implemented means to for fast access. Are there other more elegant solutions ?

Read the article
download large files using servlet

- by niks

I am using Apache Tomcat Server 6 and Java 1.6 and am trying to write large mp3 files to the ServletOutputStream for a user to download. Files are ranging from a 50-750MB at the moment. The smaller files aren't causing too much of a problem but with the larger files it and getting socket exception broken pipe. File fileMp3 = new File(objDownloadSong.getStrSongFolder() + "/" + strSongIdName); FileInputStream fis = new FileInputStream(fileMp3); response.setContentType("audio/mpeg"); response.setHeader("Content-Disposition", "attachment; filename=\"" + strSongName + ".mp3\";"); response.setContentLength((int) fileMp3.length()); OutputStream os = response.getOutputStream(); try { int byteRead = 0; while ((byteRead = fis.read()) != -1) { os.write(byteRead); } os.flush(); } catch (Exception excp) { downloadComplete = "-1"; excp.printStackTrace(); } finally { os.close(); fis.close(); }

Read the article

< Previous Page | 138 139 140 141 142 143 144 145 146 147 148 149 | Next Page >