Search Results

Search found 11409 results on 457 pages for 'large teams'.

Page 34/457 | < Previous Page | 30 31 32 33 34 35 36 37 38 39 40 41  | Next Page >

  • Delete Duplicate records from large csv file C# .Net

    - by Sandhurst
    I have created a solution which read a large csv file currently 20-30 mb in size, I have tried to delete the duplicate rows based on certain column values that the user chooses at run time using the usual technique of finding duplicate rows but its so slow that it seems the program is not working at all. What other technique can be applied to remove duplicate records from a csv file Here's the code, definitely I am doing something wrong DataTable dtCSV = ReadCsv(file, columns); //columns is a list of string List column DataTable dt=RemoveDuplicateRecords(dtCSV, columns); private DataTable RemoveDuplicateRecords(DataTable dtCSV, List<string> columns) { DataView dv = dtCSV.DefaultView; string RowFilter=string.Empty; if(dt==null) dt = dv.ToTable().Clone(); DataRow row = dtCSV.Rows[0]; foreach (DataRow row in dtCSV.Rows) { try { RowFilter = string.Empty; foreach (string column in columns) { string col = column; RowFilter += "[" + col + "]" + "='" + row[col].ToString().Replace("'","''") + "' and "; } RowFilter = RowFilter.Substring(0, RowFilter.Length - 4); dv.RowFilter = RowFilter; DataRow dr = dt.NewRow(); bool result = RowExists(dt, RowFilter); if (!result) { dr.ItemArray = dv.ToTable().Rows[0].ItemArray; dt.Rows.Add(dr); } } catch (Exception ex) { } } return dt; }

    Read the article

  • Returning large collections from WCF Serivce

    - by Nate Bross
    I'm trying to determine the best approach for building a WCF Service, and the area I'm struggling with most is returning lists of objects. The built-in maxMessageSize of 64k seems pretty high, and I really don't want to bump it up (quick googling finds 100s of places bumping the maxMessageSize up to multi-gigabyte range which seems foolish). But, when I'm returning a collection of objects (~150 items) I am exceeding the default 64k. I'm almost to the point of returning my own class which inherits IEnumerable and has properties for hasNext, hasPrevious and PageSize so that I can implement paging on the client side -- this seems like alot of code. The other option is to jackup the maxMessageSize and hope for the best, but that feels wrong. All other aspects of my service are working great, its just returning large collectiosn where I'm having issues. For background, there are two types of consumers of this service, UI applications which will be primarly web and/or wpf applications, and data processing applications, .NET console apps, and maybe some other non-UI apps. For the UI applications, I would like to keep them responsive and keep the messageSize low, on the console apps it doesn't matter as much as they are just pulling data down to do processing and push it back up to the service.

    Read the article

  • Mysql Database Question about Large Columns

    - by murat
    Hi, I have a table that has 100.000 rows, and soon it will be doubled. The size of the database is currently 5 gb and most of them goes to one particular column, which is a text column for PDF files. We expect to have 20-30 GB or maybe 50 gb database after couple of month and this system will be used frequently. I have couple of questions regarding with this setup 1-) We are using innodb on every table, including users table etc. Is it better to use myisam on this table, where we store text version of the PDF files? (from memory usage /performance perspective) 2-) We use Sphinx for searching, however the data must be retrieved for highlighting. Highlighting is done via sphinx API but still we need to retrieve 10 rows in order to send it to Sphinx again. This 10 rows may allocate 50 mb memory, which is quite large. So I am planning to split these PDF files into chunks of 5 pages in the database, so these 100.000 rows will be around 3-4 million rows and couple of month later, instead of having 300.000-350.000 rows, we'll have 10 million rows to store text version of these PDF files. However, we will retrieve less pages, so again instead of retrieving 400 pages to send Sphinx for highlighting, we can retrieve 5 pages and it will have a big impact on the performance. Currently, when we search a term and retrieve PDF files that have more than 100 pages, the execution time is 0.3-0.35 seconds, however if we retrieve PDF files that have less than 5 pages, the execution time reduces to 0.06 seconds, and it also uses less memory. Do you think, this is a good trade-off? We will have million of rows instead of having 100k-200k rows but it will save memory and improve the performance. Is it a good approach to solve this problem and do you have any ideas how to overcome this problem? The text version of the data is used only for indexing and highlighting. So, we are very flexible. Thanks,

    Read the article

  • Large Y-axis tickInterval in high charts does not work

    - by ckovacs
    I have a chart at this JSFiddle to demonstrate a problem where our charts are not respecting the y-axis tick interval for large values: http://jsfiddle.net/z2cDu/1/ var plots = {"usBytePlots":[[1362009600000,143663192997],[1362096000000,110184848742],[1362182400000,97694974247],[1362268800000,90764690805],[1362355200000,112436517747],[1362441600000,113563368701],[1362528000000,139579327454],[1362614400000,118406594506],[1362700800000,125366899935],[1362787200000,134189435596],[1362873600000,132873135854],[1362960000000,121002328604],[1363046400000,123138222001],[1363132800000,115667785553],[1363219200000,103746172138],[1363305600000,108602633473],[1363392000000,89133998142],[1363478400000,92170701458],[1363564800000,86696922873],[1363651200000,80980159054],[1363737600000,97604615694],[1363824000000,108011666339],[1363910400000,124419138381],[1363996800000,121704988344],[1364083200000,124337959109],[1364169600000,137495512348],[1364256000000,136017103319],[1364342400000,60867510427]],"dsBytePlots":[[1362009600000,1734982247336],[1362096000000,1471928923201],[1362182400000,1453869593201],[1362268800000,1411787942581],[1362355200000,1460252447519],[1362441600000,1595590020177],[1362528000000,1658007074783],[1362614400000,1411941908699],[1362700800000,1447659369450],[1362787200000,1643008799861],[1362873600000,1792357973023],[1362960000000,1575173242169],[1363046400000,1565139003978],[1363132800000,1549211975554],[1363219200000,1438411448469],[1363305600000,1380445413578],[1363392000000,1298319283929],[1363478400000,1194578344720],[1363564800000,1211409679299],[1363651200000,1142416351471],[1363737600000,1223822672626],[1363824000000,1267692136487],[1363910400000,1384335759541],[1363996800000,1577205919828],[1364083200000,1675715948928],[1364169600000,1517593781592],[1364256000000,1562183018457],[1364342400000,681007264598]],"aggregatedTotalBytes":43476367948896,"aggregatedUsBytes":3150320403841,"aggregatedDsBytes":40326047545055,"maxTotalBytes":328186292129,"maxTotalBitsPerSecond":30387619.641574074} ; $('#container').highcharts({ yAxis: { tickInterval: 53687091200 // 500 gigabytes. Maximum y-axis value is approx 1.8TB }, series : [ { color: 'rgba(80, 180, 77, 0.7)', type: 'areaspline', name : 'Downstream', data : plots.dsBytePlots, total: plots.aggregatedDsBytes }, { color: 'rgba(33, 143, 197, 0.7)', type: 'areaspline', name : 'Upstream', data : plots.usBytePlots, total: plots.aggregatedUsBytes }] }); In this example we are charting bandwidth utilization in bytes. The chart has a maximum value of about 1.8TB. We set the y-axis tick interval to exactly 500GB but the rendered y-axis ticks don't make any sense for the given interval.

    Read the article

  • Need a tool to search large structure text documents for words, phrases and related phrases

    - by pitosalas
    I have to keep up with structured documents containing things such as requests for proposals, government program reports, threat models and all kinds of things like that. They are in techno-legalese as I would call them: highly structured, with section numbering and 3, 4 and 5 levels of nesting. All in English I need a more efficient way to locate those paragraphs of nuggets that matter to me. So what I’d like is kind of a local document index/repository, that would allow me to have some standing queries and easily locate sections in documents that talk about my queries. Here’s an example: I’d like to load in 10 large PDF files, each of say 100 pages. Each PDF contains English text, formatted very nicely into paragraphs and sections. I’d like to specify that I am interested in “blogging platforms”, “weaknesses in Ruby”, “localization and internationalization” Ideally then look at a list that showed the section of text, the name of the document, and other information that seemed to be related to and/or include the words and phrases I specified. I am sure something like this exists. I would call it something like document indexing, document comprehension or structured searching.

    Read the article

  • High Runtime for Dictionary.Add for a large amount of items

    - by aaginor
    Hi folks, I have a C#-Application that stores data from a TextFile in a Dictionary-Object. The amount of data to be stored can be rather large, so it takes a lot of time inserting the entries. With many items in the Dictionary it gets even worse, because of the resizing of internal array, that stores the data for the Dictionary. So I initialized the Dictionary with the amount of items that will be added, but this has no impact on speed. Here is my function: private Dictionary<IdPair, Edge> AddEdgesToExistingNodes(HashSet<NodeConnection> connections) { Dictionary<IdPair, Edge> resultSet = new Dictionary<IdPair, Edge>(connections.Count); foreach (NodeConnection con in connections) { ... resultSet.Add(nodeIdPair, newEdge); } return resultSet; } In my tests, I insert ~300k items. I checked the running time with ANTS Performance Profiler and found, that the Average time for resultSet.Add(...) doesn't change when I initialize the Dictionary with the needed size. It is the same as when I initialize the Dictionary with new Dictionary(); (about 0.256 ms on average for each Add). This is definitely caused by the amount of data in the Dictionary (ALTHOUGH I initialized it with the desired size). For the first 20k items, the average time for Add is 0.03 ms for each item. Any idea, how to make the add-operation faster? Thanks in advance, Frank

    Read the article

  • Large flags enumerations in C#

    - by LorenVS
    Hey everyone, got a quick question that I can't seem to find anything about... I'm working on a project that requires flag enumerations with a large number of flags (up to 40-ish), and I don't really feel like typing in the exact mask for each enumeration value: public enum MyEnumeration : ulong { Flag1 = 1, Flag2 = 2, Flag3 = 4, Flag4 = 8, Flag5 = 16, // ... Flag16 = 65536, Flag17 = 65536 * 2, Flag18 = 65536 * 4, Flag19 = 65536 * 8, // ... Flag32 = 65536 * 65536, Flag33 = 65536 * 65536 * 2 // right about here I start to get really pissed off } Moreover, I'm also hoping that there is an easy(ier) way for me to control the actual arrangement of bits on different endian machines, since these values will eventually be serialized over a network: public enum MyEnumeration : uint { Flag1 = 1, // BIG: 0x00000001, LITTLE:0x01000000 Flag2 = 2, // BIG: 0x00000002, LITTLE:0x02000000 Flag3 = 4, // BIG: 0x00000004, LITTLE:0x03000000 // ... Flag9 = 256, // BIG: 0x00000010, LITTLE:0x10000000 Flag10 = 512, // BIG: 0x00000011, LITTLE:0x11000000 Flag11 = 1024 // BIG: 0x00000012, LITTLE:0x12000000 } So, I'm kind of wondering if there is some cool way I can set my enumerations up like: public enum MyEnumeration : uint { Flag1 = flag(1), // BOTH: 0x80000000 Flag2 = flag(2), // BOTH: 0x40000000 Flag3 = flag(3), // BOTH: 0x20000000 // ... Flag9 = flag(9), // BOTH: 0x00800000 } What I've Tried: // this won't work because Math.Pow returns double // and because C# requires constants for enum values public enum MyEnumeration : uint { Flag1 = Math.Pow(2, 0), Flag2 = Math.Pow(2, 1) } // this won't work because C# requires constants for enum values public enum MyEnumeration : uint { Flag1 = Masks.MyCustomerBitmaskGeneratingFunction(0) } // this is my best solution so far, but is definitely // quite clunkie public struct EnumWrapper<TEnum> where TEnum { private BitVector32 vector; public bool this[TEnum index] { // returns whether the index-th bit is set in vector } // all sorts of overriding using TEnum as args } Just wondering if anyone has any cool ideas, thanks!

    Read the article

  • Hibernate design to speed up querying of large dataset

    - by paddydub
    I currently have the below tables representing a bus network mapped in hibernate, accessed from a Spring MVC based bus route planner I'm trying to make my route planner application perform faster, I load all the above tables into Lists to perform the route planner logic. I would appreciate if anyone has any ideas of how to speed my performace Or any suggestions of another method to approach this problem of handling a large set of data Coordinate Connections Table (INT,INT,INT, DOUBLE)( Containing 50,000 Coordinate Connections) ID, FROMCOORDID, TOCOORDID, DISTANCE 1 1 2 0.383657 2 1 17 0.173201 3 1 63 0.258781 4 1 64 0.013726 5 1 65 0.459829 6 1 95 0.458769 Coordinate Table (INT,DECIMAL, DECIMAL) (Containing 4700 Coordinates) ID , LAT, LNG 0 59.352669 -7.264341 1 59.352669 -7.264341 2 59.350012 -7.260653 3 59.337585 -7.189798 4 59.339221 -7.193582 5 59.341408 -7.205888 Bus Stop Table (INT, INT, INT)(Containing 15000 Stops) StopID RouteID COORDINATEID 1000100001 100 17 1000100002 100 18 1000100003 100 19 1000100004 100 20 1000100005 100 21 1000100006 100 22 1000100007 100 23 This is how long it takes to load all the data from each table: stop.findAll = 148ms, stops.size: 15670 Hibernate: select coordinate0_.COORDINATEID as COORDINA1_2_, coordinate0_.LAT as LAT2_, coordinate0_.LNG as LNG2_ from COORDINATES coordinate0_ coord.findAll = 51ms , coordinates.size: 4704 Hibernate: select coordconne0_.COORDCONNECTIONID as COORDCON1_3_, coordconne0_.DISTANCE as DISTANCE3_, coordconne0_.FROMCOORDID as FROMCOOR3_3_, coordconne0_.TOCOORDID as TOCOORDID3_ from COORDCONNECTIONS coordconne0_ coordinateConnectionDao.findAll = 238ms ; coordConnectioninates.size:48132 Hibernate Annotations @Entity @Table(name = "STOPS") public class Stop implements Serializable { @Id @GeneratedValue(strategy = GenerationType.AUTO) @Column(name = "STOPID") private int stopID; @Column(name = "ROUTEID", nullable = false) private int routeID; @ManyToOne(fetch = FetchType.LAZY) @JoinColumn(name = "COORDINATEID", nullable = false) private Coordinate coordinate; } @Table(name = "COORDINATES") public class Coordinate { @Id @GeneratedValue @Column(name = "COORDINATEID") private int CoordinateID; @Column(name = "LAT") private double latitude; @Column(name = "LNG") private double longitude; } @Entity @Table(name = "COORDCONNECTIONS") public class CoordConnection { @Id @GeneratedValue @Column(name = "COORDCONNECTIONID") private int CoordinateID; @ManyToOne(fetch = FetchType.LAZY) @JoinColumn(name = "FROMCOORDID", nullable = false) private Coordinate fromCoordID; @ManyToOne(fetch = FetchType.LAZY) @JoinColumn(name = "TOCOORDID", nullable = false) private Coordinate toCoordID; @Column(name = "DISTANCE", nullable = false) private double distance; }

    Read the article

  • Good PHP / MYSQL hashing solution for large number of text values

    - by Dave
    Short descriptio: Need hashing algorithm solution in php for large number of text values. Long description. PRODUCT_OWNER_TABLE serial_number (auto_inc), product_name, owner_id OWNER_TABLE owner_id (auto_inc), owener_name I need to maintain a database of 200000 unique products and their owners (AND all subsequent changes to ownership). Each product has one owner, but an owner may have MANY different products. Owner names are "Adam Smith", "John Reeves", etc, just text values (quite likely to be unicode as well). I want to optimize the database design, so what i was thinking was, every week when i run this script, it fetchs the owner of a proudct, then checks against a table i suppose similar to PRODUCT_OWNER_TABLE, fetching the owner_id. It then looks up owner_id in OWNER_TABLE. If it matches, then its the same, so it moves on. The problem is when its different... To optimize the database, i think i should be checking against the other "owner_name" entries in OWNER_TABLE to see if that value exists there. If it does, then i should use that owner_id. If it doesnt, then i should add another entry. Note that there is nothing special about the "name". as long as i maintain the correct linkagaes AND make the OWNER_TABLE "read-only, append-new" type table - I should be able create a historical archive of ownership. I need to do this check for 200000 entries, with i dont know how many unique owner names (~50000?). I think i need a hashing solution - the OWNER_TABLE wont be sorted, so search algos wont be optimal. programming language is PHP. database is MYSQL.

    Read the article

  • Filter large amounts of data in a table w/ jQuery

    - by Bry4n
    I work for a transit agency and I have large amounts of data (mostly times), and I need a way to filter the data using two textboxes (To and From). I found jQuery quick search, but it seems to only work with one textbox. If anyone has any ideas via jQuery or some other client side library, that would be fantastic. Ideal example: To: [Textbox] From:[Textbox] <table> <tr> <td>69th street</td><td>5:00pm</td><td>5:06pm</td><td>5:10pm</td><td>5:20pm</td> </tr> <tr> <td>Millbourne</td><td>5:09pm</td><td>5:15pm</td><td>5:20pm</td><td>5:25pm</td> </tr> <tr> <td>Spring Garden</td><td>6:00pm</td><td>6:15pm</td><td>6:20pm</td><td>6:25pm</td> </tr> </table> So If I start typing in one of the stations in the To: textbox it either displays dynamically like the quick search or i have to press a button (either or) and then in the from: textbox. Lastly it shows me to: station and all its times on the left and the from: station and all its times on the right.

    Read the article

  • table design for storing large number of rows

    - by hyperboreean
    I am trying to store in a postgresql database some unique identifiers along with the site they have been seen on. I can't really decide which of the following 3 option to choose in order to be faster and easy maintainable. The table would have to provide the following information: the unique identifier which unfortunately it's text the sites on which that unique identifier has been seen The amount of data that would have to hold is rather large: there are around 22 millions unique identifiers that I know of. So I thought about the following designs of the table: id - integer identifier - text seen_on_site - an integer, foreign key to a sites table This approach would require around 22 mil multiplied by the number of sites. id - integer identifier - text seen_on_site_1 - boolean seen_on_site_2 - boolean ............ seen_on_site_n - boolean Hopefully the number of sites won't go past 10. This would require only the number of unique identifiers that I know of, that is around 20 millions, but it would make it hard to work with it from an ORM perspective. one table that would store only unique identifiers, like in: id - integer unique_identifier - text, one table that would store only sites, like in: id - integer site - text and one many to many relation, like: id - integer, unique_id - integer (fk to the table storing identifiers) site_id - integer (fk to sites table) another approach would be to have a table that stores unique identifiers for each site So, which one seems like a better approach to take on the long run?

    Read the article

  • Filter large amounts of data from a HTML table w/ jQuery

    - by Bry4n
    I work for a transit agency and I have large amounts of data (mostly times), and I need a way to filter the data using two textboxes (To and From). I found jQuery quick search, but it seems to only work with one textbox. If anyone has any ideas via jQuery or some other client side library, that would be fantastic. Ideal example: To: [Textbox] From:[Textbox] <table> <tr> <td>69th street</td><td>5:00pm</td><td>5:06pm</td><td>5:10pm</td><td>5:20pm</td> </tr> <tr> <td>Millbourne</td><td>5:09pm</td><td>5:15pm</td><td>5:20pm</td><td>5:25pm</td> </tr> <tr> <td>Spring Garden</td><td>6:00pm</td><td>6:15pm</td><td>6:20pm</td><td>6:25pm</td> </tr> </table> I have an HTML page with a giant table on it listing the station names and each stations times. I want to be able to put my starting location in one box and my ending location in another box and have all the items in the table disappear that don't relate to either of the two locations typed in, leaving only two rows that match what was typed in (even if they don't spell it right or type it all the way) Similar to the jQuery quick search plugin

    Read the article

  • Write contents of custom View to large Image file on SD card

    - by JFortney
    I have a class that extends View. I override the onDraw method and allow the user to draw on the screen. I am at the point where I want to save this view as an image. I Can use buildDrawingCache and getDrawingCache to create a bitmap that I can write to the SD card. However, the image is not good quality at a large size, it has jagged edges. Since I have a View and I use Paths I can transform all by drawing to a bigger size. I just don't know how to make the Canvas bigger so when I call getDrawingCache it doesn't crop all the paths I am just transformed. What is happening is I transform all my paths but when I write the Bitmap to file I am only getting the "viewport" of the actual screen size. I want something much bigger. Any help in the right direction would be greatly appreciated. I have been reading the docs and books and am at a loss. Thanks Jon

    Read the article

  • How do you make life easier for yourself when developing a really large database

    - by Hannes de Jager
    I am busy developing 2 web based systems with MySql databases and the amount of tables/views/stored routines is really becoming a lot and it is more and more challenging to handle the complexity. Now in programming languages we have namespacing e.g. Java packages, C++ namespaces to partition the software, grouping it together to make things more understandable. Databases on the other hand have more of a flat structure (MySql at least) e.g. tables and stored procedures are on the same level. So one have to be more creative, creating naming conventions, perhaps use more than one database or using tools to visualize things. What methods do you use to ease the pain? To be effective while developing your databases? To not get lost in a sea of tables and fields and stored procs? Feel free to mention tools you use also, but try to restrict it to open source and preferably Linux solutions if thats OK. b.t.w How many tables would a database have to be considered large in terms of design?

    Read the article

  • Delphi fast large bitmap creation (without clearing)

    - by Ritsaert Hornstra
    When using the TBitmap wrapper for a GDI bitmap from the unit Graphics I noticed it will always clear out the bitmap (using a PatBlt call) when setting up a bitmap with SetSize( w, h ). When I copy in the bits later on (see routine below) it seems ScanLine is the fastest possibility and not SetDIBits. function ToBitmap: TBitmap; var i, N, x: Integer; S, D: PAnsiChar; begin Result := TBitmap.Create(); Result.PixelFormat := pf32bit; Result.SetSize( width, height ); S := Src; D := Result.ScanLine[ 0 ]; x := Integer( Result.ScanLine[ 1 ] ) - Integer( D ); N := width * sizeof( longword ); for i := 0 to height - 1 do begin Move( S^, D^, N ); Inc( S, N ); Inc( D, x ); end; end; The bitmaps I need to work with are quite large (150MB of RGB memory). With these iomages it takes 150ms to simply create an empty bitmap and a further 140ms to overwrite it's contents. Is there a way of initializing a TBitmap with the correct size WITHOUT initializing the pixels itself and leaving the memory of the pixels uninitialized (eg dirty)? Or is there another way to do such a thing. I know we could work on the pixels in place but this still leaves the 150ms of unnessesary initializtion of the pixels.

    Read the article

  • handle large Parcelable ArrayList in Android

    - by Gal Ben-Haim
    I'm developing an Android app that is a client to a JSON webservice API. I have classes of resource objects (some are nested) and I pass results from an IntentService that access the webserive using the Parcelable interface for all the resource classes. the webservice returns arrays or results that can be potentially large (because of the nesting, for example, a post object also contains comments array, each comment also contains a user object). currently I'm either inserting the results into a SQlite database or displaying them in a ListView. (my relevant methods are accepting ArrayList<resourceClass> as arguments). (some data need to be persistent stored and some should not). since I don't know what size of lists I can handle this way without reaching the memory limits, is this a good practice ? is it a better idea to save the parsed JSON to a local file immediately and pass the file path to the ResultReceiver, then either insert to database from that file or display the data ? is there a better way to handle this ? btw - I'm parsing the JSON as a stream with Gson's Reader so there shouldn't be memory issues at that stage.

    Read the article

  • Way to store a large dictionary with low memory footprint + fast lookups (on Android)

    - by BobbyJim
    I'm developing an android word game app that needs a large (~250,000 word dictionary) available. I need: reasonably fast look ups e.g. constant time preferable, need to do maybe 200 lookups a second on occasion to solve a word puzzle and maybe 20 lookups within 0.2 second more often to check words the user just spelled. EDIT: Lookups are typically asking "Is in the dictionary?". I'd like to support up to two wildcards in the word as well, but this is easy enough by just generating all possible letters the wildcards could have been and checking the generated words (i.e. 26 * 26 lookups for a word with two wildcards). as it's a mobile app, using as little memory as possible and requiring only a small initial download for the dictionary data is top priority. My first naive attempts used Java's HashMap class, which caused an out of memory exception. I've looked into using the SQL lite databases available on android, but this seems like overkill. What's a good way to do what I need?

    Read the article

  • JavaScript - Efficiently find all elements containing one of a large set of strings

    - by noah
    I have a set of strings and I need to find all all of the occurrences in an HTML document. Where the string occurs is important because I need to handle each case differently: String is all or part of an attribute. e.g., the string is foo: <input value="foo"> - Add class ATTR to the element. String is the full text of an element. e.g., <button>foo</button> - Add class TEXT to the element. String is inline in the text of an element. e.g., <p>I love foo</p> - Wrap the text in a span tag with class TEXT. Also, I need to match the longest string first. e.g., if I have foo and foobar, then <p>I love foobar</p> should become <p>I love <span class="TEXT">foobar</span></p>, not <p>I love <span class="TEXT">foo</span>bar</p>. The inline text is easy enough: Sort the strings descending by length and find and replace each in document.body.innerHTML with <span class="TEXT">$1</span>, although I'm not sure if that is the most efficient way to go. For the attributes, I can do something like this: sortedStrings.each(function(it) { document.body.innerHTML.replace(new RegExp('(\S+?)="[^"]*'+escapeRegExChars(it)+'[^"]*"','g'),function(s,attr) { $('[+attr+'*='+it+']').addClass('ATTR'); }); }); Again, that seems inefficient. Lastly, for the full text elements, a depth first search of the document that compares the innerHTML to each string will work, but for a large number of strings, it seems very inefficient. Any answer that offers performance improvements gets an upvote :)

    Read the article

  • How To Deal With Exceptions In Large Code Bases

    - by peter
    Hi All, I have a large C# code base. It seems quite buggy at times, and I was wondering if there is a quick way to improve finding and diagnosing issues that are occuring on client PCs. The most pressing issue is that exceptions occur in the software, are caught, and even reported through to me. The problem is that by the time they are caught the original cause of the exception is lost. I.e. If an exception was caught in a specific method, but that method calls 20 other methods, and those methods each call 20 other methods. You get the picture, a null reference exception is impossible to figure out, especially if it occured on a client machine. I have currently found some places where I think errors are more likely to occur and wrapped these directly in their own try catch blocks. Is that the only solution? I could be here a long time. I don't care that the exception will bring down the current process (it is just a thread anyway - not the main application), but I care that the exceptions come back and don't help with troubleshooting. Any ideas? I am well aware that I am probably asking a question which sounds silly, and may not have a straightforward answer. All the same some discussion would be good.

    Read the article

  • What database strategy to choose for a large web application

    - by Snoopy
    I have to rewrite a large database application, running on 32 servers. The hardware is up to date, each machine has two quad core Xeon and 32 GByte RAM. The database is multi-tenant, each customer has his own file, around 5 to 10 GByte each. I run around 50 databases on this hardware. The app is open to the web, so I have no control on the load. There are no really complex queries, so SQL is not required if there is a better solution. The databases get updated via FTP every day at midnight. The database is read-only. C# is my favourite language and I want to use ASP.NET MVC. I thought about the following options: Use two big SQL servers running SQL Server 2012 to serve the 32 servers with data. On the 32 servers running IIS hosting providing REST services. Denormalize the database and use Redis on each webserver. Use booksleeve as a Redis client. Use a combination of SQL Server and Redis Use SQL Server 2012 together with Hadoop Use Hadoop without SQL Server What is the best way for a read-only database, to get the best performance without loosing maintainability? Does Map-Reduce make sense at all in such a scenario? The reason for the rewrite is, the old app written in C++ with ISAM technology is too slow, the interfaces are old fashioned and not nice to use from an website, especially when using ajax. The app uses a relational datamodel with many tables, but it is possible to write one accerlerator table where all queries can be performed on, and all other information from the other tables are possible by a simple key lookup.

    Read the article

  • How can i fetch the large image from url

    - by Kutbi
    i used below code to fetch the image from url.but its not working for large image.. i missing something to add for that type of image to fetch. imgView = (ImageView)findViewById(R.id.ImageView01); imgView.setImageBitmap(loadBitmap("http://www.360technosoft.com/mx4.jpg")); //imgView.setImageBitmap(loadBitmap("http://sugardaddydiaries.com/wp-content/uploads/2010/12/how_do_i_get_sugar_daddy.jpg")); //setImageDrawable("http://sugardaddydiaries.com/wp-content/uploads/2010/12/holding-money-copy.jpg"); //Drawable drawable = LoadImageFromWebOperations("http://www.androidpeople.com/wp-content/uploads/2010/03/android.png"); //imgView.setImageDrawable(drawable); /* try { ImageView i = (ImageView)findViewById(R.id.ImageView01); Bitmap bitmap = BitmapFactory.decodeStream((InputStream)new URL("http://sugardaddydiaries.com/wp-content/uploads/2010/12/holding-money-copy.jpg").getContent()); i.setImageBitmap(bitmap); } catch (MalformedURLException e) { System.out.println("hello"); } catch (IOException e) { System.out.println("hello"); }*/ } protected Drawable ImageOperations(Context context, String string, String string2) { // TODO Auto-generated method stub try { InputStream is = (InputStream) this.fetch(string); Drawable d = Drawable.createFromStream(is, "src"); return d; } catch (MalformedURLException e) { e.printStackTrace(); return null; } catch (IOException e) { e.printStackTrace(); return null; } }

    Read the article

  • Fastest way to perform subset test operation on a large collection of sets with same domain

    - by niktech
    Assume we have trillions of sets stored somewhere. The domain for each of these sets is the same. It is also finite and discrete. So each set may be stored as a bit field (eg: 0000100111...) of a relatively short length (eg: 1024). That is, bit X in the bitfield indicates whether item X (of 1024 possible items) is included in the given set or not. Now, I want to devise a storage structure and an algorithm to efficiently answer the query: what sets in the data store have set Y as a subset. Set Y itself is not present in the data store and is specified at run time. Now the simplest way to solve this would be to AND the bitfield for set Y with bit fields of every set in the data store one by one, picking the ones whose AND result matches Y's bitfield. How can I speed this up? Is there a tree structure (index) or some smart algorithm that would allow me to perform this query without having to AND every stored set's bitfield? Are there databases that already support such operations on large collections of sets?

    Read the article

  • Statistical analysis on large data set to be published on the web

    - by dassouki
    I have a non-computer related data logger, that collects data from the field. This data is stored as text files, and I manually lump the files together and organize them. The current format is through a csv file per year per logger. Each file is around 4,000,000 lines x 7 loggers x 5 years = a lot of data. some of the data is organized as bins item_type, item_class, item_dimension_class, and other data is more unique, such as item_weight, item_color, date_collected, and so on ... Currently, I do statistical analysis on the data using a python/numpy/matplotlib program I wrote. It works fine, but the problem is, I'm the only one who can use it, since it and the data live on my computer. I'd like to publish the data on the web using a postgres db; however, I need to find or implement a statistical tool that'll take a large postgres table, and return statistical results within an adequate time frame. I'm not familiar with python for the web; however, I'm proficient with PHP on the web side, and python on the offline side. users should be allowed to create their own histograms, data analysis. For example, a user can search for all items that are blue shipped between week x and week y, while another user can search for sort the weight distribution of all items by hour for all year long. I was thinking of creating and indexing my own statistical tools, or automate the process somehow to emulate most queries. This seemed inefficient. I'm looking forward to hearing your ideas Thanks

    Read the article

  • Large ListView containing images in Android

    - by Marco W.
    For various Android applications, I need large ListViews, i.e. such views with 100-300 entries. All entries must be loaded in bulk when the application is started, as some sorting and processing is necessary and the application cannot know which items to display first, otherwise. So far, I've been loading the images for all items in bulk as well, which are then saved in an ArrayList<CustomType> together with the rest of the data for each entry. But of course, this is not a good practice, as you're very likely to have an OutOfMemoryException then: The references to all images in the ArrayList prevent the garbage collector from working. So the best solution is, obviously, to load only the text data in bulk whereas the images are then loaded as needed, right? The Google Play application does this, for example: You can see that images are loaded as you scroll to them, i.e. they are probably loaded in the adapter's getView() method. But with Google Play, this is a different problem, anyway, as the images must be loaded from the Internet, which is not the case for me. My problem is not that loading the images takes too long, but storing them requires too much memory. So what should I do with the images? Load in getView(), when they are really needed? Would make scrolling sluggish. So calling an AsyncTask then? Or just a normal Thread? Parametrize it? I could save the images that are already loaded into a HashMap<String,Bitmap>, so that they don't need to be loaded again in getView(). But if this is done, you have the memory problem again: The HashMap stores references to all images, so in the end, you could have the OutOfMemoryException again. I know that there are already lots of questions here that discuss "Lazy loading" of images. But they mainly cover the problem of slow loading, not too much memory consumption.

    Read the article

  • Python Speeding Up Retrieving data from extremely large string

    - by Burninghelix123
    I have a list I converted to a very very long string as I am trying to edit it, as you can gather it's called tempString. It works as of now it just takes way to long to operate, probably because it is several different regex subs. They are as follow: tempString = ','.join(str(n) for n in coords) tempString = re.sub(',{2,6}', '_', tempString) tempString = re.sub("[^0-9\-\.\_]", ",", tempString) tempString = re.sub(',+', ',', tempString) clean1 = re.findall(('[-+]?[0-9]*\.?[0-9]+,[-+]?[0-9]*\.?[0-9]+,' '[-+]?[0-9]*\.?[0-9]+'), tempString) tempString = '_'.join(str(n) for n in clean1) tempString = re.sub(',', ' ', tempString) Basically it's a long string containing commas and about 1-5 million sets of 4 floats/ints (mixture of both possible),: -5.65500020981,6.88999986649,-0.454999923706,1,,,-5.65500020981,6.95499992371,-0.454999923706,1,,, The 4th number in each set I don't need/want, i'm essentially just trying to split the string into a list with 3 floats in each separated by a space. The above code works flawlessly but as you can imagine is quite time consuming on large strings. I have done a lot of research on here for a solution but they all seem geared towards words, i.e. swapping out one word for another. EDIT: Ok so this is the solution i'm currently using: def getValues(s): output = [] while s: # get the three values you want, discard the 3 commas, and the # remainder of the string v1, v2, v3, _, _, _, s = s.split(',', 6) output.append("%s %s %s" % (v1.strip(), v2.strip(), v3.strip())) return output coords = getValues(tempString) Anyone have any advice to speed this up even farther? After running some tests It still takes much longer than i'm hoping for. I've been glancing at numPy, but I honestly have absolutely no idea how to the above with it, I understand that after the above has been done and the values are cleaned up i could use them more efficiently with numPy, but not sure how NumPy could apply to the above. The above to clean through 50k sets takes around 20 minutes, I cant imagine how long it would be on my full string of 1 million sets. I'ts just surprising that the program that originally exported the data took only around 30 secs for the 1 million sets

    Read the article

< Previous Page | 30 31 32 33 34 35 36 37 38 39 40 41  | Next Page >