Search Results

Search found 33242 results on 1330 pages for 'database optimization'.

Page 223/1330 | < Previous Page | 219 220 221 222 223 224 225 226 227 228 229 230  | Next Page >

  • "volatile" qualifier and compiler reorderings

    - by Checkers
    A compiler cannot eliminate or reorder reads/writes to a volatile-qualified variables. But what about the cases where other variables are present, which may or may not be volatile-qualified? Scenario 1 volatile int a; volatile int b; a = 1; b = 2; a = 3; b = 4; Can the compiler reorder first and the second, or third and the fourth assignments? Scenario 2 volatile int a; int b, c; b = 1; a = 1; c = b; a = 3; Same question, can the compiler reorder first and the second, or third and the fourth assignments?

    Read the article

  • PHP Flush: How Often and Best Practises

    - by Cory Dee
    I just finished reading this post: http://developer.yahoo.com/performance/rules.html#flush and have already implemented a flush after the top portion of my page loads (head, css, top banner/search/nav). Is there any performance hit in flushing? Is there such a thing as doing it too often? What are the best practices? If I am going to hit an external API for data, would it make sense to flush before hand so that the user isn't waiting on that data to come back, and can at least get some data before hand? Thanks to everyone in advance.

    Read the article

  • Fastest way to remove non-numeric characters from a VARCHAR in SQL Server

    - by Dan Herbert
    I'm writing an import utility that is using phone numbers as a unique key within the import. I need to check that the phone number does not already exist in my DB. The problem is that phone numbers in the DB could have things like dashes and parenthesis and possibly other things. I wrote a function to remove these things, the problem is that it is slow and with thousands of records in my DB and thousands of records to import at once, this process can be unacceptably slow. I've already made the phone number column an index. I tried using the script from this post: http://stackoverflow.com/questions/52315/t-sql-trim-nbsp-and-other-non-alphanumeric-characters But that didn't speed it up any. Is there a faster way to remove non-numeric characters? Something that can perform well when 10,000 to 100,000 records have to be compared. Whatever is done needs to perform fast. Update Given what people responded with, I think I'm going to have to clean the fields before I run the import utility. To answer the question of what I'm writing the import utility in, it is a C# app. I'm comparing BIGINT to BIGINT now, with no need to alter DB data and I'm still taking a performance hit with a very small set of data (about 2000 records). Could comparing BIGINT to BIGINT be slowing things down? I've optimized the code side of my app as much as I can (removed regexes, removed unneccessary DB calls). Although I can't isolate SQL as the source of the problem anymore, I still feel like it is.

    Read the article

  • Tips on creating user interfaces and optimizing the user experience

    - by Saif Bechan
    I am currently working on a project where a lot of user interaction is going to take place. There is also a commercial side as people can buy certain items and services. In my opinion a good blend of user interface, speed and security is essential for these types of websites. It is fairly easy to use ajax and JavaScript nowadays to do almost everything, as there are a lot of libraries available such as jQuery and others. But this can have some performance and incompatibility issues. This can lead to users just going to the next website. The overall look of the website is important too. Where to place certain buttons, where to place certain types of articles such as faq and support. Where and how to display error messages so that the user sees them but are not bothering him. And an overall color scheme is important too. The basic question is: How to create an interface that triggers a user to buy/use your services I know psychology also plays a huge role in how users interact with your website. The color scheme for example is important. When the colors are irritating on a website you just want to click away. I have not found any articles that explain those concept. Does anyone have any tips and/or recourses where i can get some articles that guide you in making the correct choices for your website.

    Read the article

  • ILOG CPLEX: how to populate IloLPMatrix while using addGe to set up the model?

    - by downer
    I have a queatoin about IloLPMatrix and addGe. I was trying to follow the example of AdMIPex5.java to generate user defined cutting planes based on the solution to the LP relaxation. The difference is that eh initial MIP model is not read in from a mps file, but set up in the code using methods like addGe, addLe etc. I think this is why I ran into problems while copying the exampe to do the following. IloLPMatrix lp = (IloLPMatrix)cplex.LPMatrixIterator().next(); lp from the above line turns to be NULL. I am wondering 1. What is the relationship between IloLPMatrix and the addLe, addGe commands? I tried to addLPMatrix() to the model, and then used model.addGe methods. but the LPMatrix seems to be empty still. How do I populate the IloLPMatrix of the moel according to the value that I had set up using addGe and addLe. Is the a method to this easily, or do I have to set them up row by row myself? I was doing this to get the number of variables and their values by doing lp.getNumVars(). Is there other methods that I can use to get the number of variables and their values wihout doing these, since my system is set up by addLe, addGe etc? Thanks a lot for your help on this.

    Read the article

  • vectorizing a for loop in numpy/scipy?

    - by user248237
    I'm trying to vectorize a for loop that I have inside of a class method. The for loop has the following form: it iterates through a bunch of points and depending on whether a certain variable (called "self.condition_met" below) is true, calls a pair of functions on the point, and adds the result to a list. Each point here is an element in a vector of lists, i.e. a data structure that looks like array([[1,2,3], [4,5,6], ...]). Here is the problematic function: def myClass: def my_inefficient_method(self): final_vector = [] # Assume 'my_vector' and 'my_other_vector' are defined numpy arrays for point in all_points: if not self.condition_met: a = self.my_func1(point, my_vector) b = self.my_func2(point, my_other_vector) else: a = self.my_func3(point, my_vector) b = self.my_func4(point, my_other_vector) c = a + b final_vector.append(c) # Choose random element from resulting vector 'final_vector' self.condition_met is set before my_inefficient_method is called, so it seems unnecessary to check it each time, but I am not sure how to better write this. Since there are no destructive operations here it is seems like I could rewrite this entire thing as a vectorized operation -- is that possible? any ideas how to do this?

    Read the article

  • How to optimize my PageRank calculation?

    - by asmaier
    In the book Programming Collective Intelligence I found the following function to compute the PageRank: def calculatepagerank(self,iterations=20): # clear out the current PageRank tables self.con.execute("drop table if exists pagerank") self.con.execute("create table pagerank(urlid primary key,score)") self.con.execute("create index prankidx on pagerank(urlid)") # initialize every url with a PageRank of 1.0 self.con.execute("insert into pagerank select rowid,1.0 from urllist") self.dbcommit() for i in range(iterations): print "Iteration %d" % i for (urlid,) in self.con.execute("select rowid from urllist"): pr=0.15 # Loop through all the pages that link to this one for (linker,) in self.con.execute("select distinct fromid from link where toid=%d" % urlid): # Get the PageRank of the linker linkingpr=self.con.execute("select score from pagerank where urlid=%d" % linker).fetchone()[0] # Get the total number of links from the linker linkingcount=self.con.execute("select count(*) from link where fromid=%d" % linker).fetchone()[0] pr+=0.85*(linkingpr/linkingcount) self.con.execute("update pagerank set score=%f where urlid=%d" % (pr,urlid)) self.dbcommit() However, this function is very slow, because of all the SQL queries in every iteration >>> import cProfile >>> cProfile.run("crawler.calculatepagerank()") 2262510 function calls in 136.006 CPU seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 136.006 136.006 <string>:1(<module>) 1 20.826 20.826 136.006 136.006 searchengine.py:179(calculatepagerank) 21 0.000 0.000 0.528 0.025 searchengine.py:27(dbcommit) 21 0.528 0.025 0.528 0.025 {method 'commit' of 'sqlite3.Connecti 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler 1339864 112.602 0.000 112.602 0.000 {method 'execute' of 'sqlite3.Connec 922600 2.050 0.000 2.050 0.000 {method 'fetchone' of 'sqlite3.Cursor' 1 0.000 0.000 0.000 0.000 {range} So I optimized the function and came up with this: def calculatepagerank2(self,iterations=20): # clear out the current PageRank tables self.con.execute("drop table if exists pagerank") self.con.execute("create table pagerank(urlid primary key,score)") self.con.execute("create index prankidx on pagerank(urlid)") # initialize every url with a PageRank of 1.0 self.con.execute("insert into pagerank select rowid,1.0 from urllist") self.dbcommit() inlinks={} numoutlinks={} pagerank={} for (urlid,) in self.con.execute("select rowid from urllist"): inlinks[urlid]=[] numoutlinks[urlid]=0 # Initialize pagerank vector with 1.0 pagerank[urlid]=1.0 # Loop through all the pages that link to this one for (inlink,) in self.con.execute("select distinct fromid from link where toid=%d" % urlid): inlinks[urlid].append(inlink) # get number of outgoing links from a page numoutlinks[urlid]=self.con.execute("select count(*) from link where fromid=%d" % urlid).fetchone()[0] for i in range(iterations): print "Iteration %d" % i for urlid in pagerank: pr=0.15 for link in inlinks[urlid]: linkpr=pagerank[link] linkcount=numoutlinks[link] pr+=0.85*(linkpr/linkcount) pagerank[urlid]=pr for urlid in pagerank: self.con.execute("update pagerank set score=%f where urlid=%d" % (pagerank[urlid],urlid)) self.dbcommit() This function is 20 times faster (but uses a lot more memory for all the temporary dictionaries) because it avoids the unnecessary SQL queries in every iteration: >>> cProfile.run("crawler.calculatepagerank2()") 64802 function calls in 6.950 CPU seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.004 0.004 6.950 6.950 <string>:1(<module>) 1 1.004 1.004 6.946 6.946 searchengine.py:207(calculatepagerank2 2 0.000 0.000 0.104 0.052 searchengine.py:27(dbcommit) 23065 0.012 0.000 0.012 0.000 {meth 'append' of 'list' objects} 2 0.104 0.052 0.104 0.052 {meth 'commit' of 'sqlite3.Connection 1 0.000 0.000 0.000 0.000 {meth 'disable' of '_lsprof.Profiler' 31298 5.809 0.000 5.809 0.000 {meth 'execute' of 'sqlite3.Connectio 10431 0.018 0.000 0.018 0.000 {method 'fetchone' of 'sqlite3.Cursor' 1 0.000 0.000 0.000 0.000 {range} But is it possible to further reduce the number of SQL queries to speed up the function even more?

    Read the article

  • Optimizing Levenshtein Distance Algorithm

    - by Matt
    I have a stored procedure that uses Levenshtein Distance to determine the result closest to what the user typed. The only thing really affecting the speed is the function that calculates the Levenshtein Distance for all the records before selecting the record with the lowest distance (I've verified this by putting a 0 in place of the call to the Levenshtein function). The table has 1.5 million records, so even the slightest adjustment may shave off a few seconds. Right now the entire thing runs over 10 minutes. Here's the method I'm using: ALTER function dbo.Levenshtein ( @Source nvarchar(200), @Target nvarchar(200) ) RETURNS int AS BEGIN DECLARE @Source_len int, @Target_len int, @i int, @j int, @Source_char nchar, @Dist int, @Dist_temp int, @Distv0 varbinary(8000), @Distv1 varbinary(8000) SELECT @Source_len = LEN(@Source), @Target_len = LEN(@Target), @Distv1 = 0x0000, @j = 1, @i = 1, @Dist = 0 WHILE @j <= @Target_len BEGIN SELECT @Distv1 = @Distv1 + CAST(@j AS binary(2)), @j = @j + 1 END WHILE @i <= @Source_len BEGIN SELECT @Source_char = SUBSTRING(@Source, @i, 1), @Dist = @i, @Distv0 = CAST(@i AS binary(2)), @j = 1 WHILE @j <= @Target_len BEGIN SET @Dist = @Dist + 1 SET @Dist_temp = CAST(SUBSTRING(@Distv1, @j+@j-1, 2) AS int) + CASE WHEN @Source_char = SUBSTRING(@Target, @j, 1) THEN 0 ELSE 1 END IF @Dist > @Dist_temp BEGIN SET @Dist = @Dist_temp END SET @Dist_temp = CAST(SUBSTRING(@Distv1, @j+@j+1, 2) AS int)+1 IF @Dist > @Dist_temp SET @Dist = @Dist_temp BEGIN SELECT @Distv0 = @Distv0 + CAST(@Dist AS binary(2)), @j = @j + 1 END END SELECT @Distv1 = @Distv0, @i = @i + 1 END RETURN @Dist END Anyone have any ideas? Any input is appreciated. Thanks, Matt

    Read the article

  • Prevent full table scan for query with multiple where clauses

    - by Dave Jarvis
    A while ago I posted a message about optimizing a query in MySQL. I have since ported the data and query to PostgreSQL, but now PostgreSQL has the same problem. The solution in MySQL was to force the optimizer to not optimize using STRAIGHT_JOIN. PostgreSQL offers no such option. Here is the explain: Here is the query: SELECT avg(d.amount) AS amount, y.year FROM station s, station_district sd, year_ref y, month_ref m, daily d LEFT JOIN city c ON c.id = 10663 WHERE -- Find all the stations within a specific unit radius ... -- 6371.009 * SQRT( POW(RADIANS(c.latitude_decimal - s.latitude_decimal), 2) + (COS(RADIANS(c.latitude_decimal + s.latitude_decimal) / 2) * POW(RADIANS(c.longitude_decimal - s.longitude_decimal), 2)) ) <= 50 AND -- Ignore stations outside the given elevations -- s.elevation BETWEEN 0 AND 2000 AND sd.id = s.station_district_id AND -- Gather all known years for that station ... -- y.station_district_id = sd.id AND -- The data before 1900 is shaky; insufficient after 2009. -- y.year BETWEEN 1980 AND 2000 AND -- Filtered by all known months ... -- m.year_ref_id = y.id AND m.month = 12 AND -- Whittled down by category ... -- m.category_id = '001' AND -- Into the valid daily climate data. -- m.id = d.month_ref_id AND d.daily_flag_id <> 'M' GROUP BY y.year It appears as though PostgreSQL is looking at the DAILY table first, which is simply not the right way to go about this query as there are nearly 300 million rows. How do I force PostgreSQL to start at the CITY table? Thank you!

    Read the article

  • C# File IO with Streams - Best Memory Buffer Size

    - by AJ
    Hi, I am writing a small IO library to assist with a larger (hobby) project. A part of this library performs various functions on a file, which is read / written via the FileStream object. On each StreamReader.Read(...) pass, I fire off an event which will be used in the main app to display progress information. The processing that goes on in the loop is vaired, but is not too time consuming (it could just be a simple file copy, for example, or may involve encryption...). My main question is: What is the best memory buffer size to use? Thinking about physical disk layouts, I could pick 2k, which would cover a CD sector size and is a nice multiple of a 512 byte hard disk sector. Higher up the abstraction tree, you could go for a larger buffer which could read an entire FAT cluster at a time. I realise with today's PC's, I could go for a more memory hungry option (a couple of MiB, for example), but then I increase the time between UI updates and the user perceives a less responsive app. As an aside, I'm eventually hoping to provide a similar interface to files hosted on FTP / HTTP servers (over a local network / fastish DSL). What would be the best memory buffer size for those (again, a "best-case" tradeoff between perceived responsiveness vs. performance). Thanks in advance for any ideas, Adam

    Read the article

  • Ever any performance different between Java >> and >>> right shift operators?

    - by Sean Owen
    Is there ever reason to think the (signed) and (unsigned) right bit-shift operators in Java would perform differently? I can't detect any difference on my machine. This is purely an academic question; it's never going to be the bottleneck I'm sure. I know: it's best to write what you mean foremost; use for division by 2, for example. I assume it comes down to which architectures have which operations implemented as an instruction.

    Read the article

  • Speed improvements for Perl's chameneos-redux in the Computer Language Benchmarks Game

    - by Robert P
    Ever looked at the Computer Language Benchmarks Game (formerly known as the Great Language Shootout)? Perl has some pretty healthy competition there at the moment. It also occurs to me that there's probably some places that Perl's scores could be improved. The biggest one is in the chameneos-redux script right now—the Perl version runs the worst out of any language: 1,626 times slower than the C baseline solution! There are some restrictions on how the programs can be made and optimized, and there is Perl's interpreted runtime penalty, but 1,626 times? There's got to be something that can get the runtime of this program way down. Taking a look at the source code and the challenge, how can the speed be improved?

    Read the article

  • Fastest image iteration in Python

    - by Greg
    I am creating a simple green screen app with Python 2.7.4 but am getting quite slow results. I am currently using PIL 1.1.7 to load and iterate the images and saw huge speed-ups changing from the old getpixel() to the newer load() and pixel access object indexing. However the following loop still takes around 2.5 seconds to run for an image of around 720p resolution: def colorclose(Cb_p, Cr_p, Cb_key, Cr_key, tola, tolb): temp = math.sqrt((Cb_key-Cb_p)**2+(Cr_key-Cr_p)**2) if temp < tola: return 0.0 else: if temp < tolb: return (temp-tola)/(tolb-tola) else: return 1.0 .... for x in range(width): for y in range(height): Y, cb, cr = fg_cbcr_list[x, y] mask = colorclose(cb, cr, cb_key, cr_key, tola, tolb) mask = 1 - mask bgr, bgg, bgb = bg_list[x,y] fgr, fgg, fgb = fg_list[x,y] pixels[x,y] = ( (int)(fgr - mask*key_color[0] + mask*bgr), (int)(fgg - mask*key_color[1] + mask*bgg), (int)(fgb - mask*key_color[2] + mask*bgb)) Am I doing anything hugely inefficient here which makes it run so slow? I have seen similar, simpler examples where the loop is replaced by a boolean matrix for instance, but for this case I can't see a way to replace the loop. The pixels[x,y] assignment seems to take the most amount of time but not knowing Python very well I am unsure of a more efficient way to do this. Any help would be appreciated.

    Read the article

  • 3 dimensional bin packing algorithms

    - by BuschnicK
    I'm faced with a 3 dimensional bin packing problem and am currently conducting some preliminary research as to which algorithms/heuristics are currently yielding the best results. Since the problem is NP hard I do not expect to find the optimal solution in every case, but I was wondering: 1) what are the best exact solvers? Branch and Bound? What problem instance sizes can I expect to solve with reasonable computing resources? 2) what are the best heuristic solvers? 3) What off-the-shelf solutions exist to conduct some experiments with?

    Read the article

  • Speed up bitstring/bit operations in Python?

    - by Xavier Ho
    I wrote a prime number generator using Sieve of Eratosthenes and Python 3.1. The code runs correctly and gracefully at 0.32 seconds on ideone.com to generate prime numbers up to 1,000,000. # from bitstring import BitString def prime_numbers(limit=1000000): '''Prime number generator. Yields the series 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ... using Sieve of Eratosthenes. ''' yield 2 sub_limit = int(limit**0.5) flags = [False, False] + [True] * (limit - 2) # flags = BitString(limit) # Step through all the odd numbers for i in range(3, limit, 2): if flags[i] is False: # if flags[i] is True: continue yield i # Exclude further multiples of the current prime number if i <= sub_limit: for j in range(i*3, limit, i<<1): flags[j] = False # flags[j] = True The problem is, I run out of memory when I try to generate numbers up to 1,000,000,000. flags = [False, False] + [True] * (limit - 2) MemoryError As you can imagine, allocating 1 billion boolean values (1 byte 4 or 8 bytes (see comment) each in Python) is really not feasible, so I looked into bitstring. I figured, using 1 bit for each flag would be much more memory-efficient. However, the program's performance dropped drastically - 24 seconds runtime, for prime number up to 1,000,000. This is probably due to the internal implementation of bitstring. You can comment/uncomment the three lines to see what I changed to use BitString, as the code snippet above. My question is, is there a way to speed up my program, with or without bitstring?

    Read the article

  • C# Improvement on a Fire-and-Forget

    - by adam
    Greetings I have a program that creates multiples instances of a class, runs the same long-running Update method on all instances and waits for completion. I'm following Kev's approach from this question of adding the Update to ThreadPool.QueueUserWorkItem. In the main prog., I'm sleeping for a few minutes and checking a Boolean in the last child to see if done while(!child[child.Length-1].isFinished){ Thread.Sleep(...); } This solution is working the way I want, but is there a better way to do this? Both for the independent instances and checking if all work is done. Thanks

    Read the article

  • Optimizing a lot of Scanner.findWithinHorizon(pattern, 0) calls

    - by darvids0n
    I'm building a process which extracts data from 6 csv-style files and two poorly laid out .txt reports and builds output CSVs, and I'm fully aware that there's going to be some overhead searching through all that whitespace thousands of times, but I never anticipated converting about about 50,000 records would take 12 hours. Excerpt of my manual matching code (I know it's horrible that I use lists of tokens like that, but it was the best thing I could think of): public static String lookup(List<String> tokensBefore, List<String> tokensAfter) { String result = null; while(_match(tokensBefore)) { // block until all input is read if(id.hasNext()) { result = id.next(); // capture the next token that matches if(_matchImmediate(tokensAfter)) // try to match tokensAfter to this result return result; } else return null; // end of file; no match } return null; // no matches } private static boolean _match(List<String> tokens) { return _match(tokens, true); } private static boolean _match(List<String> tokens, boolean block) { if(tokens != null && !tokens.isEmpty()) { if(id.findWithinHorizon(tokens.get(0), 0) == null) return false; for(int i = 1; i <= tokens.size(); i++) { if (i == tokens.size()) { // matches all tokens return true; } else if(id.hasNext() && !id.next().matches(tokens.get(i))) { break; // break to blocking behaviour } } } else { return true; // empty list always matches } if(block) return _match(tokens); // loop until we find something or nothing else return false; // return after just one attempted match } private static boolean _matchImmediate(List<String> tokens) { if(tokens != null) { for(int i = 0; i <= tokens.size(); i++) { if (i == tokens.size()) { // matches all tokens return true; } else if(!id.hasNext() || !id.next().matches(tokens.get(i))) { return false; // doesn't match, or end of file } } return false; // we have some serious problems if this ever gets called } else { return true; // empty list always matches } } Basically wondering how I would work in an efficient string search (Boyer-Moore or similar). My Scanner id is scanning a java.util.String, figured buffering it to memory would reduce I/O since the search here is being performed thousands of times on a relatively small file. The performance increase compared to scanning a BufferedReader(FileReader(File)) was probably less than 1%, the process still looks to be taking a LONG time. I've also traced execution and the slowness of my overall conversion process is definitely between the first and last like of the lookup method. In fact, so much so that I ran a shortcut process to count the number of occurrences of various identifiers in the .csv-style files (I use 2 lookup methods, this is just one of them) and the process completed indexing approx 4 different identifiers for 50,000 records in less than a minute. Compared to 12 hours, that's instant. Some notes (updated): I don't necessarily need the pattern-matching behaviour, I only get the first field of a line of text so I need to match line breaks or use Scanner.nextLine(). All ID numbers I need start at position 0 of a line and run through til the first block of whitespace, after which is the name of the corresponding object. I would ideally want to return a String, not an int locating the line number or start position of the result, but if it's faster then it will still work just fine. If an int is being returned, however, then I would now have to seek to that line again just to get the ID; storing the ID of every line that is searched sounds like a way around that. Anything to help me out, even if it saves 1ms per search, will help, so all input is appreciated. Thankyou! Usage scenario 1: I have a list of objects in file A, who in the old-style system have an id number which is not in file A. It is, however, POSSIBLY in another csv-style file (file B) or possibly still in a .txt report (file C) which each also contain a bunch of other information which is not useful here, and so file B needs to be searched through for the object's full name (1 token since it would reside within the second column of any given line), and then the first column should be the ID number. If that doesn't work, we then have to split the search token by whitespace into separate tokens before doing a search of file C for those tokens as well. Generalised code: String field; for (/* each record in file A */) { /* construct the rest of this object from file A info */ // now to find the ID, if we can List<String> objectName = new ArrayList<String>(1); objectName.add(Pattern.quote(thisObject.fullName)); field = lookup(objectSearchToken, objectName); // search file B if(field == null) // not found in file B { lookupReset(false); // initialise scanner to check file C objectName.clear(); // not using the full name String[] tokens = thisObject.fullName.split(id.delimiter().pattern()); for(String s : tokens) objectName.add(Pattern.quote(s)); field = lookup(objectSearchToken, objectName); // search file C lookupReset(true); // back to file B } else { /* found it, file B specific processing here */ } if(field != null) // found it in B or C thisObject.ID = field; } The objectName tokens are all uppercase words with possible hyphens or apostrophes in them, separated by spaces. Much like a person's name. As per a comment, I will pre-compile the regex for my objectSearchToken, which is just [\r\n]+. What's ending up happening in file C is, every single line is being checked, even the 95% of lines which don't contain an ID number and object name at the start. Would it be quicker to use ^[\r\n]+.*(objectname) instead of two separate regexes? It may reduce the number of _match executions. The more general case of that would be, concatenate all tokensBefore with all tokensAfter, and put a .* in the middle. It would need to be matching backwards through the file though, otherwise it would match the correct line but with a huge .* block in the middle with lots of lines. The above situation could be resolved if I could get java.util.Scanner to return the token previous to the current one after a call to findWithinHorizon. I have another usage scenario. Will put it up asap.

    Read the article

  • sql: Group by x,y,z; return grouped by x,y with lowest f(z)

    - by Sai Emrys
    This is for http://cssfingerprint.com I collect timing stats about how fast the different methods I use perform on different browsers, etc., so that I can optimize the scraping speed. Separately, I have a report about what each method returns for a handful of URLs with known-correct values, so that I can tell which methods are bogus on which browsers. (Each is different, alas.) The related tables look like this: CREATE TABLE `browser_tests` ( `id` int(11) NOT NULL AUTO_INCREMENT, `bogus` tinyint(1) DEFAULT NULL, `result` tinyint(1) DEFAULT NULL, `method` varchar(255) DEFAULT NULL, `url` varchar(255) DEFAULT NULL, `os` varchar(255) DEFAULT NULL, `browser` varchar(255) DEFAULT NULL, `version` varchar(255) DEFAULT NULL, `created_at` datetime DEFAULT NULL, `updated_at` datetime DEFAULT NULL, `user_agent` varchar(255) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=33784 DEFAULT CHARSET=latin1 CREATE TABLE `method_timings` ( `id` int(11) NOT NULL AUTO_INCREMENT, `method` varchar(255) DEFAULT NULL, `batch_size` int(11) DEFAULT NULL, `timing` int(11) DEFAULT NULL, `os` varchar(255) DEFAULT NULL, `browser` varchar(255) DEFAULT NULL, `version` varchar(255) DEFAULT NULL, `user_agent` varchar(255) DEFAULT NULL, `created_at` datetime DEFAULT NULL, `updated_at` datetime DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=28849 DEFAULT CHARSET=latin1 (user_agent is broken down pre-insert into browser, version, and os from a small list of recognized values using regex; I keep the original user-agent string just in case.) I have a query like this that tells me the average timing for every non-bogus browser / version / method tuple: select c, avg(bogus) as bog, timing, method, browser, version from browser_tests as b inner join ( select count(*) as c, round(avg(timing)) as timing, method, browser, version from method_timings group by browser, version, method having c > 10 order by browser, version, timing ) as t using (browser, version, method) group by browser, version, method having bog < 1 order by browser, version, timing; Which returns something like: c bog tim method browser version 88 0.8333 184 reuse_insert Chrome 4.0.249.89 18 0.0000 238 mass_insert_width Chrome 4.0.249.89 70 0.0400 246 mass_insert Chrome 4.0.249.89 70 0.0400 327 mass_noinsert Chrome 4.0.249.89 88 0.0556 367 reuse_reinsert Chrome 4.0.249.89 88 0.0556 383 jquery Chrome 4.0.249.89 88 0.0556 863 full_reinsert Chrome 4.0.249.89 187 0.0000 105 jquery Chrome 5.0.307.11 187 0.8806 109 reuse_insert Chrome 5.0.307.11 123 0.0000 110 mass_insert_width Chrome 5.0.307.11 176 0.0000 231 mass_noinsert Chrome 5.0.307.11 176 0.0000 237 mass_insert Chrome 5.0.307.11 187 0.0000 314 reuse_reinsert Chrome 5.0.307.11 187 0.0000 372 full_reinsert Chrome 5.0.307.11 12 0.7500 82 reuse_insert Chrome 5.0.335.0 12 0.2500 102 jquery Chrome 5.0.335.0 [...] I want to modify this query to return only the browser/version/method with the lowest timing - i.e. something like: 88 0.8333 184 reuse_insert Chrome 4.0.249.89 187 0.0000 105 jquery Chrome 5.0.307.11 12 0.7500 82 reuse_insert Chrome 5.0.335.0 [...] How can I do this, while still returning the method that goes with that lowest timing? I could filter it app-side, but I'd rather do this in mysql since it'd work better with my caching.

    Read the article

  • Explanation of Pingdom Results

    - by Computer Guru
    Hi, I'm trying to optimize my page load times, and I'm using Pingdom to test the site response times. However, I'm not exactly sure what the various components of the "time bar" mean. Example link: http://tools.pingdom.com/fpt/?url=http://neosmart.net/forums//&id=2230361 According to them, the portion of the bar that is yellow is the time between "start" and "connect" and the portion of the bar that is green is the time between "connect" and "first byte" with the blue section being the actual transfer time (time between "first byte" and "last byte"). If I'm trying to the first two (which take very long in my case), what's the recommended course of action? Thanks.

    Read the article

  • Query Optimizing Request

    - by mithilatw
    I am very sorry if this question is structured in not a very helpful manner or the question itself is not a very good one! I need to update a MSSQL table call component every 10 minutes based on information from another table call materials_progress I have nearly 60000 records in component and more than 10000 records in materials_progress I wrote an update query to do the job, but it takes longer than 4 minutes to complete execution! Here is the query : UPDATE component SET stage_id = CASE WHEN t.required_quantity <= t.total_received THEN 27 WHEN t.total_ordered < t.total_received THEN 18 ELSE 18 END FROM ( SELECT mp.job_id, mp.line_no, mp.component, l.quantity AS line_quantity, CASE WHEN mp.component_name_id = 2 THEN l.quantity*2 ELSE l.quantity END AS required_quantity, SUM(ordered) AS total_ordered, SUM(received) AS total_received , c.component_id FROM line l LEFT JOIN component c ON c.line_id = l.line_id LEFT JOIN materials_progress mp ON l.job_id = mp.job_id AND l.line_no = mp.line_no AND c.component_name_id = mp.component_name_id WHERE mp.job_id IS NOT NULL AND (mp.cancelled IS NULL OR mp.cancelled = 0) AND (mp.manual_override IS NULL OR mp.manual_override = 0) AND c.stage_id = 18 GROUP BY mp.job_id, mp.line_no, mp.component, l.quantity, mp.component_name_id, component_id ) AS t WHERE component.component_id = t.component_id I am not going to explain the scenario as it too complex.. could somebody please please tell me what makes this query this much expensive and a way to get around it? Thank you very very much in advance!!!

    Read the article

  • Two strange efficiency problems in Mathematica

    - by Jess Riedel
    FIRST PROBLEM I have timed how long it takes to compute the following statements (where V[x] is a time-intensive function call): Alice = Table[V[i],{i,1,300},{1000}]; Bob = Table[Table[V[i],{i,1,300}],{1000}]^tr; Chris_pre = Table[V[i],{i,1,300}]; Chris = Table[Chris_pre,{1000}]^tr; Alice, Bob, and Chris are identical matricies computed 3 slightly different ways. I find that Chris is computed 1000 times faster than Alice and Bob. It is not surprising that Alice is computed 1000 times slower because, naively, the function V must be called 1000 more times than when Chris is computed. But it is very surprising that Bob is so slow, since he is computed identically to Chris except that Chris stores the intermediate step Chris_pre. Why does Bob evaluate so slowly? SECOND PROBLEM Suppose I want to compile a function in Mathematica of the form f(x)=x+y where "y" is a constant fixed at compile time (but which I prefer not to directly replace in the code with its numerical because I want to be able to easily change it). If y's actual value is y=7.3, and I define f1=Compile[{x},x+y] f2=Compile[{x},x+7.3] then f1 runs 50% slower than f2. How do I make Mathematica replace "y" with "7.3" when f1 is compiled, so that f1 runs as fast as f2? Many thanks!

    Read the article

  • Linux PDF/Postscript Optimizing

    - by Sheldon Ross
    So I have a report system built using Java and iText. PDF templates are created using Scribus. The Java code merges the data into the document using iText. The files are then copied over to a NFS share, and a BASH script prints them. I use acroread to convert them to PS, then lpr the PS. The FOSS application pdftops is horribly inefficient. My main problem is that the PDF's generated using iText/Scribus are very large. And I've recently run into the problem where acroread pukes because it hits 4gb of mem usage on large (300+ pages) documents. (Adobe is painfully slow at updating stuff to 64 bit). Now I can use Adobe reader on Windows, and use the Create Print PDF option or whatever its called, and it greatly( 10x) reduces the size of the PDF(it removes alot of metadata about form fields and such it appears) and produces a PDF that is basically a Print image. My question is does anyone know of a good solution/program for doing something similiar on Linux. Ideally, it would optimize the PDF, reduce size, and reduce PS complexity so the printer could print faster as it takes about 15-20 seconds a page to print right now.

    Read the article

  • Jetty 6 - QueuedThreadPool versus ThreadPool

    - by Walter White
    Hi all, I am using Jetty 6 and was wondering when the QueuedThreadPool should be used over the ThreadPool? By default, Jetty 6 comes configured with the QueuedThreadPool. My server has Java 6 installed so I was thinking that I should use the ThreadPool: <New class="org.mortbay.thread.QueuedThreadPool"> <Set name="minThreads">10</Set> <Set name="maxThreads">200</Set> <Set name="lowThreads">20</Set> <Set name="SpawnOrShrinkAt">2</Set> </New> <New class="org.mortbay.thread.concurrent.ThreadPool"> <Set name="corePoolSize">50</Set> <Set name="maximumPoolSize">50</Set> </New> Thanks, Walter

    Read the article

  • How to optimize this algorithm?

    - by Bakhtiyor
    I have two sets of arrays like this for example. $Arr1['uid'][]='user 1'; $Arr1['weight'][]=1; $Arr1['uid'][]='user 2'; $Arr1['weight'][]=10; $Arr1['uid'][]='user 3'; $Arr1['weight'][]=5; $Arr2['uid'][]='user 1'; $Arr2['weight'][]=3; $Arr2['uid'][]='user 4'; $Arr2['weight'][]=20; $Arr2['uid'][]='user 5'; $Arr2['weight'][]=15; $Arr2['uid'][]='user 2'; $Arr2['weight'][]=2; The size of two arrays could be different of course. $Arr1 has coefficient of 0.7 and $Arr2 has coefficient of 0.3. I need to calculate following formula $result=$Arr1['weight'][$index]*$Arr1Coeff+$Arr2['weight'][$index]*$Arr2Coeff; where $Arr1['uid']=$Arr2['uid']. So when $Arr1['uid'] doesn't exists in $Arr2 then we need to omit $Arr2 and vice versa. And, here is an algorithm I am using now. foreach($Arr1['uid'] as $index=>$arr1_uid){ $pos=array_search($arr1_uid, $Arr2['uid']); if ($pos===false){ $result=$Arr1['weight'][$index]*$Arr1Coeff; echo "<br>$arr1_uid has not found and RES=".$result; }else{ $result=$Arr1['weight'][$index]*$Arr1Coeff+$Arr2['weight'][$pos]*$Arr2Coeff; echo "<br>$arr1_uid has found on $pos and RES=".$result; } } foreach($Arr2['uid'] as $index=>$arr2_uid){ if (!in_array($arr2_uid, $Arr1['uid'])){ $result=$Arr2['weight'][$index]*$Arr2Coeff; echo "<br>$arr2_uid has not found and RES=".$result; }else{ echo "<br>$arr2_uid has found somewhere"; } } The question is how can I optimize this algorithm? Can you offer other better solution for this problem? Thank you.

    Read the article

  • iPhone - dequeueReusableCellWithIdentifier usage

    - by Jukurrpa
    Hi, I'm working on a iPhone app which has a pretty large UITableView with data taken from the web, so I'm trying to optimize its creation and usage. I found out that dequeueReusableCellWithIdentifier is pretty useful, but after seeing many source codes using this, I'm wondering if the usage I make of this function is the good one. Here is what people usually do: UITableViewCell* cell = [tableView dequeueReusableCellWithIdentifier:@"Cell"]; if (cell == nil) { cell = [[UITableViewCell alloc] initWithFrame:CGRectZero reuseIdentifier:@"Cell"]; // Add elements to the cell return cell; And here is the way I did it: NSString identifier = [NSString stringWithFormat:@"Cell @d", indexPath.row]: // The cell row UITableViewCell* cell = [tableView dequeueReusableCellWithIdentifier:identifier]; if (cell != nil) return cell; cell = [[UITableViewCell alloc] initWithFrame:CGRectZero reuseIdentifier:identifier]; // Add elements to the cell return cell; The difference is that people use the same identifier for every cell, so dequeuing one only avoids to alloc a new one. For me, the point of queuing was to give each cell a unique identifier, so when the app asks for a cell it already displayed, neither allocation nor element adding have to be done. In fine I don't know which is best, the "common" method ceils the table's memory usage to the exact number of cells it display, whislt the method I use seems to favor speed as it keeps all calculated cells, but can cause large memory consumption (unless there's an inner limit to the queue). Am I wrong to use it this way? Or is it just up to the developper, depending on his needs?

    Read the article

< Previous Page | 219 220 221 222 223 224 225 226 227 228 229 230  | Next Page >