Search Results

Search found 3481 results on 140 pages for 'convex optimization'.

Page 49/140 | < Previous Page | 45 46 47 48 49 50 51 52 53 54 55 56  | Next Page >

  • SQLite3-ruby extremely slow under 1.9.1?

    - by NilObject
    I decided to upgrade my server to Ruby 1.9.1, and a lot of things are indeed much faster. However, I have a process that dumps a database to sqlite, and it's become glacially slow. What used to take 30 seconds now takes upwards of 10 minutes. The code does several create table statements, and then lots of inserts. The insert statements nearly all use placeholders (?), so SQLite is doing the heavy lifting of binding the parameters. In short, I can't see why this particular usage has slowed down so much. Does anyone know of any problems that have caused it? I'm using sqlite3-ruby (1.2.5), and I'm hoping that someone has encountered this and profiled it. If not, I guess I'm going to learn how to profile ruby code :)

    Read the article

  • How to optimize my PageRank calculation?

    - by asmaier
    In the book Programming Collective Intelligence I found the following function to compute the PageRank: def calculatepagerank(self,iterations=20): # clear out the current PageRank tables self.con.execute("drop table if exists pagerank") self.con.execute("create table pagerank(urlid primary key,score)") self.con.execute("create index prankidx on pagerank(urlid)") # initialize every url with a PageRank of 1.0 self.con.execute("insert into pagerank select rowid,1.0 from urllist") self.dbcommit() for i in range(iterations): print "Iteration %d" % i for (urlid,) in self.con.execute("select rowid from urllist"): pr=0.15 # Loop through all the pages that link to this one for (linker,) in self.con.execute("select distinct fromid from link where toid=%d" % urlid): # Get the PageRank of the linker linkingpr=self.con.execute("select score from pagerank where urlid=%d" % linker).fetchone()[0] # Get the total number of links from the linker linkingcount=self.con.execute("select count(*) from link where fromid=%d" % linker).fetchone()[0] pr+=0.85*(linkingpr/linkingcount) self.con.execute("update pagerank set score=%f where urlid=%d" % (pr,urlid)) self.dbcommit() However, this function is very slow, because of all the SQL queries in every iteration >>> import cProfile >>> cProfile.run("crawler.calculatepagerank()") 2262510 function calls in 136.006 CPU seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 136.006 136.006 <string>:1(<module>) 1 20.826 20.826 136.006 136.006 searchengine.py:179(calculatepagerank) 21 0.000 0.000 0.528 0.025 searchengine.py:27(dbcommit) 21 0.528 0.025 0.528 0.025 {method 'commit' of 'sqlite3.Connecti 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler 1339864 112.602 0.000 112.602 0.000 {method 'execute' of 'sqlite3.Connec 922600 2.050 0.000 2.050 0.000 {method 'fetchone' of 'sqlite3.Cursor' 1 0.000 0.000 0.000 0.000 {range} So I optimized the function and came up with this: def calculatepagerank2(self,iterations=20): # clear out the current PageRank tables self.con.execute("drop table if exists pagerank") self.con.execute("create table pagerank(urlid primary key,score)") self.con.execute("create index prankidx on pagerank(urlid)") # initialize every url with a PageRank of 1.0 self.con.execute("insert into pagerank select rowid,1.0 from urllist") self.dbcommit() inlinks={} numoutlinks={} pagerank={} for (urlid,) in self.con.execute("select rowid from urllist"): inlinks[urlid]=[] numoutlinks[urlid]=0 # Initialize pagerank vector with 1.0 pagerank[urlid]=1.0 # Loop through all the pages that link to this one for (inlink,) in self.con.execute("select distinct fromid from link where toid=%d" % urlid): inlinks[urlid].append(inlink) # get number of outgoing links from a page numoutlinks[urlid]=self.con.execute("select count(*) from link where fromid=%d" % urlid).fetchone()[0] for i in range(iterations): print "Iteration %d" % i for urlid in pagerank: pr=0.15 for link in inlinks[urlid]: linkpr=pagerank[link] linkcount=numoutlinks[link] pr+=0.85*(linkpr/linkcount) pagerank[urlid]=pr for urlid in pagerank: self.con.execute("update pagerank set score=%f where urlid=%d" % (pagerank[urlid],urlid)) self.dbcommit() This function is 20 times faster (but uses a lot more memory for all the temporary dictionaries) because it avoids the unnecessary SQL queries in every iteration: >>> cProfile.run("crawler.calculatepagerank2()") 64802 function calls in 6.950 CPU seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.004 0.004 6.950 6.950 <string>:1(<module>) 1 1.004 1.004 6.946 6.946 searchengine.py:207(calculatepagerank2 2 0.000 0.000 0.104 0.052 searchengine.py:27(dbcommit) 23065 0.012 0.000 0.012 0.000 {meth 'append' of 'list' objects} 2 0.104 0.052 0.104 0.052 {meth 'commit' of 'sqlite3.Connection 1 0.000 0.000 0.000 0.000 {meth 'disable' of '_lsprof.Profiler' 31298 5.809 0.000 5.809 0.000 {meth 'execute' of 'sqlite3.Connectio 10431 0.018 0.000 0.018 0.000 {method 'fetchone' of 'sqlite3.Cursor' 1 0.000 0.000 0.000 0.000 {range} But is it possible to further reduce the number of SQL queries to speed up the function even more?

    Read the article

  • Lots of mysql Sleep processes

    - by user259284
    Hello, I am still having trouble with my mysql server. It seems that since i optimize it, the tables were growing and now sometimes is very slow again. I have no idea of how to optimize more. mySQL server has 48GB of RAM and mysqld is using about 8, most of the tables are innoDB. Site has about 2000 users online. I also run explain on every query and every one of them is indexed. mySQL processes: http://www.pik.ba/mysqlStanje.php my.cnf: # The MySQL database server configuration file. # # You can copy this to one of: # - "/etc/mysql/my.cnf" to set global options, # - "~/.my.cnf" to set user-specific options. # # One can use all long options that the program supports. # Run program with --help to get a list of available options and with # --print-defaults to see which it would actually understand and use. # # For explanations see # http://dev.mysql.com/doc/mysql/en/server-system-variables.html # This will be passed to all mysql clients # It has been reported that passwords should be enclosed with ticks/quotes # escpecially if they contain "#" chars... # Remember to edit /etc/mysql/debian.cnf when changing the socket location. [client] port = 3306 socket = /var/run/mysqld/mysqld.sock # Here is entries for some specific programs # The following values assume you have at least 32M ram # This was formally known as [safe_mysqld]. Both versions are currently parsed. [mysqld_safe] socket = /var/run/mysqld/mysqld.sock nice = 0 [mysqld] # # * Basic Settings # user = mysql pid-file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /var/lib/mysql tmpdir = /tmp language = /usr/share/mysql/english skip-external-locking # # Instead of skip-networking the default is now to listen only on # localhost which is more compatible and is not less secure. bind-address = 10.100.27.30 # # * Fine Tuning # key_buffer = 64M key_buffer_size = 512M max_allowed_packet = 16M thread_stack = 128K thread_cache_size = 8 # This replaces the startup script and checks MyISAM tables if needed # the first time they are touched myisam-recover = BACKUP max_connections = 1000 table_cache = 1000 join_buffer_size = 2M tmp_table_size = 2G max_heap_table_size = 2G innodb_buffer_pool_size = 3G innodb_additional_mem_pool_size = 128M innodb_log_file_size = 100M log-slow-queries = /var/log/mysql/slow.log sort_buffer_size = 5M net_buffer_length = 5M read_buffer_size = 2M read_rnd_buffer_size = 12M thread_concurrency = 10 ft_min_word_len = 3 #thread_concurrency = 10 # # * Query Cache Configuration # query_cache_limit = 1M query_cache_size = 512M # # * Logging and Replication # # Both location gets rotated by the cronjob. # Be aware that this log type is a performance killer. #log = /var/log/mysql/mysql.log # # Error logging goes to syslog. This is a Debian improvement :) # # Here you can see queries with especially long duration #log_slow_queries = /var/log/mysql/mysql-slow.log #long_query_time = 2 #log-queries-not-using-indexes # # The following can be used as easy to replay backup logs or for replication. # note: if you are setting up a replication slave, see README.Debian about # other settings you may need to change. #server-id = 1 #log_bin = /var/log/mysql/mysql-bin.log expire_logs_days = 10 max_binlog_size = 100M #binlog_do_db = include_database_name #binlog_ignore_db = include_database_name # # * BerkeleyDB # # Using BerkeleyDB is now discouraged as its support will cease in 5.1.12. skip-bdb # # * InnoDB # # InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/. # Read the manual for more InnoDB related options. There are many! # You might want to disable InnoDB to shrink the mysqld process by circa 100MB. #skip-innodb # # * Security Features # # Read the manual, too, if you want chroot! # chroot = /var/lib/mysql/ # # For generating SSL certificates I recommend the OpenSSL GUI "tinyca". # # ssl-ca=/etc/mysql/cacert.pem # ssl-cert=/etc/mysql/server-cert.pem # ssl-key=/etc/mysql/server-key.pem [mysqldump] quick quote-names max_allowed_packet = 16M [mysql] #no-auto-rehash # faster start of mysql but no tab completition [isamchk] key_buffer = 16M # # * NDB Cluster # # See /usr/share/doc/mysql-server-*/README.Debian for more information. # # The following configuration is read by the NDB Data Nodes (ndbd processes) # not from the NDB Management Nodes (ndb_mgmd processes). # # [MYSQL_CLUSTER] # ndb-connectstring=127.0.0.1 # # * IMPORTANT: Additional settings that can override those from this file! # The files must end with '.cnf', otherwise they'll be ignored. # !includedir /etc/mysql/conf.d/

    Read the article

  • MySQL Query performance - huge difference in time

    - by Damo
    I have a query that is returning in vastly different amounts of time between 2 datasets. For one set (database A) it returns in a few seconds, for the other (database B)....well I haven't waited long enough yet, but over 10 minutes. I have dumped both of these databases to my local machine where I can reproduce the issue running MySQL 5.1.37. Curiously, database B is smaller than database A. A stripped down version of the query that reproduces the problem is: SELECT * FROM po_shipment ps JOIN po_shipment_item psi USING (ship_id) JOIN po_alloc pa ON ps.ship_id = pa.ship_id AND pa.UID_items = psi.UID_items JOIN po_header ph ON pa.hdr_id = ph.hdr_id LEFT JOIN EVENT_TABLE ev0 ON ev0.TABLE_ID1 = ps.ship_id AND ev0.EVENT_TYPE = 'MAS0' LEFT JOIN EVENT_TABLE ev1 ON ev1.TABLE_ID1 = ps.ship_id AND ev1.EVENT_TYPE = 'MAS1' LEFT JOIN EVENT_TABLE ev2 ON ev2.TABLE_ID1 = ps.ship_id AND ev2.EVENT_TYPE = 'MAS2' LEFT JOIN EVENT_TABLE ev3 ON ev3.TABLE_ID1 = ps.ship_id AND ev3.EVENT_TYPE = 'MAS3' LEFT JOIN EVENT_TABLE ev4 ON ev4.TABLE_ID1 = ps.ship_id AND ev4.EVENT_TYPE = 'MAS4' LEFT JOIN EVENT_TABLE ev5 ON ev5.TABLE_ID1 = ps.ship_id AND ev5.EVENT_TYPE = 'MAS5' WHERE ps.eta >= '2010-03-22' GROUP BY ps.ship_id LIMIT 100; The EXPLAIN query plan for the first database (A) that returns in ~2 seconds is: +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+------------------------------+------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+------------------------------+------+----------------------------------------------+ | 1 | SIMPLE | ps | range | PRIMARY,IX_ETA_DATE | IX_ETA_DATE | 4 | NULL | 174 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | ev0 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_PROD.ps.ship_id,const | 1 | | | 1 | SIMPLE | ev1 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_PROD.ps.ship_id,const | 1 | | | 1 | SIMPLE | ev2 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_PROD.ps.ship_id,const | 1 | | | 1 | SIMPLE | ev3 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_PROD.ps.ship_id,const | 1 | | | 1 | SIMPLE | ev4 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_PROD.ps.ship_id,const | 1 | | | 1 | SIMPLE | ev5 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_PROD.ps.ship_id,const | 1 | | | 1 | SIMPLE | psi | ref | PRIMARY,IX_po_shipment_item_po_shipment1,FK_po_shipment_item_po_shipment1 | IX_po_shipment_item_po_shipment1 | 4 | UNIVIS_PROD.ps.ship_id | 1 | | | 1 | SIMPLE | pa | ref | IX_po_alloc_po_shipment_item2,IX_po_alloc_po_details_old,FK_po_alloc_po_shipment1,FK_po_alloc_po_shipment_item1,FK_po_alloc_po_header1 | FK_po_alloc_po_shipment1 | 4 | UNIVIS_PROD.psi.ship_id | 5 | Using where | | 1 | SIMPLE | ph | eq_ref | PRIMARY,IX_HDR_ID | PRIMARY | 4 | UNIVIS_PROD.pa.hdr_id | 1 | | +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+------------------------------+------+----------------------------------------------+ The EXPLAIN query plan for the second database (B) that returns in 600 seconds is: +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+--------------------------------+------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+--------------------------------+------+----------------------------------------------+ | 1 | SIMPLE | ps | range | PRIMARY,IX_ETA_DATE | IX_ETA_DATE | 4 | NULL | 38 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | psi | ref | PRIMARY,IX_po_shipment_item_po_shipment1,FK_po_shipment_item_po_shipment1 | IX_po_shipment_item_po_shipment1 | 4 | UNIVIS_DEV01.ps.ship_id | 1 | | | 1 | SIMPLE | ev0 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_DEV01.psi.ship_id,const | 1 | | | 1 | SIMPLE | ev1 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_DEV01.psi.ship_id,const | 1 | | | 1 | SIMPLE | ev2 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_DEV01.ps.ship_id,const | 1 | | | 1 | SIMPLE | ev3 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_DEV01.psi.ship_id,const | 1 | | | 1 | SIMPLE | ev4 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_DEV01.psi.ship_id,const | 1 | | | 1 | SIMPLE | ev5 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_DEV01.ps.ship_id,const | 1 | | | 1 | SIMPLE | pa | ref | IX_po_alloc_po_shipment_item2,IX_po_alloc_po_details_old,FK_po_alloc_po_shipment1,FK_po_alloc_po_shipment_item1,FK_po_alloc_po_header1 | IX_po_alloc_po_shipment_item2 | 4 | UNIVIS_DEV01.ps.ship_id | 4 | Using where | | 1 | SIMPLE | ph | eq_ref | PRIMARY,IX_HDR_ID | PRIMARY | 4 | UNIVIS_DEV01.pa.hdr_id | 1 | | +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+--------------------------------+------+----------------------------------------------+ When database B is running I can look at the MySQL Administrator and the state remains at "Copying to tmp table" indefinitely. Database A also has this state but for only a second or so. There are no differences in the table structure, indexes, keys etc between these databases (I have done show create tables and diff'd them). The sizes of the tables are: database A: po_shipment 1776 po_shipment_item 1945 po_alloc 36298 po_header 71642 EVENT_TABLE 1608 database B: po_shipment 463 po_shipment_item 470 po_alloc 3291 po_header 56149 EVENT_TABLE 1089 Some points to note: Removing the WHERE clause makes the query return < 1 sec. Removing the GROUP BY makes the query return < 1 sec. Removing ev5, ev4, ev3 etc makes the query get faster for each one removed. Can anyone suggest how to resolve this issue? What have I missed? Many Thanks.

    Read the article

  • Optimizing Levenshtein Distance Algorithm

    - by Matt
    I have a stored procedure that uses Levenshtein Distance to determine the result closest to what the user typed. The only thing really affecting the speed is the function that calculates the Levenshtein Distance for all the records before selecting the record with the lowest distance (I've verified this by putting a 0 in place of the call to the Levenshtein function). The table has 1.5 million records, so even the slightest adjustment may shave off a few seconds. Right now the entire thing runs over 10 minutes. Here's the method I'm using: ALTER function dbo.Levenshtein ( @Source nvarchar(200), @Target nvarchar(200) ) RETURNS int AS BEGIN DECLARE @Source_len int, @Target_len int, @i int, @j int, @Source_char nchar, @Dist int, @Dist_temp int, @Distv0 varbinary(8000), @Distv1 varbinary(8000) SELECT @Source_len = LEN(@Source), @Target_len = LEN(@Target), @Distv1 = 0x0000, @j = 1, @i = 1, @Dist = 0 WHILE @j <= @Target_len BEGIN SELECT @Distv1 = @Distv1 + CAST(@j AS binary(2)), @j = @j + 1 END WHILE @i <= @Source_len BEGIN SELECT @Source_char = SUBSTRING(@Source, @i, 1), @Dist = @i, @Distv0 = CAST(@i AS binary(2)), @j = 1 WHILE @j <= @Target_len BEGIN SET @Dist = @Dist + 1 SET @Dist_temp = CAST(SUBSTRING(@Distv1, @j+@j-1, 2) AS int) + CASE WHEN @Source_char = SUBSTRING(@Target, @j, 1) THEN 0 ELSE 1 END IF @Dist > @Dist_temp BEGIN SET @Dist = @Dist_temp END SET @Dist_temp = CAST(SUBSTRING(@Distv1, @j+@j+1, 2) AS int)+1 IF @Dist > @Dist_temp SET @Dist = @Dist_temp BEGIN SELECT @Distv0 = @Distv0 + CAST(@Dist AS binary(2)), @j = @j + 1 END END SELECT @Distv1 = @Distv0, @i = @i + 1 END RETURN @Dist END Anyone have any ideas? Any input is appreciated. Thanks, Matt

    Read the article

  • Prevent full table scan for query with multiple where clauses

    - by Dave Jarvis
    A while ago I posted a message about optimizing a query in MySQL. I have since ported the data and query to PostgreSQL, but now PostgreSQL has the same problem. The solution in MySQL was to force the optimizer to not optimize using STRAIGHT_JOIN. PostgreSQL offers no such option. Here is the explain: Here is the query: SELECT avg(d.amount) AS amount, y.year FROM station s, station_district sd, year_ref y, month_ref m, daily d LEFT JOIN city c ON c.id = 10663 WHERE -- Find all the stations within a specific unit radius ... -- 6371.009 * SQRT( POW(RADIANS(c.latitude_decimal - s.latitude_decimal), 2) + (COS(RADIANS(c.latitude_decimal + s.latitude_decimal) / 2) * POW(RADIANS(c.longitude_decimal - s.longitude_decimal), 2)) ) <= 50 AND -- Ignore stations outside the given elevations -- s.elevation BETWEEN 0 AND 2000 AND sd.id = s.station_district_id AND -- Gather all known years for that station ... -- y.station_district_id = sd.id AND -- The data before 1900 is shaky; insufficient after 2009. -- y.year BETWEEN 1980 AND 2000 AND -- Filtered by all known months ... -- m.year_ref_id = y.id AND m.month = 12 AND -- Whittled down by category ... -- m.category_id = '001' AND -- Into the valid daily climate data. -- m.id = d.month_ref_id AND d.daily_flag_id <> 'M' GROUP BY y.year It appears as though PostgreSQL is looking at the DAILY table first, which is simply not the right way to go about this query as there are nearly 300 million rows. How do I force PostgreSQL to start at the CITY table? Thank you!

    Read the article

  • C# File IO with Streams - Best Memory Buffer Size

    - by AJ
    Hi, I am writing a small IO library to assist with a larger (hobby) project. A part of this library performs various functions on a file, which is read / written via the FileStream object. On each StreamReader.Read(...) pass, I fire off an event which will be used in the main app to display progress information. The processing that goes on in the loop is vaired, but is not too time consuming (it could just be a simple file copy, for example, or may involve encryption...). My main question is: What is the best memory buffer size to use? Thinking about physical disk layouts, I could pick 2k, which would cover a CD sector size and is a nice multiple of a 512 byte hard disk sector. Higher up the abstraction tree, you could go for a larger buffer which could read an entire FAT cluster at a time. I realise with today's PC's, I could go for a more memory hungry option (a couple of MiB, for example), but then I increase the time between UI updates and the user perceives a less responsive app. As an aside, I'm eventually hoping to provide a similar interface to files hosted on FTP / HTTP servers (over a local network / fastish DSL). What would be the best memory buffer size for those (again, a "best-case" tradeoff between perceived responsiveness vs. performance). Thanks in advance for any ideas, Adam

    Read the article

  • MySQL queries - how expensive are they really?

    - by incrediman
    I've heard that mysql queries are very expensive, and that you should avoid at all costs making too many of them. I'm developing a site that will be used by quite a few people, and I'm wondering: How expensive are mysql queries actually? If I have 400,000 people in my database, how expensive is it to query it for one of them? How close attention do I need to pay that I don't make too many queries per client request?

    Read the article

  • Database design in blogging systems

    - by Peter
    As a learning exercise I'm trying to put myself a blogging system. The goal is to code something that will let me create multiple blogs, like blogger.com or wordpress.com, but much simplified. I would like to ask you, what do you think is best database design for this type of script. Is it better to have one big table, containing posts from all blogs of all users (like friendfeed) or would it be better to create separate table for each blog's posts? Big thanks in advance for your help, Peter.

    Read the article

  • Speed improvements for Perl's chameneos-redux in the Computer Language Benchmarks Game

    - by Robert P
    Ever looked at the Computer Language Benchmarks Game (formerly known as the Great Language Shootout)? Perl has some pretty healthy competition there at the moment. It also occurs to me that there's probably some places that Perl's scores could be improved. The biggest one is in the chameneos-redux script right now—the Perl version runs the worst out of any language: 1,626 times slower than the C baseline solution! There are some restrictions on how the programs can be made and optimized, and there is Perl's interpreted runtime penalty, but 1,626 times? There's got to be something that can get the runtime of this program way down. Taking a look at the source code and the challenge, how can the speed be improved?

    Read the article

  • Ever any performance different between Java >> and >>> right shift operators?

    - by Sean Owen
    Is there ever reason to think the (signed) and (unsigned) right bit-shift operators in Java would perform differently? I can't detect any difference on my machine. This is purely an academic question; it's never going to be the bottleneck I'm sure. I know: it's best to write what you mean foremost; use for division by 2, for example. I assume it comes down to which architectures have which operations implemented as an instruction.

    Read the article

  • How to optimize this MySQL query

    - by James Simpson
    This query was working fine when the database was small, but now that there are millions of rows in the database, I am realizing I should have looked at optimizing this earlier. It is looking at over 600,000 rows and is Using where; Using temporary; Using filesort (which leads to an execution time of 5-10 seconds). It is using an index on the field 'battle_type.' SELECT username, SUM( outcome ) AS wins, COUNT( * ) - SUM( outcome ) AS losses FROM tblBattleHistory WHERE battle_type = '0' && outcome < '2' GROUP BY username ORDER BY wins DESC , losses ASC , username ASC LIMIT 0 , 50

    Read the article

  • Fastest image iteration in Python

    - by Greg
    I am creating a simple green screen app with Python 2.7.4 but am getting quite slow results. I am currently using PIL 1.1.7 to load and iterate the images and saw huge speed-ups changing from the old getpixel() to the newer load() and pixel access object indexing. However the following loop still takes around 2.5 seconds to run for an image of around 720p resolution: def colorclose(Cb_p, Cr_p, Cb_key, Cr_key, tola, tolb): temp = math.sqrt((Cb_key-Cb_p)**2+(Cr_key-Cr_p)**2) if temp < tola: return 0.0 else: if temp < tolb: return (temp-tola)/(tolb-tola) else: return 1.0 .... for x in range(width): for y in range(height): Y, cb, cr = fg_cbcr_list[x, y] mask = colorclose(cb, cr, cb_key, cr_key, tola, tolb) mask = 1 - mask bgr, bgg, bgb = bg_list[x,y] fgr, fgg, fgb = fg_list[x,y] pixels[x,y] = ( (int)(fgr - mask*key_color[0] + mask*bgr), (int)(fgg - mask*key_color[1] + mask*bgg), (int)(fgb - mask*key_color[2] + mask*bgb)) Am I doing anything hugely inefficient here which makes it run so slow? I have seen similar, simpler examples where the loop is replaced by a boolean matrix for instance, but for this case I can't see a way to replace the loop. The pixels[x,y] assignment seems to take the most amount of time but not knowing Python very well I am unsure of a more efficient way to do this. Any help would be appreciated.

    Read the article

  • 3 dimensional bin packing algorithms

    - by BuschnicK
    I'm faced with a 3 dimensional bin packing problem and am currently conducting some preliminary research as to which algorithms/heuristics are currently yielding the best results. Since the problem is NP hard I do not expect to find the optimal solution in every case, but I was wondering: 1) what are the best exact solvers? Branch and Bound? What problem instance sizes can I expect to solve with reasonable computing resources? 2) what are the best heuristic solvers? 3) What off-the-shelf solutions exist to conduct some experiments with?

    Read the article

  • Speed up bitstring/bit operations in Python?

    - by Xavier Ho
    I wrote a prime number generator using Sieve of Eratosthenes and Python 3.1. The code runs correctly and gracefully at 0.32 seconds on ideone.com to generate prime numbers up to 1,000,000. # from bitstring import BitString def prime_numbers(limit=1000000): '''Prime number generator. Yields the series 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ... using Sieve of Eratosthenes. ''' yield 2 sub_limit = int(limit**0.5) flags = [False, False] + [True] * (limit - 2) # flags = BitString(limit) # Step through all the odd numbers for i in range(3, limit, 2): if flags[i] is False: # if flags[i] is True: continue yield i # Exclude further multiples of the current prime number if i <= sub_limit: for j in range(i*3, limit, i<<1): flags[j] = False # flags[j] = True The problem is, I run out of memory when I try to generate numbers up to 1,000,000,000. flags = [False, False] + [True] * (limit - 2) MemoryError As you can imagine, allocating 1 billion boolean values (1 byte 4 or 8 bytes (see comment) each in Python) is really not feasible, so I looked into bitstring. I figured, using 1 bit for each flag would be much more memory-efficient. However, the program's performance dropped drastically - 24 seconds runtime, for prime number up to 1,000,000. This is probably due to the internal implementation of bitstring. You can comment/uncomment the three lines to see what I changed to use BitString, as the code snippet above. My question is, is there a way to speed up my program, with or without bitstring?

    Read the article

  • Explanation of Pingdom Results

    - by Computer Guru
    Hi, I'm trying to optimize my page load times, and I'm using Pingdom to test the site response times. However, I'm not exactly sure what the various components of the "time bar" mean. Example link: http://tools.pingdom.com/fpt/?url=http://neosmart.net/forums//&id=2230361 According to them, the portion of the bar that is yellow is the time between "start" and "connect" and the portion of the bar that is green is the time between "connect" and "first byte" with the blue section being the actual transfer time (time between "first byte" and "last byte"). If I'm trying to the first two (which take very long in my case), what's the recommended course of action? Thanks.

    Read the article

  • Optimizing a lot of Scanner.findWithinHorizon(pattern, 0) calls

    - by darvids0n
    I'm building a process which extracts data from 6 csv-style files and two poorly laid out .txt reports and builds output CSVs, and I'm fully aware that there's going to be some overhead searching through all that whitespace thousands of times, but I never anticipated converting about about 50,000 records would take 12 hours. Excerpt of my manual matching code (I know it's horrible that I use lists of tokens like that, but it was the best thing I could think of): public static String lookup(List<String> tokensBefore, List<String> tokensAfter) { String result = null; while(_match(tokensBefore)) { // block until all input is read if(id.hasNext()) { result = id.next(); // capture the next token that matches if(_matchImmediate(tokensAfter)) // try to match tokensAfter to this result return result; } else return null; // end of file; no match } return null; // no matches } private static boolean _match(List<String> tokens) { return _match(tokens, true); } private static boolean _match(List<String> tokens, boolean block) { if(tokens != null && !tokens.isEmpty()) { if(id.findWithinHorizon(tokens.get(0), 0) == null) return false; for(int i = 1; i <= tokens.size(); i++) { if (i == tokens.size()) { // matches all tokens return true; } else if(id.hasNext() && !id.next().matches(tokens.get(i))) { break; // break to blocking behaviour } } } else { return true; // empty list always matches } if(block) return _match(tokens); // loop until we find something or nothing else return false; // return after just one attempted match } private static boolean _matchImmediate(List<String> tokens) { if(tokens != null) { for(int i = 0; i <= tokens.size(); i++) { if (i == tokens.size()) { // matches all tokens return true; } else if(!id.hasNext() || !id.next().matches(tokens.get(i))) { return false; // doesn't match, or end of file } } return false; // we have some serious problems if this ever gets called } else { return true; // empty list always matches } } Basically wondering how I would work in an efficient string search (Boyer-Moore or similar). My Scanner id is scanning a java.util.String, figured buffering it to memory would reduce I/O since the search here is being performed thousands of times on a relatively small file. The performance increase compared to scanning a BufferedReader(FileReader(File)) was probably less than 1%, the process still looks to be taking a LONG time. I've also traced execution and the slowness of my overall conversion process is definitely between the first and last like of the lookup method. In fact, so much so that I ran a shortcut process to count the number of occurrences of various identifiers in the .csv-style files (I use 2 lookup methods, this is just one of them) and the process completed indexing approx 4 different identifiers for 50,000 records in less than a minute. Compared to 12 hours, that's instant. Some notes (updated): I don't necessarily need the pattern-matching behaviour, I only get the first field of a line of text so I need to match line breaks or use Scanner.nextLine(). All ID numbers I need start at position 0 of a line and run through til the first block of whitespace, after which is the name of the corresponding object. I would ideally want to return a String, not an int locating the line number or start position of the result, but if it's faster then it will still work just fine. If an int is being returned, however, then I would now have to seek to that line again just to get the ID; storing the ID of every line that is searched sounds like a way around that. Anything to help me out, even if it saves 1ms per search, will help, so all input is appreciated. Thankyou! Usage scenario 1: I have a list of objects in file A, who in the old-style system have an id number which is not in file A. It is, however, POSSIBLY in another csv-style file (file B) or possibly still in a .txt report (file C) which each also contain a bunch of other information which is not useful here, and so file B needs to be searched through for the object's full name (1 token since it would reside within the second column of any given line), and then the first column should be the ID number. If that doesn't work, we then have to split the search token by whitespace into separate tokens before doing a search of file C for those tokens as well. Generalised code: String field; for (/* each record in file A */) { /* construct the rest of this object from file A info */ // now to find the ID, if we can List<String> objectName = new ArrayList<String>(1); objectName.add(Pattern.quote(thisObject.fullName)); field = lookup(objectSearchToken, objectName); // search file B if(field == null) // not found in file B { lookupReset(false); // initialise scanner to check file C objectName.clear(); // not using the full name String[] tokens = thisObject.fullName.split(id.delimiter().pattern()); for(String s : tokens) objectName.add(Pattern.quote(s)); field = lookup(objectSearchToken, objectName); // search file C lookupReset(true); // back to file B } else { /* found it, file B specific processing here */ } if(field != null) // found it in B or C thisObject.ID = field; } The objectName tokens are all uppercase words with possible hyphens or apostrophes in them, separated by spaces. Much like a person's name. As per a comment, I will pre-compile the regex for my objectSearchToken, which is just [\r\n]+. What's ending up happening in file C is, every single line is being checked, even the 95% of lines which don't contain an ID number and object name at the start. Would it be quicker to use ^[\r\n]+.*(objectname) instead of two separate regexes? It may reduce the number of _match executions. The more general case of that would be, concatenate all tokensBefore with all tokensAfter, and put a .* in the middle. It would need to be matching backwards through the file though, otherwise it would match the correct line but with a huge .* block in the middle with lots of lines. The above situation could be resolved if I could get java.util.Scanner to return the token previous to the current one after a call to findWithinHorizon. I have another usage scenario. Will put it up asap.

    Read the article

  • sql: Group by x,y,z; return grouped by x,y with lowest f(z)

    - by Sai Emrys
    This is for http://cssfingerprint.com I collect timing stats about how fast the different methods I use perform on different browsers, etc., so that I can optimize the scraping speed. Separately, I have a report about what each method returns for a handful of URLs with known-correct values, so that I can tell which methods are bogus on which browsers. (Each is different, alas.) The related tables look like this: CREATE TABLE `browser_tests` ( `id` int(11) NOT NULL AUTO_INCREMENT, `bogus` tinyint(1) DEFAULT NULL, `result` tinyint(1) DEFAULT NULL, `method` varchar(255) DEFAULT NULL, `url` varchar(255) DEFAULT NULL, `os` varchar(255) DEFAULT NULL, `browser` varchar(255) DEFAULT NULL, `version` varchar(255) DEFAULT NULL, `created_at` datetime DEFAULT NULL, `updated_at` datetime DEFAULT NULL, `user_agent` varchar(255) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=33784 DEFAULT CHARSET=latin1 CREATE TABLE `method_timings` ( `id` int(11) NOT NULL AUTO_INCREMENT, `method` varchar(255) DEFAULT NULL, `batch_size` int(11) DEFAULT NULL, `timing` int(11) DEFAULT NULL, `os` varchar(255) DEFAULT NULL, `browser` varchar(255) DEFAULT NULL, `version` varchar(255) DEFAULT NULL, `user_agent` varchar(255) DEFAULT NULL, `created_at` datetime DEFAULT NULL, `updated_at` datetime DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=28849 DEFAULT CHARSET=latin1 (user_agent is broken down pre-insert into browser, version, and os from a small list of recognized values using regex; I keep the original user-agent string just in case.) I have a query like this that tells me the average timing for every non-bogus browser / version / method tuple: select c, avg(bogus) as bog, timing, method, browser, version from browser_tests as b inner join ( select count(*) as c, round(avg(timing)) as timing, method, browser, version from method_timings group by browser, version, method having c > 10 order by browser, version, timing ) as t using (browser, version, method) group by browser, version, method having bog < 1 order by browser, version, timing; Which returns something like: c bog tim method browser version 88 0.8333 184 reuse_insert Chrome 4.0.249.89 18 0.0000 238 mass_insert_width Chrome 4.0.249.89 70 0.0400 246 mass_insert Chrome 4.0.249.89 70 0.0400 327 mass_noinsert Chrome 4.0.249.89 88 0.0556 367 reuse_reinsert Chrome 4.0.249.89 88 0.0556 383 jquery Chrome 4.0.249.89 88 0.0556 863 full_reinsert Chrome 4.0.249.89 187 0.0000 105 jquery Chrome 5.0.307.11 187 0.8806 109 reuse_insert Chrome 5.0.307.11 123 0.0000 110 mass_insert_width Chrome 5.0.307.11 176 0.0000 231 mass_noinsert Chrome 5.0.307.11 176 0.0000 237 mass_insert Chrome 5.0.307.11 187 0.0000 314 reuse_reinsert Chrome 5.0.307.11 187 0.0000 372 full_reinsert Chrome 5.0.307.11 12 0.7500 82 reuse_insert Chrome 5.0.335.0 12 0.2500 102 jquery Chrome 5.0.335.0 [...] I want to modify this query to return only the browser/version/method with the lowest timing - i.e. something like: 88 0.8333 184 reuse_insert Chrome 4.0.249.89 187 0.0000 105 jquery Chrome 5.0.307.11 12 0.7500 82 reuse_insert Chrome 5.0.335.0 [...] How can I do this, while still returning the method that goes with that lowest timing? I could filter it app-side, but I'd rather do this in mysql since it'd work better with my caching.

    Read the article

  • C# Improvement on a Fire-and-Forget

    - by adam
    Greetings I have a program that creates multiples instances of a class, runs the same long-running Update method on all instances and waits for completion. I'm following Kev's approach from this question of adding the Update to ThreadPool.QueueUserWorkItem. In the main prog., I'm sleeping for a few minutes and checking a Boolean in the last child to see if done while(!child[child.Length-1].isFinished){ Thread.Sleep(...); } This solution is working the way I want, but is there a better way to do this? Both for the independent instances and checking if all work is done. Thanks

    Read the article

  • Query Optimizing Request

    - by mithilatw
    I am very sorry if this question is structured in not a very helpful manner or the question itself is not a very good one! I need to update a MSSQL table call component every 10 minutes based on information from another table call materials_progress I have nearly 60000 records in component and more than 10000 records in materials_progress I wrote an update query to do the job, but it takes longer than 4 minutes to complete execution! Here is the query : UPDATE component SET stage_id = CASE WHEN t.required_quantity <= t.total_received THEN 27 WHEN t.total_ordered < t.total_received THEN 18 ELSE 18 END FROM ( SELECT mp.job_id, mp.line_no, mp.component, l.quantity AS line_quantity, CASE WHEN mp.component_name_id = 2 THEN l.quantity*2 ELSE l.quantity END AS required_quantity, SUM(ordered) AS total_ordered, SUM(received) AS total_received , c.component_id FROM line l LEFT JOIN component c ON c.line_id = l.line_id LEFT JOIN materials_progress mp ON l.job_id = mp.job_id AND l.line_no = mp.line_no AND c.component_name_id = mp.component_name_id WHERE mp.job_id IS NOT NULL AND (mp.cancelled IS NULL OR mp.cancelled = 0) AND (mp.manual_override IS NULL OR mp.manual_override = 0) AND c.stage_id = 18 GROUP BY mp.job_id, mp.line_no, mp.component, l.quantity, mp.component_name_id, component_id ) AS t WHERE component.component_id = t.component_id I am not going to explain the scenario as it too complex.. could somebody please please tell me what makes this query this much expensive and a way to get around it? Thank you very very much in advance!!!

    Read the article

  • iPhone - dequeueReusableCellWithIdentifier usage

    - by Jukurrpa
    Hi, I'm working on a iPhone app which has a pretty large UITableView with data taken from the web, so I'm trying to optimize its creation and usage. I found out that dequeueReusableCellWithIdentifier is pretty useful, but after seeing many source codes using this, I'm wondering if the usage I make of this function is the good one. Here is what people usually do: UITableViewCell* cell = [tableView dequeueReusableCellWithIdentifier:@"Cell"]; if (cell == nil) { cell = [[UITableViewCell alloc] initWithFrame:CGRectZero reuseIdentifier:@"Cell"]; // Add elements to the cell return cell; And here is the way I did it: NSString identifier = [NSString stringWithFormat:@"Cell @d", indexPath.row]: // The cell row UITableViewCell* cell = [tableView dequeueReusableCellWithIdentifier:identifier]; if (cell != nil) return cell; cell = [[UITableViewCell alloc] initWithFrame:CGRectZero reuseIdentifier:identifier]; // Add elements to the cell return cell; The difference is that people use the same identifier for every cell, so dequeuing one only avoids to alloc a new one. For me, the point of queuing was to give each cell a unique identifier, so when the app asks for a cell it already displayed, neither allocation nor element adding have to be done. In fine I don't know which is best, the "common" method ceils the table's memory usage to the exact number of cells it display, whislt the method I use seems to favor speed as it keeps all calculated cells, but can cause large memory consumption (unless there's an inner limit to the queue). Am I wrong to use it this way? Or is it just up to the developper, depending on his needs?

    Read the article

  • Jetty 6 - QueuedThreadPool versus ThreadPool

    - by Walter White
    Hi all, I am using Jetty 6 and was wondering when the QueuedThreadPool should be used over the ThreadPool? By default, Jetty 6 comes configured with the QueuedThreadPool. My server has Java 6 installed so I was thinking that I should use the ThreadPool: <New class="org.mortbay.thread.QueuedThreadPool"> <Set name="minThreads">10</Set> <Set name="maxThreads">200</Set> <Set name="lowThreads">20</Set> <Set name="SpawnOrShrinkAt">2</Set> </New> <New class="org.mortbay.thread.concurrent.ThreadPool"> <Set name="corePoolSize">50</Set> <Set name="maximumPoolSize">50</Set> </New> Thanks, Walter

    Read the article

  • Two strange efficiency problems in Mathematica

    - by Jess Riedel
    FIRST PROBLEM I have timed how long it takes to compute the following statements (where V[x] is a time-intensive function call): Alice = Table[V[i],{i,1,300},{1000}]; Bob = Table[Table[V[i],{i,1,300}],{1000}]^tr; Chris_pre = Table[V[i],{i,1,300}]; Chris = Table[Chris_pre,{1000}]^tr; Alice, Bob, and Chris are identical matricies computed 3 slightly different ways. I find that Chris is computed 1000 times faster than Alice and Bob. It is not surprising that Alice is computed 1000 times slower because, naively, the function V must be called 1000 more times than when Chris is computed. But it is very surprising that Bob is so slow, since he is computed identically to Chris except that Chris stores the intermediate step Chris_pre. Why does Bob evaluate so slowly? SECOND PROBLEM Suppose I want to compile a function in Mathematica of the form f(x)=x+y where "y" is a constant fixed at compile time (but which I prefer not to directly replace in the code with its numerical because I want to be able to easily change it). If y's actual value is y=7.3, and I define f1=Compile[{x},x+y] f2=Compile[{x},x+7.3] then f1 runs 50% slower than f2. How do I make Mathematica replace "y" with "7.3" when f1 is compiled, so that f1 runs as fast as f2? Many thanks!

    Read the article

  • Iteration speed of int vs long

    - by jqno
    I have the following two programs: long startTime = System.currentTimeMillis(); for (int i = 0; i < N; i++); long endTime = System.currentTimeMillis(); System.out.println("Elapsed time: " + (endTime - startTime) + " msecs"); and long startTime = System.currentTimeMillis(); for (long i = 0; i < N; i++); long endTime = System.currentTimeMillis(); System.out.println("Elapsed time: " + (endTime - startTime) + " msecs"); Note: the only difference is the type of the loop variable (int and long). When I run this, the first program consistently prints between 0 and 16 msecs, regardless of the value of N. The second takes a lot longer. For N == Integer.MAX_VALUE, it runs in about 1800 msecs on my machine. The run time appears to be more or less linear in N. So why is this? I suppose the JIT-compiler optimizes the int loop to death. And for good reason, because obviously it doesn't do anything. But why doesn't it do so for the long loop as well? A colleague thought we might be measuring the JIT compiler doing its work in the long loop, but since the run time seems to be linear in N, this probably isn't the case.

    Read the article

  • Linux PDF/Postscript Optimizing

    - by Sheldon Ross
    So I have a report system built using Java and iText. PDF templates are created using Scribus. The Java code merges the data into the document using iText. The files are then copied over to a NFS share, and a BASH script prints them. I use acroread to convert them to PS, then lpr the PS. The FOSS application pdftops is horribly inefficient. My main problem is that the PDF's generated using iText/Scribus are very large. And I've recently run into the problem where acroread pukes because it hits 4gb of mem usage on large (300+ pages) documents. (Adobe is painfully slow at updating stuff to 64 bit). Now I can use Adobe reader on Windows, and use the Create Print PDF option or whatever its called, and it greatly( 10x) reduces the size of the PDF(it removes alot of metadata about form fields and such it appears) and produces a PDF that is basically a Print image. My question is does anyone know of a good solution/program for doing something similiar on Linux. Ideally, it would optimize the PDF, reduce size, and reduce PS complexity so the printer could print faster as it takes about 15-20 seconds a page to print right now.

    Read the article

< Previous Page | 45 46 47 48 49 50 51 52 53 54 55 56  | Next Page >