Search Results

Search found 3512 results on 141 pages for 'premature optimization'.

Page 49/141 | < Previous Page | 45 46 47 48 49 50 51 52 53 54 55 56 | Next Page >

Fastest way to remove non-numeric characters from a VARCHAR in SQL Server

- by Dan Herbert

I'm writing an import utility that is using phone numbers as a unique key within the import. I need to check that the phone number does not already exist in my DB. The problem is that phone numbers in the DB could have things like dashes and parenthesis and possibly other things. I wrote a function to remove these things, the problem is that it is slow and with thousands of records in my DB and thousands of records to import at once, this process can be unacceptably slow. I've already made the phone number column an index. I tried using the script from this post: http://stackoverflow.com/questions/52315/t-sql-trim-nbsp-and-other-non-alphanumeric-characters But that didn't speed it up any. Is there a faster way to remove non-numeric characters? Something that can perform well when 10,000 to 100,000 records have to be compared. Whatever is done needs to perform fast. Update Given what people responded with, I think I'm going to have to clean the fields before I run the import utility. To answer the question of what I'm writing the import utility in, it is a C# app. I'm comparing BIGINT to BIGINT now, with no need to alter DB data and I'm still taking a performance hit with a very small set of data (about 2000 records). Could comparing BIGINT to BIGINT be slowing things down? I've optimized the code side of my app as much as I can (removed regexes, removed unneccessary DB calls). Although I can't isolate SQL as the source of the problem anymore, I still feel like it is.

Read the article
ILOG CPLEX: how to populate IloLPMatrix while using addGe to set up the model?

- by downer

I have a queatoin about IloLPMatrix and addGe. I was trying to follow the example of AdMIPex5.java to generate user defined cutting planes based on the solution to the LP relaxation. The difference is that eh initial MIP model is not read in from a mps file, but set up in the code using methods like addGe, addLe etc. I think this is why I ran into problems while copying the exampe to do the following. IloLPMatrix lp = (IloLPMatrix)cplex.LPMatrixIterator().next(); lp from the above line turns to be NULL. I am wondering 1. What is the relationship between IloLPMatrix and the addLe, addGe commands? I tried to addLPMatrix() to the model, and then used model.addGe methods. but the LPMatrix seems to be empty still. How do I populate the IloLPMatrix of the moel according to the value that I had set up using addGe and addLe. Is the a method to this easily, or do I have to set them up row by row myself? I was doing this to get the number of variables and their values by doing lp.getNumVars(). Is there other methods that I can use to get the number of variables and their values wihout doing these, since my system is set up by addLe, addGe etc? Thanks a lot for your help on this.

Read the article
Tips on creating user interfaces and optimizing the user experience

- by Saif Bechan

I am currently working on a project where a lot of user interaction is going to take place. There is also a commercial side as people can buy certain items and services. In my opinion a good blend of user interface, speed and security is essential for these types of websites. It is fairly easy to use ajax and JavaScript nowadays to do almost everything, as there are a lot of libraries available such as jQuery and others. But this can have some performance and incompatibility issues. This can lead to users just going to the next website. The overall look of the website is important too. Where to place certain buttons, where to place certain types of articles such as faq and support. Where and how to display error messages so that the user sees them but are not bothering him. And an overall color scheme is important too. The basic question is: How to create an interface that triggers a user to buy/use your services I know psychology also plays a huge role in how users interact with your website. The color scheme for example is important. When the colors are irritating on a website you just want to click away. I have not found any articles that explain those concept. Does anyone have any tips and/or recourses where i can get some articles that guide you in making the correct choices for your website.

Read the article
vectorizing a for loop in numpy/scipy?

- by user248237

I'm trying to vectorize a for loop that I have inside of a class method. The for loop has the following form: it iterates through a bunch of points and depending on whether a certain variable (called "self.condition_met" below) is true, calls a pair of functions on the point, and adds the result to a list. Each point here is an element in a vector of lists, i.e. a data structure that looks like array([[1,2,3], [4,5,6], ...]). Here is the problematic function: def myClass: def my_inefficient_method(self): final_vector = [] # Assume 'my_vector' and 'my_other_vector' are defined numpy arrays for point in all_points: if not self.condition_met: a = self.my_func1(point, my_vector) b = self.my_func2(point, my_other_vector) else: a = self.my_func3(point, my_vector) b = self.my_func4(point, my_other_vector) c = a + b final_vector.append(c) # Choose random element from resulting vector 'final_vector' self.condition_met is set before my_inefficient_method is called, so it seems unnecessary to check it each time, but I am not sure how to better write this. Since there are no destructive operations here it is seems like I could rewrite this entire thing as a vectorized operation -- is that possible? any ideas how to do this?

Read the article
How to optimize my PageRank calculation?

- by asmaier

In the book Programming Collective Intelligence I found the following function to compute the PageRank: def calculatepagerank(self,iterations=20): # clear out the current PageRank tables self.con.execute("drop table if exists pagerank") self.con.execute("create table pagerank(urlid primary key,score)") self.con.execute("create index prankidx on pagerank(urlid)") # initialize every url with a PageRank of 1.0 self.con.execute("insert into pagerank select rowid,1.0 from urllist") self.dbcommit() for i in range(iterations): print "Iteration %d" % i for (urlid,) in self.con.execute("select rowid from urllist"): pr=0.15 # Loop through all the pages that link to this one for (linker,) in self.con.execute("select distinct fromid from link where toid=%d" % urlid): # Get the PageRank of the linker linkingpr=self.con.execute("select score from pagerank where urlid=%d" % linker).fetchone()[0] # Get the total number of links from the linker linkingcount=self.con.execute("select count(*) from link where fromid=%d" % linker).fetchone()[0] pr+=0.85*(linkingpr/linkingcount) self.con.execute("update pagerank set score=%f where urlid=%d" % (pr,urlid)) self.dbcommit() However, this function is very slow, because of all the SQL queries in every iteration >>> import cProfile >>> cProfile.run("crawler.calculatepagerank()") 2262510 function calls in 136.006 CPU seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 136.006 136.006 <string>:1(<module>) 1 20.826 20.826 136.006 136.006 searchengine.py:179(calculatepagerank) 21 0.000 0.000 0.528 0.025 searchengine.py:27(dbcommit) 21 0.528 0.025 0.528 0.025 {method 'commit' of 'sqlite3.Connecti 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler 1339864 112.602 0.000 112.602 0.000 {method 'execute' of 'sqlite3.Connec 922600 2.050 0.000 2.050 0.000 {method 'fetchone' of 'sqlite3.Cursor' 1 0.000 0.000 0.000 0.000 {range} So I optimized the function and came up with this: def calculatepagerank2(self,iterations=20): # clear out the current PageRank tables self.con.execute("drop table if exists pagerank") self.con.execute("create table pagerank(urlid primary key,score)") self.con.execute("create index prankidx on pagerank(urlid)") # initialize every url with a PageRank of 1.0 self.con.execute("insert into pagerank select rowid,1.0 from urllist") self.dbcommit() inlinks={} numoutlinks={} pagerank={} for (urlid,) in self.con.execute("select rowid from urllist"): inlinks[urlid]=[] numoutlinks[urlid]=0 # Initialize pagerank vector with 1.0 pagerank[urlid]=1.0 # Loop through all the pages that link to this one for (inlink,) in self.con.execute("select distinct fromid from link where toid=%d" % urlid): inlinks[urlid].append(inlink) # get number of outgoing links from a page numoutlinks[urlid]=self.con.execute("select count(*) from link where fromid=%d" % urlid).fetchone()[0] for i in range(iterations): print "Iteration %d" % i for urlid in pagerank: pr=0.15 for link in inlinks[urlid]: linkpr=pagerank[link] linkcount=numoutlinks[link] pr+=0.85*(linkpr/linkcount) pagerank[urlid]=pr for urlid in pagerank: self.con.execute("update pagerank set score=%f where urlid=%d" % (pagerank[urlid],urlid)) self.dbcommit() This function is 20 times faster (but uses a lot more memory for all the temporary dictionaries) because it avoids the unnecessary SQL queries in every iteration: >>> cProfile.run("crawler.calculatepagerank2()") 64802 function calls in 6.950 CPU seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.004 0.004 6.950 6.950 <string>:1(<module>) 1 1.004 1.004 6.946 6.946 searchengine.py:207(calculatepagerank2 2 0.000 0.000 0.104 0.052 searchengine.py:27(dbcommit) 23065 0.012 0.000 0.012 0.000 {meth 'append' of 'list' objects} 2 0.104 0.052 0.104 0.052 {meth 'commit' of 'sqlite3.Connection 1 0.000 0.000 0.000 0.000 {meth 'disable' of '_lsprof.Profiler' 31298 5.809 0.000 5.809 0.000 {meth 'execute' of 'sqlite3.Connectio 10431 0.018 0.000 0.018 0.000 {method 'fetchone' of 'sqlite3.Cursor' 1 0.000 0.000 0.000 0.000 {range} But is it possible to further reduce the number of SQL queries to speed up the function even more?

Read the article
SQLite3-ruby extremely slow under 1.9.1?

- by NilObject

I decided to upgrade my server to Ruby 1.9.1, and a lot of things are indeed much faster. However, I have a process that dumps a database to sqlite, and it's become glacially slow. What used to take 30 seconds now takes upwards of 10 minutes. The code does several create table statements, and then lots of inserts. The insert statements nearly all use placeholders (?), so SQLite is doing the heavy lifting of binding the parameters. In short, I can't see why this particular usage has slowed down so much. Does anyone know of any problems that have caused it? I'm using sqlite3-ruby (1.2.5), and I'm hoping that someone has encountered this and profiled it. If not, I guess I'm going to learn how to profile ruby code :)

Read the article
Prevent full table scan for query with multiple where clauses

- by Dave Jarvis

A while ago I posted a message about optimizing a query in MySQL. I have since ported the data and query to PostgreSQL, but now PostgreSQL has the same problem. The solution in MySQL was to force the optimizer to not optimize using STRAIGHT_JOIN. PostgreSQL offers no such option. Here is the explain: Here is the query: SELECT avg(d.amount) AS amount, y.year FROM station s, station_district sd, year_ref y, month_ref m, daily d LEFT JOIN city c ON c.id = 10663 WHERE -- Find all the stations within a specific unit radius ... -- 6371.009 * SQRT( POW(RADIANS(c.latitude_decimal - s.latitude_decimal), 2) + (COS(RADIANS(c.latitude_decimal + s.latitude_decimal) / 2) * POW(RADIANS(c.longitude_decimal - s.longitude_decimal), 2)) ) <= 50 AND -- Ignore stations outside the given elevations -- s.elevation BETWEEN 0 AND 2000 AND sd.id = s.station_district_id AND -- Gather all known years for that station ... -- y.station_district_id = sd.id AND -- The data before 1900 is shaky; insufficient after 2009. -- y.year BETWEEN 1980 AND 2000 AND -- Filtered by all known months ... -- m.year_ref_id = y.id AND m.month = 12 AND -- Whittled down by category ... -- m.category_id = '001' AND -- Into the valid daily climate data. -- m.id = d.month_ref_id AND d.daily_flag_id <> 'M' GROUP BY y.year It appears as though PostgreSQL is looking at the DAILY table first, which is simply not the right way to go about this query as there are nearly 300 million rows. How do I force PostgreSQL to start at the CITY table? Thank you!

Read the article
Lots of mysql Sleep processes

- by user259284

Hello, I am still having trouble with my mysql server. It seems that since i optimize it, the tables were growing and now sometimes is very slow again. I have no idea of how to optimize more. mySQL server has 48GB of RAM and mysqld is using about 8, most of the tables are innoDB. Site has about 2000 users online. I also run explain on every query and every one of them is indexed. mySQL processes: http://www.pik.ba/mysqlStanje.php my.cnf: # The MySQL database server configuration file. # # You can copy this to one of: # - "/etc/mysql/my.cnf" to set global options, # - "~/.my.cnf" to set user-specific options. # # One can use all long options that the program supports. # Run program with --help to get a list of available options and with # --print-defaults to see which it would actually understand and use. # # For explanations see # http://dev.mysql.com/doc/mysql/en/server-system-variables.html # This will be passed to all mysql clients # It has been reported that passwords should be enclosed with ticks/quotes # escpecially if they contain "#" chars... # Remember to edit /etc/mysql/debian.cnf when changing the socket location. [client] port = 3306 socket = /var/run/mysqld/mysqld.sock # Here is entries for some specific programs # The following values assume you have at least 32M ram # This was formally known as [safe_mysqld]. Both versions are currently parsed. [mysqld_safe] socket = /var/run/mysqld/mysqld.sock nice = 0 [mysqld] # # * Basic Settings # user = mysql pid-file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /var/lib/mysql tmpdir = /tmp language = /usr/share/mysql/english skip-external-locking # # Instead of skip-networking the default is now to listen only on # localhost which is more compatible and is not less secure. bind-address = 10.100.27.30 # # * Fine Tuning # key_buffer = 64M key_buffer_size = 512M max_allowed_packet = 16M thread_stack = 128K thread_cache_size = 8 # This replaces the startup script and checks MyISAM tables if needed # the first time they are touched myisam-recover = BACKUP max_connections = 1000 table_cache = 1000 join_buffer_size = 2M tmp_table_size = 2G max_heap_table_size = 2G innodb_buffer_pool_size = 3G innodb_additional_mem_pool_size = 128M innodb_log_file_size = 100M log-slow-queries = /var/log/mysql/slow.log sort_buffer_size = 5M net_buffer_length = 5M read_buffer_size = 2M read_rnd_buffer_size = 12M thread_concurrency = 10 ft_min_word_len = 3 #thread_concurrency = 10 # # * Query Cache Configuration # query_cache_limit = 1M query_cache_size = 512M # # * Logging and Replication # # Both location gets rotated by the cronjob. # Be aware that this log type is a performance killer. #log = /var/log/mysql/mysql.log # # Error logging goes to syslog. This is a Debian improvement :) # # Here you can see queries with especially long duration #log_slow_queries = /var/log/mysql/mysql-slow.log #long_query_time = 2 #log-queries-not-using-indexes # # The following can be used as easy to replay backup logs or for replication. # note: if you are setting up a replication slave, see README.Debian about # other settings you may need to change. #server-id = 1 #log_bin = /var/log/mysql/mysql-bin.log expire_logs_days = 10 max_binlog_size = 100M #binlog_do_db = include_database_name #binlog_ignore_db = include_database_name # # * BerkeleyDB # # Using BerkeleyDB is now discouraged as its support will cease in 5.1.12. skip-bdb # # * InnoDB # # InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/. # Read the manual for more InnoDB related options. There are many! # You might want to disable InnoDB to shrink the mysqld process by circa 100MB. #skip-innodb # # * Security Features # # Read the manual, too, if you want chroot! # chroot = /var/lib/mysql/ # # For generating SSL certificates I recommend the OpenSSL GUI "tinyca". # # ssl-ca=/etc/mysql/cacert.pem # ssl-cert=/etc/mysql/server-cert.pem # ssl-key=/etc/mysql/server-key.pem [mysqldump] quick quote-names max_allowed_packet = 16M [mysql] #no-auto-rehash # faster start of mysql but no tab completition [isamchk] key_buffer = 16M # # * NDB Cluster # # See /usr/share/doc/mysql-server-*/README.Debian for more information. # # The following configuration is read by the NDB Data Nodes (ndbd processes) # not from the NDB Management Nodes (ndb_mgmd processes). # # [MYSQL_CLUSTER] # ndb-connectstring=127.0.0.1 # # * IMPORTANT: Additional settings that can override those from this file! # The files must end with '.cnf', otherwise they'll be ignored. # !includedir /etc/mysql/conf.d/

Read the article
Speed improvements for Perl's chameneos-redux in the Computer Language Benchmarks Game

- by Robert P

Ever looked at the Computer Language Benchmarks Game (formerly known as the Great Language Shootout)? Perl has some pretty healthy competition there at the moment. It also occurs to me that there's probably some places that Perl's scores could be improved. The biggest one is in the chameneos-redux script right now—the Perl version runs the worst out of any language: 1,626 times slower than the C baseline solution! There are some restrictions on how the programs can be made and optimized, and there is Perl's interpreted runtime penalty, but 1,626 times? There's got to be something that can get the runtime of this program way down. Taking a look at the source code and the challenge, how can the speed be improved?

Read the article
Optimizing Levenshtein Distance Algorithm

- by Matt

I have a stored procedure that uses Levenshtein Distance to determine the result closest to what the user typed. The only thing really affecting the speed is the function that calculates the Levenshtein Distance for all the records before selecting the record with the lowest distance (I've verified this by putting a 0 in place of the call to the Levenshtein function). The table has 1.5 million records, so even the slightest adjustment may shave off a few seconds. Right now the entire thing runs over 10 minutes. Here's the method I'm using: ALTER function dbo.Levenshtein ( @Source nvarchar(200), @Target nvarchar(200) ) RETURNS int AS BEGIN DECLARE @Source_len int, @Target_len int, @i int, @j int, @Source_char nchar, @Dist int, @Dist_temp int, @Distv0 varbinary(8000), @Distv1 varbinary(8000) SELECT @Source_len = LEN(@Source), @Target_len = LEN(@Target), @Distv1 = 0x0000, @j = 1, @i = 1, @Dist = 0 WHILE @j <= @Target_len BEGIN SELECT @Distv1 = @Distv1 + CAST(@j AS binary(2)), @j = @j + 1 END WHILE @i <= @Source_len BEGIN SELECT @Source_char = SUBSTRING(@Source, @i, 1), @Dist = @i, @Distv0 = CAST(@i AS binary(2)), @j = 1 WHILE @j <= @Target_len BEGIN SET @Dist = @Dist + 1 SET @Dist_temp = CAST(SUBSTRING(@Distv1, @j+@j-1, 2) AS int) + CASE WHEN @Source_char = SUBSTRING(@Target, @j, 1) THEN 0 ELSE 1 END IF @Dist > @Dist_temp BEGIN SET @Dist = @Dist_temp END SET @Dist_temp = CAST(SUBSTRING(@Distv1, @j+@j+1, 2) AS int)+1 IF @Dist > @Dist_temp SET @Dist = @Dist_temp BEGIN SELECT @Distv0 = @Distv0 + CAST(@Dist AS binary(2)), @j = @j + 1 END END SELECT @Distv1 = @Distv0, @i = @i + 1 END RETURN @Dist END Anyone have any ideas? Any input is appreciated. Thanks, Matt

Read the article
MySQL Query performance - huge difference in time

- by Damo

I have a query that is returning in vastly different amounts of time between 2 datasets. For one set (database A) it returns in a few seconds, for the other (database B)....well I haven't waited long enough yet, but over 10 minutes. I have dumped both of these databases to my local machine where I can reproduce the issue running MySQL 5.1.37. Curiously, database B is smaller than database A. A stripped down version of the query that reproduces the problem is: SELECT * FROM po_shipment ps JOIN po_shipment_item psi USING (ship_id) JOIN po_alloc pa ON ps.ship_id = pa.ship_id AND pa.UID_items = psi.UID_items JOIN po_header ph ON pa.hdr_id = ph.hdr_id LEFT JOIN EVENT_TABLE ev0 ON ev0.TABLE_ID1 = ps.ship_id AND ev0.EVENT_TYPE = 'MAS0' LEFT JOIN EVENT_TABLE ev1 ON ev1.TABLE_ID1 = ps.ship_id AND ev1.EVENT_TYPE = 'MAS1' LEFT JOIN EVENT_TABLE ev2 ON ev2.TABLE_ID1 = ps.ship_id AND ev2.EVENT_TYPE = 'MAS2' LEFT JOIN EVENT_TABLE ev3 ON ev3.TABLE_ID1 = ps.ship_id AND ev3.EVENT_TYPE = 'MAS3' LEFT JOIN EVENT_TABLE ev4 ON ev4.TABLE_ID1 = ps.ship_id AND ev4.EVENT_TYPE = 'MAS4' LEFT JOIN EVENT_TABLE ev5 ON ev5.TABLE_ID1 = ps.ship_id AND ev5.EVENT_TYPE = 'MAS5' WHERE ps.eta >= '2010-03-22' GROUP BY ps.ship_id LIMIT 100; The EXPLAIN query plan for the first database (A) that returns in ~2 seconds is: +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+------------------------------+------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+------------------------------+------+----------------------------------------------+ | 1 | SIMPLE | ps | range | PRIMARY,IX_ETA_DATE | IX_ETA_DATE | 4 | NULL | 174 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | ev0 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_PROD.ps.ship_id,const | 1 | | | 1 | SIMPLE | ev1 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_PROD.ps.ship_id,const | 1 | | | 1 | SIMPLE | ev2 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_PROD.ps.ship_id,const | 1 | | | 1 | SIMPLE | ev3 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_PROD.ps.ship_id,const | 1 | | | 1 | SIMPLE | ev4 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_PROD.ps.ship_id,const | 1 | | | 1 | SIMPLE | ev5 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_PROD.ps.ship_id,const | 1 | | | 1 | SIMPLE | psi | ref | PRIMARY,IX_po_shipment_item_po_shipment1,FK_po_shipment_item_po_shipment1 | IX_po_shipment_item_po_shipment1 | 4 | UNIVIS_PROD.ps.ship_id | 1 | | | 1 | SIMPLE | pa | ref | IX_po_alloc_po_shipment_item2,IX_po_alloc_po_details_old,FK_po_alloc_po_shipment1,FK_po_alloc_po_shipment_item1,FK_po_alloc_po_header1 | FK_po_alloc_po_shipment1 | 4 | UNIVIS_PROD.psi.ship_id | 5 | Using where | | 1 | SIMPLE | ph | eq_ref | PRIMARY,IX_HDR_ID | PRIMARY | 4 | UNIVIS_PROD.pa.hdr_id | 1 | | +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+------------------------------+------+----------------------------------------------+ The EXPLAIN query plan for the second database (B) that returns in 600 seconds is: +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+--------------------------------+------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+--------------------------------+------+----------------------------------------------+ | 1 | SIMPLE | ps | range | PRIMARY,IX_ETA_DATE | IX_ETA_DATE | 4 | NULL | 38 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | psi | ref | PRIMARY,IX_po_shipment_item_po_shipment1,FK_po_shipment_item_po_shipment1 | IX_po_shipment_item_po_shipment1 | 4 | UNIVIS_DEV01.ps.ship_id | 1 | | | 1 | SIMPLE | ev0 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_DEV01.psi.ship_id,const | 1 | | | 1 | SIMPLE | ev1 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_DEV01.psi.ship_id,const | 1 | | | 1 | SIMPLE | ev2 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_DEV01.ps.ship_id,const | 1 | | | 1 | SIMPLE | ev3 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_DEV01.psi.ship_id,const | 1 | | | 1 | SIMPLE | ev4 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_DEV01.psi.ship_id,const | 1 | | | 1 | SIMPLE | ev5 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_DEV01.ps.ship_id,const | 1 | | | 1 | SIMPLE | pa | ref | IX_po_alloc_po_shipment_item2,IX_po_alloc_po_details_old,FK_po_alloc_po_shipment1,FK_po_alloc_po_shipment_item1,FK_po_alloc_po_header1 | IX_po_alloc_po_shipment_item2 | 4 | UNIVIS_DEV01.ps.ship_id | 4 | Using where | | 1 | SIMPLE | ph | eq_ref | PRIMARY,IX_HDR_ID | PRIMARY | 4 | UNIVIS_DEV01.pa.hdr_id | 1 | | +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+--------------------------------+------+----------------------------------------------+ When database B is running I can look at the MySQL Administrator and the state remains at "Copying to tmp table" indefinitely. Database A also has this state but for only a second or so. There are no differences in the table structure, indexes, keys etc between these databases (I have done show create tables and diff'd them). The sizes of the tables are: database A: po_shipment 1776 po_shipment_item 1945 po_alloc 36298 po_header 71642 EVENT_TABLE 1608 database B: po_shipment 463 po_shipment_item 470 po_alloc 3291 po_header 56149 EVENT_TABLE 1089 Some points to note: Removing the WHERE clause makes the query return < 1 sec. Removing the GROUP BY makes the query return < 1 sec. Removing ev5, ev4, ev3 etc makes the query get faster for each one removed. Can anyone suggest how to resolve this issue? What have I missed? Many Thanks.

Read the article
Fastest image iteration in Python

- by Greg

I am creating a simple green screen app with Python 2.7.4 but am getting quite slow results. I am currently using PIL 1.1.7 to load and iterate the images and saw huge speed-ups changing from the old getpixel() to the newer load() and pixel access object indexing. However the following loop still takes around 2.5 seconds to run for an image of around 720p resolution: def colorclose(Cb_p, Cr_p, Cb_key, Cr_key, tola, tolb): temp = math.sqrt((Cb_key-Cb_p)**2+(Cr_key-Cr_p)**2) if temp < tola: return 0.0 else: if temp < tolb: return (temp-tola)/(tolb-tola) else: return 1.0 .... for x in range(width): for y in range(height): Y, cb, cr = fg_cbcr_list[x, y] mask = colorclose(cb, cr, cb_key, cr_key, tola, tolb) mask = 1 - mask bgr, bgg, bgb = bg_list[x,y] fgr, fgg, fgb = fg_list[x,y] pixels[x,y] = ( (int)(fgr - mask*key_color[0] + mask*bgr), (int)(fgg - mask*key_color[1] + mask*bgg), (int)(fgb - mask*key_color[2] + mask*bgb)) Am I doing anything hugely inefficient here which makes it run so slow? I have seen similar, simpler examples where the loop is replaced by a boolean matrix for instance, but for this case I can't see a way to replace the loop. The pixels[x,y] assignment seems to take the most amount of time but not knowing Python very well I am unsure of a more efficient way to do this. Any help would be appreciated.

Read the article
How to optimize this MySQL query

- by James Simpson

This query was working fine when the database was small, but now that there are millions of rows in the database, I am realizing I should have looked at optimizing this earlier. It is looking at over 600,000 rows and is Using where; Using temporary; Using filesort (which leads to an execution time of 5-10 seconds). It is using an index on the field 'battle_type.' SELECT username, SUM( outcome ) AS wins, COUNT( * ) - SUM( outcome ) AS losses FROM tblBattleHistory WHERE battle_type = '0' && outcome < '2' GROUP BY username ORDER BY wins DESC , losses ASC , username ASC LIMIT 0 , 50

Read the article
C# File IO with Streams - Best Memory Buffer Size

- by AJ

Hi, I am writing a small IO library to assist with a larger (hobby) project. A part of this library performs various functions on a file, which is read / written via the FileStream object. On each StreamReader.Read(...) pass, I fire off an event which will be used in the main app to display progress information. The processing that goes on in the loop is vaired, but is not too time consuming (it could just be a simple file copy, for example, or may involve encryption...). My main question is: What is the best memory buffer size to use? Thinking about physical disk layouts, I could pick 2k, which would cover a CD sector size and is a nice multiple of a 512 byte hard disk sector. Higher up the abstraction tree, you could go for a larger buffer which could read an entire FAT cluster at a time. I realise with today's PC's, I could go for a more memory hungry option (a couple of MiB, for example), but then I increase the time between UI updates and the user perceives a less responsive app. As an aside, I'm eventually hoping to provide a similar interface to files hosted on FTP / HTTP servers (over a local network / fastish DSL). What would be the best memory buffer size for those (again, a "best-case" tradeoff between perceived responsiveness vs. performance). Thanks in advance for any ideas, Adam

Read the article
MySQL queries - how expensive are they really?

- by incrediman

I've heard that mysql queries are very expensive, and that you should avoid at all costs making too many of them. I'm developing a site that will be used by quite a few people, and I'm wondering: How expensive are mysql queries actually? If I have 400,000 people in my database, how expensive is it to query it for one of them? How close attention do I need to pay that I don't make too many queries per client request?

Read the article
Ever any performance different between Java >> and >>> right shift operators?

- by Sean Owen

Is there ever reason to think the (signed) and (unsigned) right bit-shift operators in Java would perform differently? I can't detect any difference on my machine. This is purely an academic question; it's never going to be the bottleneck I'm sure. I know: it's best to write what you mean foremost; use for division by 2, for example. I assume it comes down to which architectures have which operations implemented as an instruction.

Read the article
Database design in blogging systems

- by Peter

As a learning exercise I'm trying to put myself a blogging system. The goal is to code something that will let me create multiple blogs, like blogger.com or wordpress.com, but much simplified. I would like to ask you, what do you think is best database design for this type of script. Is it better to have one big table, containing posts from all blogs of all users (like friendfeed) or would it be better to create separate table for each blog's posts? Big thanks in advance for your help, Peter.

Read the article
Speed up bitstring/bit operations in Python?

- by Xavier Ho

I wrote a prime number generator using Sieve of Eratosthenes and Python 3.1. The code runs correctly and gracefully at 0.32 seconds on ideone.com to generate prime numbers up to 1,000,000. # from bitstring import BitString def prime_numbers(limit=1000000): '''Prime number generator. Yields the series 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ... using Sieve of Eratosthenes. ''' yield 2 sub_limit = int(limit**0.5) flags = [False, False] + [True] * (limit - 2) # flags = BitString(limit) # Step through all the odd numbers for i in range(3, limit, 2): if flags[i] is False: # if flags[i] is True: continue yield i # Exclude further multiples of the current prime number if i <= sub_limit: for j in range(i*3, limit, i<<1): flags[j] = False # flags[j] = True The problem is, I run out of memory when I try to generate numbers up to 1,000,000,000. flags = [False, False] + [True] * (limit - 2) MemoryError As you can imagine, allocating 1 billion boolean values (1 byte 4 or 8 bytes (see comment) each in Python) is really not feasible, so I looked into bitstring. I figured, using 1 bit for each flag would be much more memory-efficient. However, the program's performance dropped drastically - 24 seconds runtime, for prime number up to 1,000,000. This is probably due to the internal implementation of bitstring. You can comment/uncomment the three lines to see what I changed to use BitString, as the code snippet above. My question is, is there a way to speed up my program, with or without bitstring?

Read the article
3 dimensional bin packing algorithms

- by BuschnicK

I'm faced with a 3 dimensional bin packing problem and am currently conducting some preliminary research as to which algorithms/heuristics are currently yielding the best results. Since the problem is NP hard I do not expect to find the optimal solution in every case, but I was wondering: 1) what are the best exact solvers? Branch and Bound? What problem instance sizes can I expect to solve with reasonable computing resources? 2) what are the best heuristic solvers? 3) What off-the-shelf solutions exist to conduct some experiments with?

Read the article
What is the most platform- and Python-version-independent way to make a fast loop for use in Python?

- by Statto

I'm writing a scientific application in Python with a very processor-intensive loop at its core. I would like to optimise this as far as possible, at minimum inconvenience to end users, who will probably use it as an uncompiled collection of Python scripts, and will be using Windows, Mac, and (mainly Ubuntu) Linux. It is currently written in Python with a dash of NumPy, and I've included the code below. Is there a solution which would be reasonably fast which would not require compilation? This would seem to be the easiest way to maintain platform-independence. If using something like Pyrex, which does require compilation, is there an easy way to bundle many modules and have Python choose between them depending on detected OS and Python version? Is there an easy way to build the collection of modules without needing access to every system with every version of Python? Does one method lend itself particularly to multi-processor optimisation? (If you're interested, the loop is to calculate the magnetic field at a given point inside a crystal by adding together the contributions of a large number of nearby magnetic ions, treated as tiny bar magnets. Basically, a massive sum of these.) # calculate_dipole # ------------------------- # calculate_dipole works out the dipole field at a given point within the crystal unit cell # --- # INPUT # mu = position at which to calculate the dipole field # r_i = array of atomic positions # mom_i = corresponding array of magnetic moments # --- # OUTPUT # B = the B-field at this point def calculate_dipole(mu, r_i, mom_i): relative = mu - r_i r_unit = unit_vectors(relative) #4pi / mu0 (at the front of the dipole eqn) A = 1e-7 #initalise dipole field B = zeros(3,float) for i in range(len(relative)): #work out the dipole field and add it to the estimate so far B += A*(3*dot(mom_i[i],r_unit[i])*r_unit[i] - mom_i[i]) / sqrt(dot(relative[i],relative[i]))**3 return B

Read the article
Mysql - Help me change this single complex query to use temporary tables

- by sandeepan-nath

About the system: - There are tutors who create classes and packs - A tags based search approach is being followed.Tag relations are created when new tutors register and when tutors create packs (this makes tutors and packs searcheable). For details please check the section How tags work in this system? below. Following is the concerned query Can anybody help me suggest an approach using temporary tables. We have indexed all the relevant fields and it looks like this is the least time possible with this approach:- SELECT SUM(DISTINCT( t.tag LIKE "%Dictatorship%" OR tt.tag LIKE "%Dictatorship%" OR ttt.tag LIKE "%Dictatorship%" )) AS key_1_total_matches , SUM(DISTINCT( t.tag LIKE "%democracy%" OR tt.tag LIKE "%democracy%" OR ttt.tag LIKE "%democracy%" )) AS key_2_total_matches , COUNT(DISTINCT( od.id_od )) AS tutor_popularity, CASE WHEN ( IF(( wc.id_wc > 0 ), ( wc.wc_api_status = 1 AND wc.wc_type = 0 AND wc.class_date > '2010-06-01 22:00:56' AND wccp.status = 1 AND ( wccp.country_code = 'IE' OR wccp.country_code IN ( 'INT' ) ) ), 0) ) THEN 1 ELSE 0 END AS 'classes_published' , CASE WHEN ( IF(( lp.id_lp > 0 ), ( lp.id_status = 1 AND lp.published = 1 AND lpcp.status = 1 AND ( lpcp.country_code = 'IE' OR lpcp.country_code IN ( 'INT' ) ) ), 0) ) THEN 1 ELSE 0 END AS 'packs_published', td . *, u . * FROM tutor_details AS td JOIN users AS u ON u.id_user = td.id_user LEFT JOIN learning_packs_tag_relations AS lptagrels ON td.id_tutor = lptagrels.id_tutor LEFT JOIN learning_packs AS lp ON lptagrels.id_lp = lp.id_lp LEFT JOIN learning_packs_categories AS lpc ON lpc.id_lp_cat = lp.id_lp_cat LEFT JOIN learning_packs_categories AS lpcp ON lpcp.id_lp_cat = lpc.id_parent LEFT JOIN learning_pack_content AS lpct ON ( lp.id_lp = lpct.id_lp ) LEFT JOIN webclasses_tag_relations AS wtagrels ON td.id_tutor = wtagrels.id_tutor LEFT JOIN webclasses AS wc ON wtagrels.id_wc = wc.id_wc LEFT JOIN learning_packs_categories AS wcc ON wcc.id_lp_cat = wc.id_wp_cat LEFT JOIN learning_packs_categories AS wccp ON wccp.id_lp_cat = wcc.id_parent LEFT JOIN order_details AS od ON td.id_tutor = od.id_author LEFT JOIN orders AS o ON od.id_order = o.id_order LEFT JOIN tutors_tag_relations AS ttagrels ON td.id_tutor = ttagrels.id_tutor LEFT JOIN tags AS t ON t.id_tag = ttagrels.id_tag LEFT JOIN tags AS tt ON tt.id_tag = lptagrels.id_tag LEFT JOIN tags AS ttt ON ttt.id_tag = wtagrels.id_tag WHERE ( u.country = 'IE' OR u.country IN ( 'INT' ) ) AND CASE WHEN ( ( tt.id_tag = lptagrels.id_tag ) AND ( lp.id_lp > 0 ) ) THEN lp.id_status = 1 AND lp.published = 1 AND lpcp.status = 1 AND ( lpcp.country_code = 'IE' OR lpcp.country_code IN ( 'INT' ) ) ELSE 1 END AND CASE WHEN ( ( ttt.id_tag = wtagrels.id_tag ) AND ( wc.id_wc > 0 ) ) THEN wc.wc_api_status = 1 AND wc.wc_type = 0 AND wc.class_date > '2010-06-01 22:00:56' AND wccp.status = 1 AND ( wccp.country_code = 'IE' OR wccp.country_code IN ( 'INT' ) ) ELSE 1 END AND CASE WHEN ( od.id_od > 0 ) THEN od.id_author = td.id_tutor AND o.order_status = 'paid' AND CASE WHEN ( od.id_wc > 0 ) THEN od.can_attend_class = 1 ELSE 1 END ELSE 1 END AND ( t.tag LIKE "%Dictatorship%" OR t.tag LIKE "%democracy%" OR tt.tag LIKE "%Dictatorship%" OR tt.tag LIKE "%democracy%" OR ttt.tag LIKE "%Dictatorship%" OR ttt.tag LIKE "%democracy%" ) GROUP BY td.id_tutor HAVING key_1_total_matches = 1 AND key_2_total_matches = 1 ORDER BY tutor_popularity DESC, u.surname ASC, u.name ASC LIMIT 0, 20 The problem The results returned by the above query are correct (AND logic working as per expectation), but the time taken by the query rises alarmingly for heavier data and for the current data I have it is like 10 seconds as against normal query timings of the order of 0.005 - 0.0002 seconds, which makes it totally unusable. Somebody suggested in my previous question to do the following:- create a temporary table and insert here all relevant data that might end up in the final result set run several updates on this table, joining the required tables one at a time instead of all of them at the same time finally perform a query on this temporary table to extract the end result All this was done in a stored procedure, the end result has passed unit tests, and is blazing fast. I have never worked with temporary tables till now. Only if I could get some hints, kind of schematic representations so that I can start with... Is there something faulty with the query? What can be the reason behind 10+ seconds of execution time? How tags work in this system? When a tutor registers, tags are entered and tag relations are created with respect to tutor's details like name, surname etc. When a Tutors create packs, again tags are entered and tag relations are created with respect to pack's details like pack name, description etc. tag relations for tutors stored in tutors_tag_relations and those for packs stored in learning_packs_tag_relations. All individual tags are stored in tags table. The explain query output:- Please see this screenshot - http://www.test.examvillage.com/Explain_query_improved.jpg

Read the article
text indexes vs integer indexes in mysql

- by imanc

Hey, I have always tried to have an integer primary key on a table no matter what. But now I am questioning if this is always necessary. Let's say I have a product table and each product has a globally unique SKU number - that would be a string of say 8-16 characters. Why not make this the PK? Typically I would make this field a unique index but then have an auto incrementing int field as the PK, as I assumed it would be faster, easier to maintain, and would allow me to do things like get the last 5 records added with ease. But in terms of optimisation, assuming I'd only ever be matching the full text field and next doing text matching queries (e.g. like %%) can you guys think of any reasons not to use a text based primary key, most likely of type varchar()? Cheers, imanc

Read the article
C# Improvement on a Fire-and-Forget

- by adam

Greetings I have a program that creates multiples instances of a class, runs the same long-running Update method on all instances and waits for completion. I'm following Kev's approach from this question of adding the Update to ThreadPool.QueueUserWorkItem. In the main prog., I'm sleeping for a few minutes and checking a Boolean in the last child to see if done while(!child[child.Length-1].isFinished){ Thread.Sleep(...); } This solution is working the way I want, but is there a better way to do this? Both for the independent instances and checking if all work is done. Thanks

Read the article
Query Optimizing Request

- by mithilatw

I am very sorry if this question is structured in not a very helpful manner or the question itself is not a very good one! I need to update a MSSQL table call component every 10 minutes based on information from another table call materials_progress I have nearly 60000 records in component and more than 10000 records in materials_progress I wrote an update query to do the job, but it takes longer than 4 minutes to complete execution! Here is the query : UPDATE component SET stage_id = CASE WHEN t.required_quantity <= t.total_received THEN 27 WHEN t.total_ordered < t.total_received THEN 18 ELSE 18 END FROM ( SELECT mp.job_id, mp.line_no, mp.component, l.quantity AS line_quantity, CASE WHEN mp.component_name_id = 2 THEN l.quantity*2 ELSE l.quantity END AS required_quantity, SUM(ordered) AS total_ordered, SUM(received) AS total_received , c.component_id FROM line l LEFT JOIN component c ON c.line_id = l.line_id LEFT JOIN materials_progress mp ON l.job_id = mp.job_id AND l.line_no = mp.line_no AND c.component_name_id = mp.component_name_id WHERE mp.job_id IS NOT NULL AND (mp.cancelled IS NULL OR mp.cancelled = 0) AND (mp.manual_override IS NULL OR mp.manual_override = 0) AND c.stage_id = 18 GROUP BY mp.job_id, mp.line_no, mp.component, l.quantity, mp.component_name_id, component_id ) AS t WHERE component.component_id = t.component_id I am not going to explain the scenario as it too complex.. could somebody please please tell me what makes this query this much expensive and a way to get around it? Thank you very very much in advance!!!

Read the article
Linux PDF/Postscript Optimizing

- by Sheldon Ross

So I have a report system built using Java and iText. PDF templates are created using Scribus. The Java code merges the data into the document using iText. The files are then copied over to a NFS share, and a BASH script prints them. I use acroread to convert them to PS, then lpr the PS. The FOSS application pdftops is horribly inefficient. My main problem is that the PDF's generated using iText/Scribus are very large. And I've recently run into the problem where acroread pukes because it hits 4gb of mem usage on large (300+ pages) documents. (Adobe is painfully slow at updating stuff to 64 bit). Now I can use Adobe reader on Windows, and use the Create Print PDF option or whatever its called, and it greatly( 10x) reduces the size of the PDF(it removes alot of metadata about form fields and such it appears) and produces a PDF that is basically a Print image. My question is does anyone know of a good solution/program for doing something similiar on Linux. Ideally, it would optimize the PDF, reduce size, and reduce PS complexity so the printer could print faster as it takes about 15-20 seconds a page to print right now.

Read the article

< Previous Page | 45 46 47 48 49 50 51 52 53 54 55 56 | Next Page >