Search Results

Search found 3512 results on 141 pages for 'premature optimization'.

Page 49/141 | < Previous Page | 45 46 47 48 49 50 51 52 53 54 55 56  | Next Page >

  • Fastest way to remove non-numeric characters from a VARCHAR in SQL Server

    - by Dan Herbert
    I'm writing an import utility that is using phone numbers as a unique key within the import. I need to check that the phone number does not already exist in my DB. The problem is that phone numbers in the DB could have things like dashes and parenthesis and possibly other things. I wrote a function to remove these things, the problem is that it is slow and with thousands of records in my DB and thousands of records to import at once, this process can be unacceptably slow. I've already made the phone number column an index. I tried using the script from this post: http://stackoverflow.com/questions/52315/t-sql-trim-nbsp-and-other-non-alphanumeric-characters But that didn't speed it up any. Is there a faster way to remove non-numeric characters? Something that can perform well when 10,000 to 100,000 records have to be compared. Whatever is done needs to perform fast. Update Given what people responded with, I think I'm going to have to clean the fields before I run the import utility. To answer the question of what I'm writing the import utility in, it is a C# app. I'm comparing BIGINT to BIGINT now, with no need to alter DB data and I'm still taking a performance hit with a very small set of data (about 2000 records). Could comparing BIGINT to BIGINT be slowing things down? I've optimized the code side of my app as much as I can (removed regexes, removed unneccessary DB calls). Although I can't isolate SQL as the source of the problem anymore, I still feel like it is.

    Read the article

  • Tips on creating user interfaces and optimizing the user experience

    - by Saif Bechan
    I am currently working on a project where a lot of user interaction is going to take place. There is also a commercial side as people can buy certain items and services. In my opinion a good blend of user interface, speed and security is essential for these types of websites. It is fairly easy to use ajax and JavaScript nowadays to do almost everything, as there are a lot of libraries available such as jQuery and others. But this can have some performance and incompatibility issues. This can lead to users just going to the next website. The overall look of the website is important too. Where to place certain buttons, where to place certain types of articles such as faq and support. Where and how to display error messages so that the user sees them but are not bothering him. And an overall color scheme is important too. The basic question is: How to create an interface that triggers a user to buy/use your services I know psychology also plays a huge role in how users interact with your website. The color scheme for example is important. When the colors are irritating on a website you just want to click away. I have not found any articles that explain those concept. Does anyone have any tips and/or recourses where i can get some articles that guide you in making the correct choices for your website.

    Read the article

  • ILOG CPLEX: how to populate IloLPMatrix while using addGe to set up the model?

    - by downer
    I have a queatoin about IloLPMatrix and addGe. I was trying to follow the example of AdMIPex5.java to generate user defined cutting planes based on the solution to the LP relaxation. The difference is that eh initial MIP model is not read in from a mps file, but set up in the code using methods like addGe, addLe etc. I think this is why I ran into problems while copying the exampe to do the following. IloLPMatrix lp = (IloLPMatrix)cplex.LPMatrixIterator().next(); lp from the above line turns to be NULL. I am wondering 1. What is the relationship between IloLPMatrix and the addLe, addGe commands? I tried to addLPMatrix() to the model, and then used model.addGe methods. but the LPMatrix seems to be empty still. How do I populate the IloLPMatrix of the moel according to the value that I had set up using addGe and addLe. Is the a method to this easily, or do I have to set them up row by row myself? I was doing this to get the number of variables and their values by doing lp.getNumVars(). Is there other methods that I can use to get the number of variables and their values wihout doing these, since my system is set up by addLe, addGe etc? Thanks a lot for your help on this.

    Read the article

  • vectorizing a for loop in numpy/scipy?

    - by user248237
    I'm trying to vectorize a for loop that I have inside of a class method. The for loop has the following form: it iterates through a bunch of points and depending on whether a certain variable (called "self.condition_met" below) is true, calls a pair of functions on the point, and adds the result to a list. Each point here is an element in a vector of lists, i.e. a data structure that looks like array([[1,2,3], [4,5,6], ...]). Here is the problematic function: def myClass: def my_inefficient_method(self): final_vector = [] # Assume 'my_vector' and 'my_other_vector' are defined numpy arrays for point in all_points: if not self.condition_met: a = self.my_func1(point, my_vector) b = self.my_func2(point, my_other_vector) else: a = self.my_func3(point, my_vector) b = self.my_func4(point, my_other_vector) c = a + b final_vector.append(c) # Choose random element from resulting vector 'final_vector' self.condition_met is set before my_inefficient_method is called, so it seems unnecessary to check it each time, but I am not sure how to better write this. Since there are no destructive operations here it is seems like I could rewrite this entire thing as a vectorized operation -- is that possible? any ideas how to do this?

    Read the article

  • SQLite3-ruby extremely slow under 1.9.1?

    - by NilObject
    I decided to upgrade my server to Ruby 1.9.1, and a lot of things are indeed much faster. However, I have a process that dumps a database to sqlite, and it's become glacially slow. What used to take 30 seconds now takes upwards of 10 minutes. The code does several create table statements, and then lots of inserts. The insert statements nearly all use placeholders (?), so SQLite is doing the heavy lifting of binding the parameters. In short, I can't see why this particular usage has slowed down so much. Does anyone know of any problems that have caused it? I'm using sqlite3-ruby (1.2.5), and I'm hoping that someone has encountered this and profiled it. If not, I guess I'm going to learn how to profile ruby code :)

    Read the article

  • How to optimize my PageRank calculation?

    - by asmaier
    In the book Programming Collective Intelligence I found the following function to compute the PageRank: def calculatepagerank(self,iterations=20): # clear out the current PageRank tables self.con.execute("drop table if exists pagerank") self.con.execute("create table pagerank(urlid primary key,score)") self.con.execute("create index prankidx on pagerank(urlid)") # initialize every url with a PageRank of 1.0 self.con.execute("insert into pagerank select rowid,1.0 from urllist") self.dbcommit() for i in range(iterations): print "Iteration %d" % i for (urlid,) in self.con.execute("select rowid from urllist"): pr=0.15 # Loop through all the pages that link to this one for (linker,) in self.con.execute("select distinct fromid from link where toid=%d" % urlid): # Get the PageRank of the linker linkingpr=self.con.execute("select score from pagerank where urlid=%d" % linker).fetchone()[0] # Get the total number of links from the linker linkingcount=self.con.execute("select count(*) from link where fromid=%d" % linker).fetchone()[0] pr+=0.85*(linkingpr/linkingcount) self.con.execute("update pagerank set score=%f where urlid=%d" % (pr,urlid)) self.dbcommit() However, this function is very slow, because of all the SQL queries in every iteration >>> import cProfile >>> cProfile.run("crawler.calculatepagerank()") 2262510 function calls in 136.006 CPU seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 136.006 136.006 <string>:1(<module>) 1 20.826 20.826 136.006 136.006 searchengine.py:179(calculatepagerank) 21 0.000 0.000 0.528 0.025 searchengine.py:27(dbcommit) 21 0.528 0.025 0.528 0.025 {method 'commit' of 'sqlite3.Connecti 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler 1339864 112.602 0.000 112.602 0.000 {method 'execute' of 'sqlite3.Connec 922600 2.050 0.000 2.050 0.000 {method 'fetchone' of 'sqlite3.Cursor' 1 0.000 0.000 0.000 0.000 {range} So I optimized the function and came up with this: def calculatepagerank2(self,iterations=20): # clear out the current PageRank tables self.con.execute("drop table if exists pagerank") self.con.execute("create table pagerank(urlid primary key,score)") self.con.execute("create index prankidx on pagerank(urlid)") # initialize every url with a PageRank of 1.0 self.con.execute("insert into pagerank select rowid,1.0 from urllist") self.dbcommit() inlinks={} numoutlinks={} pagerank={} for (urlid,) in self.con.execute("select rowid from urllist"): inlinks[urlid]=[] numoutlinks[urlid]=0 # Initialize pagerank vector with 1.0 pagerank[urlid]=1.0 # Loop through all the pages that link to this one for (inlink,) in self.con.execute("select distinct fromid from link where toid=%d" % urlid): inlinks[urlid].append(inlink) # get number of outgoing links from a page numoutlinks[urlid]=self.con.execute("select count(*) from link where fromid=%d" % urlid).fetchone()[0] for i in range(iterations): print "Iteration %d" % i for urlid in pagerank: pr=0.15 for link in inlinks[urlid]: linkpr=pagerank[link] linkcount=numoutlinks[link] pr+=0.85*(linkpr/linkcount) pagerank[urlid]=pr for urlid in pagerank: self.con.execute("update pagerank set score=%f where urlid=%d" % (pagerank[urlid],urlid)) self.dbcommit() This function is 20 times faster (but uses a lot more memory for all the temporary dictionaries) because it avoids the unnecessary SQL queries in every iteration: >>> cProfile.run("crawler.calculatepagerank2()") 64802 function calls in 6.950 CPU seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.004 0.004 6.950 6.950 <string>:1(<module>) 1 1.004 1.004 6.946 6.946 searchengine.py:207(calculatepagerank2 2 0.000 0.000 0.104 0.052 searchengine.py:27(dbcommit) 23065 0.012 0.000 0.012 0.000 {meth 'append' of 'list' objects} 2 0.104 0.052 0.104 0.052 {meth 'commit' of 'sqlite3.Connection 1 0.000 0.000 0.000 0.000 {meth 'disable' of '_lsprof.Profiler' 31298 5.809 0.000 5.809 0.000 {meth 'execute' of 'sqlite3.Connectio 10431 0.018 0.000 0.018 0.000 {method 'fetchone' of 'sqlite3.Cursor' 1 0.000 0.000 0.000 0.000 {range} But is it possible to further reduce the number of SQL queries to speed up the function even more?

    Read the article

  • Lots of mysql Sleep processes

    - by user259284
    Hello, I am still having trouble with my mysql server. It seems that since i optimize it, the tables were growing and now sometimes is very slow again. I have no idea of how to optimize more. mySQL server has 48GB of RAM and mysqld is using about 8, most of the tables are innoDB. Site has about 2000 users online. I also run explain on every query and every one of them is indexed. mySQL processes: http://www.pik.ba/mysqlStanje.php my.cnf: # The MySQL database server configuration file. # # You can copy this to one of: # - "/etc/mysql/my.cnf" to set global options, # - "~/.my.cnf" to set user-specific options. # # One can use all long options that the program supports. # Run program with --help to get a list of available options and with # --print-defaults to see which it would actually understand and use. # # For explanations see # http://dev.mysql.com/doc/mysql/en/server-system-variables.html # This will be passed to all mysql clients # It has been reported that passwords should be enclosed with ticks/quotes # escpecially if they contain "#" chars... # Remember to edit /etc/mysql/debian.cnf when changing the socket location. [client] port = 3306 socket = /var/run/mysqld/mysqld.sock # Here is entries for some specific programs # The following values assume you have at least 32M ram # This was formally known as [safe_mysqld]. Both versions are currently parsed. [mysqld_safe] socket = /var/run/mysqld/mysqld.sock nice = 0 [mysqld] # # * Basic Settings # user = mysql pid-file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /var/lib/mysql tmpdir = /tmp language = /usr/share/mysql/english skip-external-locking # # Instead of skip-networking the default is now to listen only on # localhost which is more compatible and is not less secure. bind-address = 10.100.27.30 # # * Fine Tuning # key_buffer = 64M key_buffer_size = 512M max_allowed_packet = 16M thread_stack = 128K thread_cache_size = 8 # This replaces the startup script and checks MyISAM tables if needed # the first time they are touched myisam-recover = BACKUP max_connections = 1000 table_cache = 1000 join_buffer_size = 2M tmp_table_size = 2G max_heap_table_size = 2G innodb_buffer_pool_size = 3G innodb_additional_mem_pool_size = 128M innodb_log_file_size = 100M log-slow-queries = /var/log/mysql/slow.log sort_buffer_size = 5M net_buffer_length = 5M read_buffer_size = 2M read_rnd_buffer_size = 12M thread_concurrency = 10 ft_min_word_len = 3 #thread_concurrency = 10 # # * Query Cache Configuration # query_cache_limit = 1M query_cache_size = 512M # # * Logging and Replication # # Both location gets rotated by the cronjob. # Be aware that this log type is a performance killer. #log = /var/log/mysql/mysql.log # # Error logging goes to syslog. This is a Debian improvement :) # # Here you can see queries with especially long duration #log_slow_queries = /var/log/mysql/mysql-slow.log #long_query_time = 2 #log-queries-not-using-indexes # # The following can be used as easy to replay backup logs or for replication. # note: if you are setting up a replication slave, see README.Debian about # other settings you may need to change. #server-id = 1 #log_bin = /var/log/mysql/mysql-bin.log expire_logs_days = 10 max_binlog_size = 100M #binlog_do_db = include_database_name #binlog_ignore_db = include_database_name # # * BerkeleyDB # # Using BerkeleyDB is now discouraged as its support will cease in 5.1.12. skip-bdb # # * InnoDB # # InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/. # Read the manual for more InnoDB related options. There are many! # You might want to disable InnoDB to shrink the mysqld process by circa 100MB. #skip-innodb # # * Security Features # # Read the manual, too, if you want chroot! # chroot = /var/lib/mysql/ # # For generating SSL certificates I recommend the OpenSSL GUI "tinyca". # # ssl-ca=/etc/mysql/cacert.pem # ssl-cert=/etc/mysql/server-cert.pem # ssl-key=/etc/mysql/server-key.pem [mysqldump] quick quote-names max_allowed_packet = 16M [mysql] #no-auto-rehash # faster start of mysql but no tab completition [isamchk] key_buffer = 16M # # * NDB Cluster # # See /usr/share/doc/mysql-server-*/README.Debian for more information. # # The following configuration is read by the NDB Data Nodes (ndbd processes) # not from the NDB Management Nodes (ndb_mgmd processes). # # [MYSQL_CLUSTER] # ndb-connectstring=127.0.0.1 # # * IMPORTANT: Additional settings that can override those from this file! # The files must end with '.cnf', otherwise they'll be ignored. # !includedir /etc/mysql/conf.d/

    Read the article

  • MySQL Query performance - huge difference in time

    - by Damo
    I have a query that is returning in vastly different amounts of time between 2 datasets. For one set (database A) it returns in a few seconds, for the other (database B)....well I haven't waited long enough yet, but over 10 minutes. I have dumped both of these databases to my local machine where I can reproduce the issue running MySQL 5.1.37. Curiously, database B is smaller than database A. A stripped down version of the query that reproduces the problem is: SELECT * FROM po_shipment ps JOIN po_shipment_item psi USING (ship_id) JOIN po_alloc pa ON ps.ship_id = pa.ship_id AND pa.UID_items = psi.UID_items JOIN po_header ph ON pa.hdr_id = ph.hdr_id LEFT JOIN EVENT_TABLE ev0 ON ev0.TABLE_ID1 = ps.ship_id AND ev0.EVENT_TYPE = 'MAS0' LEFT JOIN EVENT_TABLE ev1 ON ev1.TABLE_ID1 = ps.ship_id AND ev1.EVENT_TYPE = 'MAS1' LEFT JOIN EVENT_TABLE ev2 ON ev2.TABLE_ID1 = ps.ship_id AND ev2.EVENT_TYPE = 'MAS2' LEFT JOIN EVENT_TABLE ev3 ON ev3.TABLE_ID1 = ps.ship_id AND ev3.EVENT_TYPE = 'MAS3' LEFT JOIN EVENT_TABLE ev4 ON ev4.TABLE_ID1 = ps.ship_id AND ev4.EVENT_TYPE = 'MAS4' LEFT JOIN EVENT_TABLE ev5 ON ev5.TABLE_ID1 = ps.ship_id AND ev5.EVENT_TYPE = 'MAS5' WHERE ps.eta >= '2010-03-22' GROUP BY ps.ship_id LIMIT 100; The EXPLAIN query plan for the first database (A) that returns in ~2 seconds is: +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+------------------------------+------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+------------------------------+------+----------------------------------------------+ | 1 | SIMPLE | ps | range | PRIMARY,IX_ETA_DATE | IX_ETA_DATE | 4 | NULL | 174 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | ev0 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_PROD.ps.ship_id,const | 1 | | | 1 | SIMPLE | ev1 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_PROD.ps.ship_id,const | 1 | | | 1 | SIMPLE | ev2 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_PROD.ps.ship_id,const | 1 | | | 1 | SIMPLE | ev3 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_PROD.ps.ship_id,const | 1 | | | 1 | SIMPLE | ev4 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_PROD.ps.ship_id,const | 1 | | | 1 | SIMPLE | ev5 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_PROD.ps.ship_id,const | 1 | | | 1 | SIMPLE | psi | ref | PRIMARY,IX_po_shipment_item_po_shipment1,FK_po_shipment_item_po_shipment1 | IX_po_shipment_item_po_shipment1 | 4 | UNIVIS_PROD.ps.ship_id | 1 | | | 1 | SIMPLE | pa | ref | IX_po_alloc_po_shipment_item2,IX_po_alloc_po_details_old,FK_po_alloc_po_shipment1,FK_po_alloc_po_shipment_item1,FK_po_alloc_po_header1 | FK_po_alloc_po_shipment1 | 4 | UNIVIS_PROD.psi.ship_id | 5 | Using where | | 1 | SIMPLE | ph | eq_ref | PRIMARY,IX_HDR_ID | PRIMARY | 4 | UNIVIS_PROD.pa.hdr_id | 1 | | +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+------------------------------+------+----------------------------------------------+ The EXPLAIN query plan for the second database (B) that returns in 600 seconds is: +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+--------------------------------+------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+--------------------------------+------+----------------------------------------------+ | 1 | SIMPLE | ps | range | PRIMARY,IX_ETA_DATE | IX_ETA_DATE | 4 | NULL | 38 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | psi | ref | PRIMARY,IX_po_shipment_item_po_shipment1,FK_po_shipment_item_po_shipment1 | IX_po_shipment_item_po_shipment1 | 4 | UNIVIS_DEV01.ps.ship_id | 1 | | | 1 | SIMPLE | ev0 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_DEV01.psi.ship_id,const | 1 | | | 1 | SIMPLE | ev1 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_DEV01.psi.ship_id,const | 1 | | | 1 | SIMPLE | ev2 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_DEV01.ps.ship_id,const | 1 | | | 1 | SIMPLE | ev3 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_DEV01.psi.ship_id,const | 1 | | | 1 | SIMPLE | ev4 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_DEV01.psi.ship_id,const | 1 | | | 1 | SIMPLE | ev5 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_DEV01.ps.ship_id,const | 1 | | | 1 | SIMPLE | pa | ref | IX_po_alloc_po_shipment_item2,IX_po_alloc_po_details_old,FK_po_alloc_po_shipment1,FK_po_alloc_po_shipment_item1,FK_po_alloc_po_header1 | IX_po_alloc_po_shipment_item2 | 4 | UNIVIS_DEV01.ps.ship_id | 4 | Using where | | 1 | SIMPLE | ph | eq_ref | PRIMARY,IX_HDR_ID | PRIMARY | 4 | UNIVIS_DEV01.pa.hdr_id | 1 | | +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+--------------------------------+------+----------------------------------------------+ When database B is running I can look at the MySQL Administrator and the state remains at "Copying to tmp table" indefinitely. Database A also has this state but for only a second or so. There are no differences in the table structure, indexes, keys etc between these databases (I have done show create tables and diff'd them). The sizes of the tables are: database A: po_shipment 1776 po_shipment_item 1945 po_alloc 36298 po_header 71642 EVENT_TABLE 1608 database B: po_shipment 463 po_shipment_item 470 po_alloc 3291 po_header 56149 EVENT_TABLE 1089 Some points to note: Removing the WHERE clause makes the query return < 1 sec. Removing the GROUP BY makes the query return < 1 sec. Removing ev5, ev4, ev3 etc makes the query get faster for each one removed. Can anyone suggest how to resolve this issue? What have I missed? Many Thanks.

    Read the article

  • Optimizing Levenshtein Distance Algorithm

    - by Matt
    I have a stored procedure that uses Levenshtein Distance to determine the result closest to what the user typed. The only thing really affecting the speed is the function that calculates the Levenshtein Distance for all the records before selecting the record with the lowest distance (I've verified this by putting a 0 in place of the call to the Levenshtein function). The table has 1.5 million records, so even the slightest adjustment may shave off a few seconds. Right now the entire thing runs over 10 minutes. Here's the method I'm using: ALTER function dbo.Levenshtein ( @Source nvarchar(200), @Target nvarchar(200) ) RETURNS int AS BEGIN DECLARE @Source_len int, @Target_len int, @i int, @j int, @Source_char nchar, @Dist int, @Dist_temp int, @Distv0 varbinary(8000), @Distv1 varbinary(8000) SELECT @Source_len = LEN(@Source), @Target_len = LEN(@Target), @Distv1 = 0x0000, @j = 1, @i = 1, @Dist = 0 WHILE @j <= @Target_len BEGIN SELECT @Distv1 = @Distv1 + CAST(@j AS binary(2)), @j = @j + 1 END WHILE @i <= @Source_len BEGIN SELECT @Source_char = SUBSTRING(@Source, @i, 1), @Dist = @i, @Distv0 = CAST(@i AS binary(2)), @j = 1 WHILE @j <= @Target_len BEGIN SET @Dist = @Dist + 1 SET @Dist_temp = CAST(SUBSTRING(@Distv1, @j+@j-1, 2) AS int) + CASE WHEN @Source_char = SUBSTRING(@Target, @j, 1) THEN 0 ELSE 1 END IF @Dist > @Dist_temp BEGIN SET @Dist = @Dist_temp END SET @Dist_temp = CAST(SUBSTRING(@Distv1, @j+@j+1, 2) AS int)+1 IF @Dist > @Dist_temp SET @Dist = @Dist_temp BEGIN SELECT @Distv0 = @Distv0 + CAST(@Dist AS binary(2)), @j = @j + 1 END END SELECT @Distv1 = @Distv0, @i = @i + 1 END RETURN @Dist END Anyone have any ideas? Any input is appreciated. Thanks, Matt

    Read the article

  • Prevent full table scan for query with multiple where clauses

    - by Dave Jarvis
    A while ago I posted a message about optimizing a query in MySQL. I have since ported the data and query to PostgreSQL, but now PostgreSQL has the same problem. The solution in MySQL was to force the optimizer to not optimize using STRAIGHT_JOIN. PostgreSQL offers no such option. Here is the explain: Here is the query: SELECT avg(d.amount) AS amount, y.year FROM station s, station_district sd, year_ref y, month_ref m, daily d LEFT JOIN city c ON c.id = 10663 WHERE -- Find all the stations within a specific unit radius ... -- 6371.009 * SQRT( POW(RADIANS(c.latitude_decimal - s.latitude_decimal), 2) + (COS(RADIANS(c.latitude_decimal + s.latitude_decimal) / 2) * POW(RADIANS(c.longitude_decimal - s.longitude_decimal), 2)) ) <= 50 AND -- Ignore stations outside the given elevations -- s.elevation BETWEEN 0 AND 2000 AND sd.id = s.station_district_id AND -- Gather all known years for that station ... -- y.station_district_id = sd.id AND -- The data before 1900 is shaky; insufficient after 2009. -- y.year BETWEEN 1980 AND 2000 AND -- Filtered by all known months ... -- m.year_ref_id = y.id AND m.month = 12 AND -- Whittled down by category ... -- m.category_id = '001' AND -- Into the valid daily climate data. -- m.id = d.month_ref_id AND d.daily_flag_id <> 'M' GROUP BY y.year It appears as though PostgreSQL is looking at the DAILY table first, which is simply not the right way to go about this query as there are nearly 300 million rows. How do I force PostgreSQL to start at the CITY table? Thank you!

    Read the article

  • C# File IO with Streams - Best Memory Buffer Size

    - by AJ
    Hi, I am writing a small IO library to assist with a larger (hobby) project. A part of this library performs various functions on a file, which is read / written via the FileStream object. On each StreamReader.Read(...) pass, I fire off an event which will be used in the main app to display progress information. The processing that goes on in the loop is vaired, but is not too time consuming (it could just be a simple file copy, for example, or may involve encryption...). My main question is: What is the best memory buffer size to use? Thinking about physical disk layouts, I could pick 2k, which would cover a CD sector size and is a nice multiple of a 512 byte hard disk sector. Higher up the abstraction tree, you could go for a larger buffer which could read an entire FAT cluster at a time. I realise with today's PC's, I could go for a more memory hungry option (a couple of MiB, for example), but then I increase the time between UI updates and the user perceives a less responsive app. As an aside, I'm eventually hoping to provide a similar interface to files hosted on FTP / HTTP servers (over a local network / fastish DSL). What would be the best memory buffer size for those (again, a "best-case" tradeoff between perceived responsiveness vs. performance). Thanks in advance for any ideas, Adam

    Read the article

  • MySQL queries - how expensive are they really?

    - by incrediman
    I've heard that mysql queries are very expensive, and that you should avoid at all costs making too many of them. I'm developing a site that will be used by quite a few people, and I'm wondering: How expensive are mysql queries actually? If I have 400,000 people in my database, how expensive is it to query it for one of them? How close attention do I need to pay that I don't make too many queries per client request?

    Read the article

  • Database design in blogging systems

    - by Peter
    As a learning exercise I'm trying to put myself a blogging system. The goal is to code something that will let me create multiple blogs, like blogger.com or wordpress.com, but much simplified. I would like to ask you, what do you think is best database design for this type of script. Is it better to have one big table, containing posts from all blogs of all users (like friendfeed) or would it be better to create separate table for each blog's posts? Big thanks in advance for your help, Peter.

    Read the article

  • Speed improvements for Perl's chameneos-redux in the Computer Language Benchmarks Game

    - by Robert P
    Ever looked at the Computer Language Benchmarks Game (formerly known as the Great Language Shootout)? Perl has some pretty healthy competition there at the moment. It also occurs to me that there's probably some places that Perl's scores could be improved. The biggest one is in the chameneos-redux script right now—the Perl version runs the worst out of any language: 1,626 times slower than the C baseline solution! There are some restrictions on how the programs can be made and optimized, and there is Perl's interpreted runtime penalty, but 1,626 times? There's got to be something that can get the runtime of this program way down. Taking a look at the source code and the challenge, how can the speed be improved?

    Read the article

  • Ever any performance different between Java >> and >>> right shift operators?

    - by Sean Owen
    Is there ever reason to think the (signed) and (unsigned) right bit-shift operators in Java would perform differently? I can't detect any difference on my machine. This is purely an academic question; it's never going to be the bottleneck I'm sure. I know: it's best to write what you mean foremost; use for division by 2, for example. I assume it comes down to which architectures have which operations implemented as an instruction.

    Read the article

  • How to optimize this MySQL query

    - by James Simpson
    This query was working fine when the database was small, but now that there are millions of rows in the database, I am realizing I should have looked at optimizing this earlier. It is looking at over 600,000 rows and is Using where; Using temporary; Using filesort (which leads to an execution time of 5-10 seconds). It is using an index on the field 'battle_type.' SELECT username, SUM( outcome ) AS wins, COUNT( * ) - SUM( outcome ) AS losses FROM tblBattleHistory WHERE battle_type = '0' && outcome < '2' GROUP BY username ORDER BY wins DESC , losses ASC , username ASC LIMIT 0 , 50

    Read the article

  • Fastest image iteration in Python

    - by Greg
    I am creating a simple green screen app with Python 2.7.4 but am getting quite slow results. I am currently using PIL 1.1.7 to load and iterate the images and saw huge speed-ups changing from the old getpixel() to the newer load() and pixel access object indexing. However the following loop still takes around 2.5 seconds to run for an image of around 720p resolution: def colorclose(Cb_p, Cr_p, Cb_key, Cr_key, tola, tolb): temp = math.sqrt((Cb_key-Cb_p)**2+(Cr_key-Cr_p)**2) if temp < tola: return 0.0 else: if temp < tolb: return (temp-tola)/(tolb-tola) else: return 1.0 .... for x in range(width): for y in range(height): Y, cb, cr = fg_cbcr_list[x, y] mask = colorclose(cb, cr, cb_key, cr_key, tola, tolb) mask = 1 - mask bgr, bgg, bgb = bg_list[x,y] fgr, fgg, fgb = fg_list[x,y] pixels[x,y] = ( (int)(fgr - mask*key_color[0] + mask*bgr), (int)(fgg - mask*key_color[1] + mask*bgg), (int)(fgb - mask*key_color[2] + mask*bgb)) Am I doing anything hugely inefficient here which makes it run so slow? I have seen similar, simpler examples where the loop is replaced by a boolean matrix for instance, but for this case I can't see a way to replace the loop. The pixels[x,y] assignment seems to take the most amount of time but not knowing Python very well I am unsure of a more efficient way to do this. Any help would be appreciated.

    Read the article

  • 3 dimensional bin packing algorithms

    - by BuschnicK
    I'm faced with a 3 dimensional bin packing problem and am currently conducting some preliminary research as to which algorithms/heuristics are currently yielding the best results. Since the problem is NP hard I do not expect to find the optimal solution in every case, but I was wondering: 1) what are the best exact solvers? Branch and Bound? What problem instance sizes can I expect to solve with reasonable computing resources? 2) what are the best heuristic solvers? 3) What off-the-shelf solutions exist to conduct some experiments with?

    Read the article

  • Speed up bitstring/bit operations in Python?

    - by Xavier Ho
    I wrote a prime number generator using Sieve of Eratosthenes and Python 3.1. The code runs correctly and gracefully at 0.32 seconds on ideone.com to generate prime numbers up to 1,000,000. # from bitstring import BitString def prime_numbers(limit=1000000): '''Prime number generator. Yields the series 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ... using Sieve of Eratosthenes. ''' yield 2 sub_limit = int(limit**0.5) flags = [False, False] + [True] * (limit - 2) # flags = BitString(limit) # Step through all the odd numbers for i in range(3, limit, 2): if flags[i] is False: # if flags[i] is True: continue yield i # Exclude further multiples of the current prime number if i <= sub_limit: for j in range(i*3, limit, i<<1): flags[j] = False # flags[j] = True The problem is, I run out of memory when I try to generate numbers up to 1,000,000,000. flags = [False, False] + [True] * (limit - 2) MemoryError As you can imagine, allocating 1 billion boolean values (1 byte 4 or 8 bytes (see comment) each in Python) is really not feasible, so I looked into bitstring. I figured, using 1 bit for each flag would be much more memory-efficient. However, the program's performance dropped drastically - 24 seconds runtime, for prime number up to 1,000,000. This is probably due to the internal implementation of bitstring. You can comment/uncomment the three lines to see what I changed to use BitString, as the code snippet above. My question is, is there a way to speed up my program, with or without bitstring?

    Read the article

  • C# Improvement on a Fire-and-Forget

    - by adam
    Greetings I have a program that creates multiples instances of a class, runs the same long-running Update method on all instances and waits for completion. I'm following Kev's approach from this question of adding the Update to ThreadPool.QueueUserWorkItem. In the main prog., I'm sleeping for a few minutes and checking a Boolean in the last child to see if done while(!child[child.Length-1].isFinished){ Thread.Sleep(...); } This solution is working the way I want, but is there a better way to do this? Both for the independent instances and checking if all work is done. Thanks

    Read the article

  • sql: Group by x,y,z; return grouped by x,y with lowest f(z)

    - by Sai Emrys
    This is for http://cssfingerprint.com I collect timing stats about how fast the different methods I use perform on different browsers, etc., so that I can optimize the scraping speed. Separately, I have a report about what each method returns for a handful of URLs with known-correct values, so that I can tell which methods are bogus on which browsers. (Each is different, alas.) The related tables look like this: CREATE TABLE `browser_tests` ( `id` int(11) NOT NULL AUTO_INCREMENT, `bogus` tinyint(1) DEFAULT NULL, `result` tinyint(1) DEFAULT NULL, `method` varchar(255) DEFAULT NULL, `url` varchar(255) DEFAULT NULL, `os` varchar(255) DEFAULT NULL, `browser` varchar(255) DEFAULT NULL, `version` varchar(255) DEFAULT NULL, `created_at` datetime DEFAULT NULL, `updated_at` datetime DEFAULT NULL, `user_agent` varchar(255) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=33784 DEFAULT CHARSET=latin1 CREATE TABLE `method_timings` ( `id` int(11) NOT NULL AUTO_INCREMENT, `method` varchar(255) DEFAULT NULL, `batch_size` int(11) DEFAULT NULL, `timing` int(11) DEFAULT NULL, `os` varchar(255) DEFAULT NULL, `browser` varchar(255) DEFAULT NULL, `version` varchar(255) DEFAULT NULL, `user_agent` varchar(255) DEFAULT NULL, `created_at` datetime DEFAULT NULL, `updated_at` datetime DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=28849 DEFAULT CHARSET=latin1 (user_agent is broken down pre-insert into browser, version, and os from a small list of recognized values using regex; I keep the original user-agent string just in case.) I have a query like this that tells me the average timing for every non-bogus browser / version / method tuple: select c, avg(bogus) as bog, timing, method, browser, version from browser_tests as b inner join ( select count(*) as c, round(avg(timing)) as timing, method, browser, version from method_timings group by browser, version, method having c > 10 order by browser, version, timing ) as t using (browser, version, method) group by browser, version, method having bog < 1 order by browser, version, timing; Which returns something like: c bog tim method browser version 88 0.8333 184 reuse_insert Chrome 4.0.249.89 18 0.0000 238 mass_insert_width Chrome 4.0.249.89 70 0.0400 246 mass_insert Chrome 4.0.249.89 70 0.0400 327 mass_noinsert Chrome 4.0.249.89 88 0.0556 367 reuse_reinsert Chrome 4.0.249.89 88 0.0556 383 jquery Chrome 4.0.249.89 88 0.0556 863 full_reinsert Chrome 4.0.249.89 187 0.0000 105 jquery Chrome 5.0.307.11 187 0.8806 109 reuse_insert Chrome 5.0.307.11 123 0.0000 110 mass_insert_width Chrome 5.0.307.11 176 0.0000 231 mass_noinsert Chrome 5.0.307.11 176 0.0000 237 mass_insert Chrome 5.0.307.11 187 0.0000 314 reuse_reinsert Chrome 5.0.307.11 187 0.0000 372 full_reinsert Chrome 5.0.307.11 12 0.7500 82 reuse_insert Chrome 5.0.335.0 12 0.2500 102 jquery Chrome 5.0.335.0 [...] I want to modify this query to return only the browser/version/method with the lowest timing - i.e. something like: 88 0.8333 184 reuse_insert Chrome 4.0.249.89 187 0.0000 105 jquery Chrome 5.0.307.11 12 0.7500 82 reuse_insert Chrome 5.0.335.0 [...] How can I do this, while still returning the method that goes with that lowest timing? I could filter it app-side, but I'd rather do this in mysql since it'd work better with my caching.

    Read the article

  • Optimizing a lot of Scanner.findWithinHorizon(pattern, 0) calls

    - by darvids0n
    I'm building a process which extracts data from 6 csv-style files and two poorly laid out .txt reports and builds output CSVs, and I'm fully aware that there's going to be some overhead searching through all that whitespace thousands of times, but I never anticipated converting about about 50,000 records would take 12 hours. Excerpt of my manual matching code (I know it's horrible that I use lists of tokens like that, but it was the best thing I could think of): public static String lookup(List<String> tokensBefore, List<String> tokensAfter) { String result = null; while(_match(tokensBefore)) { // block until all input is read if(id.hasNext()) { result = id.next(); // capture the next token that matches if(_matchImmediate(tokensAfter)) // try to match tokensAfter to this result return result; } else return null; // end of file; no match } return null; // no matches } private static boolean _match(List<String> tokens) { return _match(tokens, true); } private static boolean _match(List<String> tokens, boolean block) { if(tokens != null && !tokens.isEmpty()) { if(id.findWithinHorizon(tokens.get(0), 0) == null) return false; for(int i = 1; i <= tokens.size(); i++) { if (i == tokens.size()) { // matches all tokens return true; } else if(id.hasNext() && !id.next().matches(tokens.get(i))) { break; // break to blocking behaviour } } } else { return true; // empty list always matches } if(block) return _match(tokens); // loop until we find something or nothing else return false; // return after just one attempted match } private static boolean _matchImmediate(List<String> tokens) { if(tokens != null) { for(int i = 0; i <= tokens.size(); i++) { if (i == tokens.size()) { // matches all tokens return true; } else if(!id.hasNext() || !id.next().matches(tokens.get(i))) { return false; // doesn't match, or end of file } } return false; // we have some serious problems if this ever gets called } else { return true; // empty list always matches } } Basically wondering how I would work in an efficient string search (Boyer-Moore or similar). My Scanner id is scanning a java.util.String, figured buffering it to memory would reduce I/O since the search here is being performed thousands of times on a relatively small file. The performance increase compared to scanning a BufferedReader(FileReader(File)) was probably less than 1%, the process still looks to be taking a LONG time. I've also traced execution and the slowness of my overall conversion process is definitely between the first and last like of the lookup method. In fact, so much so that I ran a shortcut process to count the number of occurrences of various identifiers in the .csv-style files (I use 2 lookup methods, this is just one of them) and the process completed indexing approx 4 different identifiers for 50,000 records in less than a minute. Compared to 12 hours, that's instant. Some notes (updated): I don't necessarily need the pattern-matching behaviour, I only get the first field of a line of text so I need to match line breaks or use Scanner.nextLine(). All ID numbers I need start at position 0 of a line and run through til the first block of whitespace, after which is the name of the corresponding object. I would ideally want to return a String, not an int locating the line number or start position of the result, but if it's faster then it will still work just fine. If an int is being returned, however, then I would now have to seek to that line again just to get the ID; storing the ID of every line that is searched sounds like a way around that. Anything to help me out, even if it saves 1ms per search, will help, so all input is appreciated. Thankyou! Usage scenario 1: I have a list of objects in file A, who in the old-style system have an id number which is not in file A. It is, however, POSSIBLY in another csv-style file (file B) or possibly still in a .txt report (file C) which each also contain a bunch of other information which is not useful here, and so file B needs to be searched through for the object's full name (1 token since it would reside within the second column of any given line), and then the first column should be the ID number. If that doesn't work, we then have to split the search token by whitespace into separate tokens before doing a search of file C for those tokens as well. Generalised code: String field; for (/* each record in file A */) { /* construct the rest of this object from file A info */ // now to find the ID, if we can List<String> objectName = new ArrayList<String>(1); objectName.add(Pattern.quote(thisObject.fullName)); field = lookup(objectSearchToken, objectName); // search file B if(field == null) // not found in file B { lookupReset(false); // initialise scanner to check file C objectName.clear(); // not using the full name String[] tokens = thisObject.fullName.split(id.delimiter().pattern()); for(String s : tokens) objectName.add(Pattern.quote(s)); field = lookup(objectSearchToken, objectName); // search file C lookupReset(true); // back to file B } else { /* found it, file B specific processing here */ } if(field != null) // found it in B or C thisObject.ID = field; } The objectName tokens are all uppercase words with possible hyphens or apostrophes in them, separated by spaces. Much like a person's name. As per a comment, I will pre-compile the regex for my objectSearchToken, which is just [\r\n]+. What's ending up happening in file C is, every single line is being checked, even the 95% of lines which don't contain an ID number and object name at the start. Would it be quicker to use ^[\r\n]+.*(objectname) instead of two separate regexes? It may reduce the number of _match executions. The more general case of that would be, concatenate all tokensBefore with all tokensAfter, and put a .* in the middle. It would need to be matching backwards through the file though, otherwise it would match the correct line but with a huge .* block in the middle with lots of lines. The above situation could be resolved if I could get java.util.Scanner to return the token previous to the current one after a call to findWithinHorizon. I have another usage scenario. Will put it up asap.

    Read the article

  • Explanation of Pingdom Results

    - by Computer Guru
    Hi, I'm trying to optimize my page load times, and I'm using Pingdom to test the site response times. However, I'm not exactly sure what the various components of the "time bar" mean. Example link: http://tools.pingdom.com/fpt/?url=http://neosmart.net/forums//&id=2230361 According to them, the portion of the bar that is yellow is the time between "start" and "connect" and the portion of the bar that is green is the time between "connect" and "first byte" with the blue section being the actual transfer time (time between "first byte" and "last byte"). If I'm trying to the first two (which take very long in my case), what's the recommended course of action? Thanks.

    Read the article

  • iPhone - dequeueReusableCellWithIdentifier usage

    - by Jukurrpa
    Hi, I'm working on a iPhone app which has a pretty large UITableView with data taken from the web, so I'm trying to optimize its creation and usage. I found out that dequeueReusableCellWithIdentifier is pretty useful, but after seeing many source codes using this, I'm wondering if the usage I make of this function is the good one. Here is what people usually do: UITableViewCell* cell = [tableView dequeueReusableCellWithIdentifier:@"Cell"]; if (cell == nil) { cell = [[UITableViewCell alloc] initWithFrame:CGRectZero reuseIdentifier:@"Cell"]; // Add elements to the cell return cell; And here is the way I did it: NSString identifier = [NSString stringWithFormat:@"Cell @d", indexPath.row]: // The cell row UITableViewCell* cell = [tableView dequeueReusableCellWithIdentifier:identifier]; if (cell != nil) return cell; cell = [[UITableViewCell alloc] initWithFrame:CGRectZero reuseIdentifier:identifier]; // Add elements to the cell return cell; The difference is that people use the same identifier for every cell, so dequeuing one only avoids to alloc a new one. For me, the point of queuing was to give each cell a unique identifier, so when the app asks for a cell it already displayed, neither allocation nor element adding have to be done. In fine I don't know which is best, the "common" method ceils the table's memory usage to the exact number of cells it display, whislt the method I use seems to favor speed as it keeps all calculated cells, but can cause large memory consumption (unless there's an inner limit to the queue). Am I wrong to use it this way? Or is it just up to the developper, depending on his needs?

    Read the article

  • Jetty 6 - QueuedThreadPool versus ThreadPool

    - by Walter White
    Hi all, I am using Jetty 6 and was wondering when the QueuedThreadPool should be used over the ThreadPool? By default, Jetty 6 comes configured with the QueuedThreadPool. My server has Java 6 installed so I was thinking that I should use the ThreadPool: <New class="org.mortbay.thread.QueuedThreadPool"> <Set name="minThreads">10</Set> <Set name="maxThreads">200</Set> <Set name="lowThreads">20</Set> <Set name="SpawnOrShrinkAt">2</Set> </New> <New class="org.mortbay.thread.concurrent.ThreadPool"> <Set name="corePoolSize">50</Set> <Set name="maximumPoolSize">50</Set> </New> Thanks, Walter

    Read the article

< Previous Page | 45 46 47 48 49 50 51 52 53 54 55 56  | Next Page >