Search Results

Search found 27144 results on 1086 pages for 'tail call optimization'.

Page 107/1086 | < Previous Page | 103 104 105 106 107 108 109 110 111 112 113 114  | Next Page >

  • Simple MySQL Query taking 45 seconds (Gets a record and its "latest" child record)

    - by Brian Lacy
    I have a query which gets a customer and the latest transaction for that customer. Currently this query takes over 45 seconds for 1000 records. This is especially problematic because the script itself may need to be executed as frequently as once per minute! I believe using subqueries may be the answer, but I've had trouble constructing it to actually give me the results I need. SELECT customer.CustID, customer.leadid, customer.Email, customer.FirstName, customer.LastName, transaction.*, MAX(transaction.TransDate) AS LastTransDate FROM customer INNER JOIN transaction ON transaction.CustID = customer.CustID WHERE customer.Email = '".$email."' GROUP BY customer.CustID ORDER BY LastTransDate LIMIT 1000 I really need to get this figured out ASAP. Any help would be greatly appreciated!

    Read the article

  • SQL-query task, decision?

    - by Sirius Lampochkin
    There is a table of currencies rates in MS SQL Server 2005: ID | CURR | RATE | DATE 1   | USD   | 30      | 01.10.2010 3   | GBP   | 45      | 07.10.2010 5   | USD   | 31      | 08.10.2010 7   | GBP   | 46      | 09.10.2010 9   | USD   | 32      | 12.10.2010 11 | GBP   | 48      | 03.10.2010 Rate are updated in real time and there are more than 1 billion rows in the table. It needs to write a SQL-query, wich will provide latest rates per each currency. My decision is: SELECT c.[id],c.[curr],c.[rate],c.[date] FROM [curr_rate] c, (SELECT curr, MAX(date) AS rate_date FROM [curr_rate] GROUP BY curr) t WHERE c.date = t.rate_date AND c.curr = t.curr ORDER BY c.[curr] ASC Is it possible to write a query without sub-queries and join's with derived tables?

    Read the article

  • Cannot sort a row of size 8130, which is greater than the allowable maximum of 8094

    - by Sri Kumar
    Hello All, SELECT DISTINCT tblJobReq.JobReqId, tblJobReq.JobStatusId, tblJobClass.JobClassId, tblJobClass.Title, tblJobReq.JobClassSubTitle, tblJobAnnouncement.JobClassDesc, tblJobAnnouncement.EndDate, tblJobAnnouncement.AgencyMktgVerbage, tblJobAnnouncement.SpecInfo, tblJobAnnouncement.Benefits, tblSalary.MinRateSal, tblSalary.MaxRateSal, tblSalary.MinRateHour, tblSalary.MaxRateHour, tblJobClass.StatementEval, tblJobReq.ApprovalDate, tblJobReq.RecruiterId, tblJobReq.AgencyId FROM ((tblJobReq LEFT JOIN tblJobAnnouncement ON tblJobReq.JobReqId =tblJobAnnouncement.JobReqId) INNER JOIN tblJobClass ON tblJobReq.JobClassId = tblJobClass.JobClassId) LEFT JOIN tblSalary ON tblJobClass.SalaryCode = tblSalary.SalaryCode WHERE (tblJobReq.JobClassId in (SELECT JobClassId from tblJobClass WHERE tblJobClass.Title like '%Family Therapist%')) When i try to execute the query it results in the following error. Cannot sort a row of size 8130, which is greater than the allowable maximum of 8094 I checked and didn't find any solution. The only way is to truncate (substring())the "tblJobAnnouncement.JobClassDesc" in the query which has column size of around 8000. Do we have any work around so that i need not truncate the values. Or Can this query be optimised? Any setting in SQL Server 2000?

    Read the article

  • Representing game states in Tic Tac Toe

    - by dacman
    The goal of the assignment that I'm currently working on for my Data Structures class is to create a of Quantum Tic Tac Toe with an AI that plays to win. Currently, I'm having a bit of trouble finding the most efficient way to represent states. Overview of current Structure: AbstractGame Has and manages AbstractPlayers (game.nextPlayer() returns next player by int ID) Has and intializes AbstractBoard at the beginning of the game Has a GameTree (Complete if called in initialization, incomplete otherwise) AbstractBoard Has a State, a Dimension, and a Parent Game Is a mediator between Player and State, (Translates States from collections of rows to a Point representation Is a StateConsumer AbstractPlayer Is a State Producer Has a ConcreteEvaluationStrategy to evaluate the current board StateTransveralPool Precomputes possible transversals of "3-states". Stores them in a HashMap, where the Set contains nextStates for a given "3-state" State Contains 3 Sets -- a Set of X-Moves, O-Moves, and the Board Each Integer in the set is a Row. These Integer values can be used to get the next row-state from the StateTransversalPool SO, the principle is Each row can be represented by the binary numbers 000-111, where 0 implies an open space and 1 implies a closed space. So, for an incomplete TTT board: From the Set<Integer> board perspective: X_X R1 might be: 101 OO_ R2 might be: 110 X_X R3 might be: 101, where 1 is an open space, and 0 is a closed space From the Set<Integer> xMoves perspective: X_X R1 might be: 101 OO_ R2 might be: 000 X_X R3 might be: 101, where 1 is an X and 0 is not From the Set<Integer> oMoves perspective: X_X R1 might be: 000 OO_ R2 might be: 110 X_X R3 might be: 000, where 1 is an O and 0 is not Then we see that x{R1,R2,R3} & o{R1,R2,R3} = board{R1,R2,R3} The problem is quickly generating next states for the GameTree. If I have player Max (x) with board{R1,R2,R3}, then getting the next row-states for R1, R2, and R3 is simple.. Set<Integer> R1nextStates = StateTransversalPool.get(R1); The problem is that I have to combine each one of those states with R1 and R2. Is there a better data structure besides Set that I could use? Is there a more efficient approach in general? I've also found Point<-State mediation cumbersome. Is there another approach that I could try there? Thanks! Here is the code for my ConcretePlayer class. It might help explain how players produce new states via moves, using the StateProducer (which might need to become StateFactory or StateBuilder). public class ConcretePlayerGeneric extends AbstractPlayer { @Override public BinaryState makeMove() { // Given a move and the current state, produce a new state Point playerMove = super.strategy.evaluate(this); BinaryState currentState = super.getInGame().getBoard().getState(); return StateProducer.getState(this, playerMove, currentState); } } EDIT: I'm starting with normal TTT and moving to Quantum TTT. Given the framework, it should be as simple as creating several new Concrete classes and tweaking some things.

    Read the article

  • Datatable add new column and values speed issue

    - by Cine
    I am having some speed issue with my datatables. In this particular case I am using it as holder of data, it is never used in GUI or any other scenario that actually uses any of the fancy features. In my speed trace, this particular constructor was showing up as a heavy user of time when my database is ~40k rows. The main user was set_Item of DataTable. protected myclass(DataTable dataTable, DataColumn idColumn) { this.dataTable = dataTable; IdColumn = idColumn ?? this.dataTable.Columns.Add(string.Format("SYS_{0}_SYS", Guid.NewGuid()), Type.GetType("System.Int32")); JobIdColumn = this.dataTable.Columns.Add(string.Format("SYS_{0}_SYS", Guid.NewGuid()), Type.GetType("System.Int32")); IsNewColumn = this.dataTable.Columns.Add(string.Format("SYS_{0}_SYS", Guid.NewGuid()), Type.GetType("System.Int32")); int id = 1; foreach (DataRow r in this.dataTable.Rows) { r[JobIdColumn] = id++; r[IsNewColumn] = (r[IdColumn] == null || r[IdColumn].ToString() == string.Empty) ? 1 : 0; } Digging deeper into the trace, it turns out that set_Item calls EndEdit, which brings my thoughts to the transaction support of the DataTable, for which I have no usage for in my scenario. So my solution to this was to open editing on all of the rows and never close them again. _dt.BeginLoadData(); foreach (DataRow row in _dt.Rows) row.BeginEdit(); Is there a better solution? This feels too much like a big giant hack that will eventually come and bite me. You might suggest that I dont use DataTable at all, but I have already considered that and rejected it due to the amount of effort that would be required to reimplement with a custom class. The main reason it is a datatable is that it is ancient code (.net 1.1 time) and I dont want to spend that much time changing it, and it is also because the original table comes out of a third party component.

    Read the article

  • MySQL MyISAM table performance... painfully, painfully slow

    - by Salman A
    I've got a table structure that can be summarized as follows: pagegroup * pagegroupid * name has 3600 rows page * pageid * pagegroupid * data references pagegroup; has 10000 rows; can have anything between 1-700 rows per pagegroup; the data column is of type mediumtext and the column contains 100k - 200kbytes data per row userdata * userdataid * pageid * column1 * column2 * column9 references page; has about 300,000 rows; can have about 1-50 rows per page The above structure is pretty straight forwad, the problem is that that a join from userdata to page group is terribly, terribly slow even though I have indexed all columns that should be indexed. The time needed to run a query for such a join (userdata inner_join page inner_join pagegroup) exceeds 3 minutes. This is terribly slow considering the fact that I am not selecting the data column at all. Example of the query that takes too long: SELECT userdata.column1, pagegroup.name FROM userdata INNER JOIN page USING( pageid ) INNER JOIN pagegroup USING( pagegroupid ) Please help by explaining why does it take so long and what can i do to make it faster. Edit #1 Explain returns following gibberish: id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE userdata ALL pageid 372420 1 SIMPLE page eq_ref PRIMARY,pagegroupid PRIMARY 4 topsecret.userdata.pageid 1 1 SIMPLE pagegroup eq_ref PRIMARY PRIMARY 4 topsecret.page.pagegroupid 1 Edit #2 SELECT u.field2, p.pageid FROM userdata u INNER JOIN page p ON u.pageid = p.pageid; /* 0.07 sec execution, 6.05 sec fecth */ id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE u ALL pageid 372420 1 SIMPLE p eq_ref PRIMARY PRIMARY 4 topsecret.u.pageid 1 Using index SELECT p.pageid, g.pagegroupid FROM page p INNER JOIN pagegroup g ON p.pagegroupid = g.pagegroupid; /* 9.37 sec execution, 60.0 sec fetch */ id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE g index PRIMARY PRIMARY 4 3646 Using index 1 SIMPLE p ref pagegroupid pagegroupid 5 topsecret.g.pagegroupid 3 Using where Moral of the story Keep medium/long text columns in a separate table if you run into performance problems such as this one.

    Read the article

  • Multiple ParticleSystems in cocos2d

    - by Mattias Akerman
    I wonder about what road I should go with ParticleSystem. In this particular case I want to create 1-20 small explosions at the same time but with different positions. Right now I'm creating a new ParticleSystem for each explosion and then release it, but of course this is very punishing to the performance. My question is: Is there a way to create one ParticleSystem with multiple emitting sources. If not should I create an array of ParticleSystem in init and then use a free one when an explosion is needed? Or is there another approach I haven't thought of?

    Read the article

  • Permutations of Varying Size

    - by waiwai933
    I'm trying to write a function in PHP that gets all permutations of all possible sizes. I think an example would be the best way to start off: $my_array = array(1,1,2,3); Possible permutations of varying size: 1 1 // * See Note 2 3 1,1 1,2 1,3 // And so forth, for all the sets of size 2 1,1,2 1,1,3 1,2,1 // And so forth, for all the sets of size 3 1,1,2,3 1,1,3,2 // And so forth, for all the sets of size 4 Note: I don't care if there's a duplicate or not. For the purposes of this example, all future duplicates have been omitted. What I have so far in PHP: function getPermutations($my_array){ $permutation_length = 1; $keep_going = true; while($keep_going){ while($there_are_still_permutations_with_this_length){ // Generate the next permutation and return it into an array // Of course, the actual important part of the code is what I'm having trouble with. } $permutation_length++; if($permutation_length>count($my_array)){ $keep_going = false; } else{ $keep_going = true; } } return $return_array; } The closest thing I can think of is shuffling the array, picking the first n elements, seeing if it's already in the results array, and if it's not, add it in, and then stop when there are mathematically no more possible permutations for that length. But it's ugly and resource-inefficient. Any pseudocode algorithms would be greatly appreciated. Also, for super-duper (worthless) bonus points, is there a way to get just 1 permutation with the function but make it so that it doesn't have to recalculate all previous permutations to get the next? For example, I pass it a parameter 3, which means it's already done 3 permutations, and it just generates number 4 without redoing the previous 3? (Passing it the parameter is not necessary, it could keep track in a global or static). The reason I ask this is because as the array grows, so does the number of possible combinations. Suffice it to say that one small data set with only a dozen elements grows quickly into the trillions of possible combinations and I don't want to task PHP with holding trillions of permutations in its memory at once.

    Read the article

  • what webserver / mod / technique should I use to serve everything from memory?

    - by reinier
    I've lots of lookuptables from which I'll generate my webresponse. I think IIS with Asp.net enables me to keep static lookuptables in memory which I can use to serve up my responses very fast. Are there however also non .net solutions which can do the same? I've looked at fastcgi, but I think this starts X processes, of which anyone can handle Y requests. But the processes are by definition shielded from eachother. I could configure fastcgi to use just 1 process, but does this have scalability implications? anything using PHP or any other interpreted language won't fly because it is also cgi or fastcgi bound right? I understand memcache could be an option, though this would require another (local) socket connection which I'd rather avoid since everything in memory would be much faster. The solution can work under WIndows or Unix... it doesn't matter too much. The only thing which matters is that there will be a lot of requests (100/sec now and growing to 500/sec in a year), and I want to reduce the amount of webservers needed to process it. The current solution is done using PHP and memcache (and the occasional hit to the SQL server backend). Although it is fast (for php anyway), Apache has real problems when the 50/sec is passed. I've put a bounty on this question since I've not seen enough responses to make a wise choice. At the moment I'm considering either Asp.net or fastcgi with C(++).

    Read the article

  • Is it possible to implement bitwise operators using integer arithmetic?

    - by Statement
    Hello World! I am facing a rather peculiar problem. I am working on a compiler for an architecture that doesn't support bitwise operations. However, it handles signed 16 bit integer arithmetics and I was wondering if it would be possible to implement bitwise operations using only: Addition (c = a + b) Subtraction (c = a - b) Division (c = a / b) Multiplication (c = a * b) Modulus (c = a % b) Minimum (c = min(a, b)) Maximum (c = max(a, b)) Comparisons (c = (a < b), c = (a == b), c = (a <= b), et.c.) Jumps (goto, for, et.c.) The bitwise operations I want to be able to support are: Or (c = a | b) And (c = a & b) Xor (c = a ^ b) Left Shift (c = a << b) Right Shift (c = a b) (All integers are signed so this is a problem) Signed Shift (c = a b) One's Complement (a = ~b) (Already found a solution, see below) Normally the problem is the other way around; how to achieve arithmetic optimizations using bitwise hacks. However not in this case. Writable memory is very scarce on this architecture, hence the need for bitwise operations. The bitwise functions themselves should not use a lot of temporary variables. However, constant read-only data & instruction memory is abundant. A side note here also is that jumps and branches are not expensive and all data is readily cached. Jumps cost half the cycles as arithmetic (including load/store) instructions do. On other words, all of the above supported functions cost twice the cycles of a single jump. Some thoughts that might help: I figured out that you can do one's complement (negate bits) with the following code: // Bitwise one's complement b = ~a; // Arithmetic one's complement b = -1 - a; I also remember the old shift hack when dividing with a power of two so the bitwise shift can be expressed as: // Bitwise left shift b = a << 4; // Arithmetic left shift b = a * 16; // 2^4 = 16 // Signed right shift b = a >>> 4; // Arithmetic right shift b = a / 16; For the rest of the bitwise operations I am slightly clueless. I wish the architects of this architecture would have supplied bit-operations. I would also like to know if there is a fast/easy way of computing the power of two (for shift operations) without using a memory data table. A naive solution would be to jump into a field of multiplications: b = 1; switch (a) { case 15: b = b * 2; case 14: b = b * 2; // ... exploting fallthrough (instruction memory is magnitudes larger) case 2: b = b * 2; case 1: b = b * 2; } Or a Set & Jump approach: switch (a) { case 15: b = 32768; break; case 14: b = 16384; break; // ... exploiting the fact that a jump is faster than one additional mul // at the cost of doubling the instruction memory footprint. case 2: b = 4; break; case 1: b = 2; break; }

    Read the article

  • Does query plan optimizer works well with joined/filtered table-valued functions?

    - by smoothdeveloper
    In SQLSERVER 2005, I'm using table-valued function as a convenient way to perform arbitrary aggregation on subset data from large table (passing date range or such parameters). I'm using theses inside larger queries as joined computations and I'm wondering if the query plan optimizer work well with them in every condition or if I'm better to unnest such computation in my larger queries. Does query plan optimizer unnest table-valued functions if it make sense? If it doesn't, what do you recommend to avoid code duplication that would occur by manually unnesting them? If it does, how do you identify that from the execution plan? code sample: create table dbo.customers ( [key] uniqueidentifier , constraint pk_dbo_customers primary key ([key]) ) go /* assume large amount of data */ create table dbo.point_of_sales ( [key] uniqueidentifier , customer_key uniqueidentifier , constraint pk_dbo_point_of_sales primary key ([key]) ) go create table dbo.product_ranges ( [key] uniqueidentifier , constraint pk_dbo_product_ranges primary key ([key]) ) go create table dbo.products ( [key] uniqueidentifier , product_range_key uniqueidentifier , release_date datetime , constraint pk_dbo_products primary key ([key]) , constraint fk_dbo_products_product_range_key foreign key (product_range_key) references dbo.product_ranges ([key]) ) go . /* assume large amount of data */ create table dbo.sales_history ( [key] uniqueidentifier , product_key uniqueidentifier , point_of_sale_key uniqueidentifier , accounting_date datetime , amount money , quantity int , constraint pk_dbo_sales_history primary key ([key]) , constraint fk_dbo_sales_history_product_key foreign key (product_key) references dbo.products ([key]) , constraint fk_dbo_sales_history_point_of_sale_key foreign key (point_of_sale_key) references dbo.point_of_sales ([key]) ) go create function dbo.f_sales_history_..snip.._date_range ( @accountingdatelowerbound datetime, @accountingdateupperbound datetime ) returns table as return ( select pos.customer_key , sh.product_key , sum(sh.amount) amount , sum(sh.quantity) quantity from dbo.point_of_sales pos inner join dbo.sales_history sh on sh.point_of_sale_key = pos.[key] where sh.accounting_date between @accountingdatelowerbound and @accountingdateupperbound group by pos.customer_key , sh.product_key ) go -- TODO: insert some data -- this is a table containing a selection of product ranges declare @selectedproductranges table([key] uniqueidentifier) -- this is a table containing a selection of customers declare @selectedcustomers table([key] uniqueidentifier) declare @low datetime , @up datetime -- TODO: set top query parameters . select saleshistory.customer_key , saleshistory.product_key , saleshistory.amount , saleshistory.quantity from dbo.products p inner join @selectedproductranges productrangeselection on p.product_range_key = productrangeselection.[key] inner join @selectedcustomers customerselection on 1 = 1 inner join dbo.f_sales_history_..snip.._date_range(@low, @up) saleshistory on saleshistory.product_key = p.[key] and saleshistory.customer_key = customerselection.[key] I hope the sample makes sense. Much thanks for your help!

    Read the article

  • How to improve my LDAP schema?

    - by asmaier
    Hello, I have a OpenLDAP Database and it holds some project objects that look like dn: cn=Proj1,ou=Project,ou=ua,dc=org cn: Proj1 objectClass: top objectClass: posixGroup member: 001ag member: 002ag System: ABEL System: PCx Budget: ABEL:1000000:0.3 Budget: PCx:300000:0.3 One can see that the Budget attribute is a ":"-separated string, where the first part holds the name of the system the budget is for, the second part holds some budget (which may change every month) and the last entry is a conversion factor for the budget of that system. Seeing this, I thought this is bad database design, since attribute values should always be atomic. But how can I improve that in LDAP, so that I can do a direct ldapsearch or a direct ldapmodify of the budget of System "ABEL" instead of writing a script, that will have to parse and split the ":"-separated string?

    Read the article

  • Why does adding Crossover to my Genetic Algorithm gives me worse results?

    - by MahlerFive
    I have implemented a Genetic Algorithm to solve the Traveling Salesman Problem (TSP). When I use only mutation, I find better solutions than when I add in crossover. I know that normal crossover methods do not work for TSP, so I implemented both the Ordered Crossover and the PMX Crossover methods, and both suffer from bad results. Here are the other parameters I'm using: Mutation: Single Swap Mutation or Inverted Subsequence Mutation (as described by Tiendil here) with mutation rates tested between 1% and 25%. Selection: Roulette Wheel Selection Fitness function: 1 / distance of tour Population size: Tested 100, 200, 500, I also run the GA 5 times so that I have a variety of starting populations. Stop Condition: 2500 generations With the same dataset of 26 points, I usually get results of about 500-600 distance using purely mutation with high mutation rates. When adding crossover my results are usually in the 800 distance range. The other confusing thing is that I have also implemented a very simple Hill-Climbing algorithm to solve the problem and when I run that 1000 times (faster than running the GA 5 times) I get results around 410-450 distance, and I would expect to get better results using a GA. Any ideas as to why my GA performing worse when I add crossover? And why is it performing much worse than a simple Hill-Climb algorithm which should get stuck on local maxima as it has no way of exploring once it finds a local max?

    Read the article

  • Optimize GROUP BY&ORDER BY query

    - by Jan Hancic
    I have a web page where users upload&watch videos. Last week I asked what is the best way to track video views so that I could display the most viewed videos this week (videos from all dates). Now I need some help optimizing a query with which I get the videos from the database. The relevant tables are this: video (~239371 rows) VID(int), UID(int), title(varchar), status(enum), type(varchar), is_duplicate(enum), is_adult(enum), channel_id(tinyint) signup (~115440 rows) UID(int), username(varchar) videos_views (~359202 rows after 6 days of collecting data, so this table will grow rapidly) videos_id(int), views_date(date), num_of_views(int) The table video holds the videos, signup hodls users and videos_views holds data about video views (each video can have one row per day in that table). I have this query that does the trick, but takes ~10s to execute, and I imagine this will only get worse over time as the videos_views table grows in size. SELECT v.VID, v.title, v.vkey, v.duration, v.addtime, v.UID, v.viewnumber, v.com_num, v.rate, v.THB, s.username, SUM(vvt.num_of_views) AS tmp_num FROM video v LEFT JOIN videos_views vvt ON v.VID = vvt.videos_id LEFT JOIN signup s on v.UID = s.UID WHERE v.status = 'Converted' AND v.type = 'public' AND v.is_duplicate = '0' AND v.is_adult = '0' AND v.channel_id <> 10 AND vvt.views_date >= '2001-05-11' GROUP BY vvt.videos_id ORDER BY tmp_num DESC LIMIT 8 And here is a screenshot of the EXPLAIN result: So, how can I optimize this?

    Read the article

  • Date arithmetic using integer values

    - by Dave Jarvis
    Problem String concatenation is slowing down a query: date(extract(YEAR FROM m.taken)||'-1-1') d1, date(extract(YEAR FROM m.taken)||'-1-31') d2 This is realized in code as part of a string, which follows (where the p_ variables are integers): date(extract(YEAR FROM m.taken)||''-'||p_month1||'-'||p_day1||''') d1, date(extract(YEAR FROM m.taken)||''-'||p_month2||'-'||p_day2||''') d2 This part of the query runs in 3.2 seconds with the dates, and 1.5 seconds without, leading me to believe there is ample room for improvement. Question What is a better way to create the date (presumably without concatenation)? Many thanks!

    Read the article

  • optimal memory layout for read-only/write memory segments.

    - by aaa
    hello. Suppose I have two memory segments (equal size each, approximately 1kb in size) , one is read-only (after initialization), and other is read/write. what is the best layout in memory for such segments in terms of memory performance? one allocation, contiguous segments or two allocations (in general not contiguous). my primary architecture is linux Intel 64-bit. my feeling is former (cache friendlier) case is better. is there circumstances, where second layout is preferred? Thanks

    Read the article

  • Any ideas on How to search a 2D array quickly?

    - by Tattat
    I jave a 2D array like this, just like a matrix: {{1, 2, 4, 5, 3, 6}, {8, 3, 4, 4, 5, 2}, {8, 3, 4, 2, 6, 2}, //code skips... ... } I want to get all the "4" position, instead of searching the array one by way, and return the position, how can I search it faster / more efficient? thz in advance.

    Read the article

  • Python performance improvement request for winkler

    - by Martlark
    I'm a python n00b and I'd like some suggestions on how to improve the algorithm to improve the performance of this method to compute the Jaro-Winkler distance of two names. def winklerCompareP(str1, str2): """Return approximate string comparator measure (between 0.0 and 1.0) USAGE: score = winkler(str1, str2) ARGUMENTS: str1 The first string str2 The second string DESCRIPTION: As described in 'An Application of the Fellegi-Sunter Model of Record Linkage to the 1990 U.S. Decennial Census' by William E. Winkler and Yves Thibaudeau. Based on the 'jaro' string comparator, but modifies it according to whether the first few characters are the same or not. """ # Quick check if the strings are the same - - - - - - - - - - - - - - - - - - # jaro_winkler_marker_char = chr(1) if (str1 == str2): return 1.0 len1 = len(str1) len2 = len(str2) halflen = max(len1,len2) / 2 - 1 ass1 = '' # Characters assigned in str1 ass2 = '' # Characters assigned in str2 #ass1 = '' #ass2 = '' workstr1 = str1 workstr2 = str2 common1 = 0 # Number of common characters common2 = 0 #print "'len1', str1[i], start, end, index, ass1, workstr2, common1" # Analyse the first string - - - - - - - - - - - - - - - - - - - - - - - - - # for i in range(len1): start = max(0,i-halflen) end = min(i+halflen+1,len2) index = workstr2.find(str1[i],start,end) #print 'len1', str1[i], start, end, index, ass1, workstr2, common1 if (index > -1): # Found common character common1 += 1 #ass1 += str1[i] ass1 = ass1 + str1[i] workstr2 = workstr2[:index]+jaro_winkler_marker_char+workstr2[index+1:] #print "str1 analyse result", ass1, common1 #print "str1 analyse result", ass1, common1 # Analyse the second string - - - - - - - - - - - - - - - - - - - - - - - - - # for i in range(len2): start = max(0,i-halflen) end = min(i+halflen+1,len1) index = workstr1.find(str2[i],start,end) #print 'len2', str2[i], start, end, index, ass1, workstr1, common2 if (index > -1): # Found common character common2 += 1 #ass2 += str2[i] ass2 = ass2 + str2[i] workstr1 = workstr1[:index]+jaro_winkler_marker_char+workstr1[index+1:] if (common1 != common2): print('Winkler: Wrong common values for strings "%s" and "%s"' % \ (str1, str2) + ', common1: %i, common2: %i' % (common1, common2) + \ ', common should be the same.') common1 = float(common1+common2) / 2.0 ##### This is just a fix ##### if (common1 == 0): return 0.0 # Compute number of transpositions - - - - - - - - - - - - - - - - - - - - - # transposition = 0 for i in range(len(ass1)): if (ass1[i] != ass2[i]): transposition += 1 transposition = transposition / 2.0 # Now compute how many characters are common at beginning - - - - - - - - - - # minlen = min(len1,len2) for same in range(minlen+1): if (str1[:same] != str2[:same]): break same -= 1 if (same > 4): same = 4 common1 = float(common1) w = 1./3.*(common1 / float(len1) + common1 / float(len2) + (common1-transposition) / common1) wn = w + same*0.1 * (1.0 - w) return wn

    Read the article

  • Is a red-black tree my ideal data structure?

    - by Hugo van der Sanden
    I have a collection of items (big rationals) that I'll be processing. In each case, processing will consist of removing the smallest item in the collection, doing some work, and then adding 0-2 new items (which will always be larger than the removed item). The collection will be initialised with one item, and work will continue until it is empty. I'm not sure what size the collection is likely to reach, but I'd expect in the range 1M-100M items. I will not need to locate any item other than the smallest. I'm currently planning to use a red-black tree, possibly tweaked to keep a pointer to the smallest item. However I've never used one before, and I'm unsure whether my pattern of use fits its characteristics well. 1) Is there a danger the pattern of deletion from the left + random insertion will affect performance, eg by requiring a significantly higher number of rotations than random deletion would? Or will delete and insert operations still be O(log n) with this pattern of use? 2) Would some other data structure give me better performance, either because of the deletion pattern or taking advantage of the fact I only ever need to find the smallest item? Update: glad I asked, the binary heap is clearly a better solution for this case, and as promised turned out to be very easy to implement. Hugo

    Read the article

  • How to optimize my PostgreSQL DB for prefix search?

    - by asmaier
    I have a table called "nodes" with roughly 1.7 million rows in my PostgreSQL db =#\d nodes Table "public.nodes" Column | Type | Modifiers --------+------------------------+----------- id | integer | not null title | character varying(256) | score | double precision | Indexes: "nodes_pkey" PRIMARY KEY, btree (id) I want to use information from that table for autocompletion of a search field, showing the user a list of the ten titles having the highest score fitting to his input. So I used this query (here searching for all titles starting with "s") =# explain analyze select title,score from nodes where title ilike 's%' order by score desc; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- Sort (cost=64177.92..64581.38 rows=161385 width=25) (actual time=4930.334..5047.321 rows=161264 loops=1) Sort Key: score Sort Method: external merge Disk: 5712kB -> Seq Scan on nodes (cost=0.00..46630.50 rows=161385 width=25) (actual time=0.611..4464.413 rows=161264 loops=1) Filter: ((title)::text ~~* 's%'::text) Total runtime: 5260.791 ms (6 rows) This was much to slow for using it with autocomplete. With some information from Using PostgreSQL in Web 2.0 Applications I was able to improve that with a special index =# create index title_idx on nodes using btree(lower(title) text_pattern_ops); =# explain analyze select title,score from nodes where lower(title) like lower('s%') order by score desc limit 10; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=18122.41..18122.43 rows=10 width=25) (actual time=1324.703..1324.708 rows=10 loops=1) -> Sort (cost=18122.41..18144.60 rows=8876 width=25) (actual time=1324.700..1324.702 rows=10 loops=1) Sort Key: score Sort Method: top-N heapsort Memory: 17kB -> Bitmap Heap Scan on nodes (cost=243.53..17930.60 rows=8876 width=25) (actual time=96.124..1227.203 rows=161264 loops=1) Filter: (lower((title)::text) ~~ 's%'::text) -> Bitmap Index Scan on title_idx (cost=0.00..241.31 rows=8876 width=0) (actual time=90.059..90.059 rows=161264 loops=1) Index Cond: ((lower((title)::text) ~>=~ 's'::text) AND (lower((title)::text) ~<~ 't'::text)) Total runtime: 1325.085 ms (9 rows) So this gave me a speedup of factor 4. But can this be further improved? What if I want to use '%s%' instead of 's%'? Do I have any chance of getting a decent performance with PostgreSQL in that case, too? Or should I better try a different solution (Lucene?, Sphinx?) for implementing my autocomplete feature?

    Read the article

  • High performance text file parsing in .net

    - by diamandiev
    Here is the situation: I am making a small prog to parse server log files. I tested it with a log file with several thousand requests (between 10000 - 20000 don't know exactly) What i have to do is to load the log text files into memory so that i can query them. This is taking the most resources. The methods that take the most cpu time are those (worst culprits first): string.split - splits the line values into a array of values string.contains - checking if the user agent contains a specific agent string. (determine browser ID) string.tolower - various purposes streamreader.readline - to read the log file line by line. string.startswith - determine if line is a column definition line or a line with values there were some others that i was able to replace. For example the dictionary getter was taking lots of resources too. Which i had not expected since its a dictionary and should have its keys indexed. I replaced it with a multidimensional array and saved some cpu time. Now i am running on a fast dual core and the total time it takes to load the file i mentioned is about 1 sec. Now this is really bad. Imagine a site that has tens of thousands of visits a day. It's going to take minutes to load the log file. So what are my alternatives? If any, cause i think this is just a .net limitation and i can't do much about it.

    Read the article

  • Custom View - Avoid redrawing when non-interactive

    - by MasterGaurav
    I have a complex custom view - photo collage. What is observed is whenever any UI interaction happens, the view is redrawn. How can I avoid complete redrawing (for example, use a cached UI) of the view specially when I click the "back" button to go back to previous activity because that also causes redrawing of the view. While exploring the API and web, I found a method - getDrawingCache() - but don't know how to use it effectively. How do I use it effectively? I've had other issues with Custom Views that I outline here.

    Read the article

  • Lazy loading the addthis script? (or lazy loading external js content dependent on already fired eve

    - by Keith Bentrup
    I want to have the addthis widget available for my users, but I want to lazy load it so that my page loads as quickly as possible. However, after trying it via a script tag and then via my lazy loading method, it appears to only work via the script tag. In the obfuscated code, I see something that looks like it's dependent on the DOMContentLoaded event (at least for firefox). Since the DOMContentLoaded event has already fired, the widget doesn't render properly. What to do? I could just use a script tag (slower)... or could I fire (in a cross browser way) the DOMContentLoaded (or equivalent) event? I have a feeling this may not be possible b/c I believe that (like jQuery) there are multiple tests of the content ready event, and so multiple simulated events would have to occur. Nonetheless, this is an interesting problem b/c I have seen a couple widgets now assume that you are including their stuff via static script tags. It would be nice if they wrote code that was more useful to developers concerned about speed, but until then, is there a work around?? And/or are any of my assumptions wrong? Edit: Because the 1st answer to the question seemed to miss the point of my problem, I wanted to clarify the situation. This is about a specific problem. I'm not looking for yet another lazy load script or check if some dependencies are loaded script. Specifically this problem deals with external widgets that you do not have control over and may or may not be obfuscated delaying the load of the external widgets until they are needed or at least, til substantially after everything else has been loaded including other deferred elements b/c of the how the widget was written, precludes existing, typical lazy loading paradigms While it's esoteric, I have seen it happen with a couple widgets - where the widget developers assume that you're just willing to throw in another script tag at the bottom of the page. I'm looking to save those 500-1000 ms** though as numerous studies by yahoo, google, and amazon show it to be important to your user's experience. **My testing with hammerhead and personal experience indicates that this will be my savings in this case.

    Read the article

  • Is there a way to tell JVM to optimize my code before processing?

    - by Rogach
    I have a method, which takes much time to execute first time. But after several invocations, it takes about 30 times less time. So, to make my application respond to user interaction faster, I "warm-up" this method (5 times) with some sample data on initialization of application. But this increases app start-up time. I read, that JVM's can optimize and compile my java code to native, thus speeding things up. I wanted to know - maybe there is some way to explicitly tell JVM that I want this method to be compiled on startup of application?

    Read the article

< Previous Page | 103 104 105 106 107 108 109 110 111 112 113 114  | Next Page >