Search Results

Search found 3828 results on 154 pages for 'mathematical optimization'.

Page 56/154 | < Previous Page | 52 53 54 55 56 57 58 59 60 61 62 63  | Next Page >

  • Any ideas on How to search a 2D array quickly?

    - by Tattat
    I jave a 2D array like this, just like a matrix: {{1, 2, 4, 5, 3, 6}, {8, 3, 4, 4, 5, 2}, {8, 3, 4, 2, 6, 2}, //code skips... ... } I want to get all the "4" position, instead of searching the array one by way, and return the position, how can I search it faster / more efficient? thz in advance.

    Read the article

  • High performance text file parsing in .net

    - by diamandiev
    Here is the situation: I am making a small prog to parse server log files. I tested it with a log file with several thousand requests (between 10000 - 20000 don't know exactly) What i have to do is to load the log text files into memory so that i can query them. This is taking the most resources. The methods that take the most cpu time are those (worst culprits first): string.split - splits the line values into a array of values string.contains - checking if the user agent contains a specific agent string. (determine browser ID) string.tolower - various purposes streamreader.readline - to read the log file line by line. string.startswith - determine if line is a column definition line or a line with values there were some others that i was able to replace. For example the dictionary getter was taking lots of resources too. Which i had not expected since its a dictionary and should have its keys indexed. I replaced it with a multidimensional array and saved some cpu time. Now i am running on a fast dual core and the total time it takes to load the file i mentioned is about 1 sec. Now this is really bad. Imagine a site that has tens of thousands of visits a day. It's going to take minutes to load the log file. So what are my alternatives? If any, cause i think this is just a .net limitation and i can't do much about it.

    Read the article

  • Is a red-black tree my ideal data structure?

    - by Hugo van der Sanden
    I have a collection of items (big rationals) that I'll be processing. In each case, processing will consist of removing the smallest item in the collection, doing some work, and then adding 0-2 new items (which will always be larger than the removed item). The collection will be initialised with one item, and work will continue until it is empty. I'm not sure what size the collection is likely to reach, but I'd expect in the range 1M-100M items. I will not need to locate any item other than the smallest. I'm currently planning to use a red-black tree, possibly tweaked to keep a pointer to the smallest item. However I've never used one before, and I'm unsure whether my pattern of use fits its characteristics well. 1) Is there a danger the pattern of deletion from the left + random insertion will affect performance, eg by requiring a significantly higher number of rotations than random deletion would? Or will delete and insert operations still be O(log n) with this pattern of use? 2) Would some other data structure give me better performance, either because of the deletion pattern or taking advantage of the fact I only ever need to find the smallest item? Update: glad I asked, the binary heap is clearly a better solution for this case, and as promised turned out to be very easy to implement. Hugo

    Read the article

  • Optimizing Haskell code

    - by Masse
    I'm trying to learn Haskell and after an article in reddit about Markov text chains, I decided to implement Markov text generation first in Python and now in Haskell. However I noticed that my python implementation is way faster than the Haskell version, even Haskell is compiled to native code. I am wondering what I should do to make the Haskell code run faster and for now I believe it's so much slower because of using Data.Map instead of hashmaps, but I'm not sure I'll post the Python code and Haskell as well. With the same data, Python takes around 3 seconds and Haskell is closer to 16 seconds. It comes without saying that I'll take any constructive criticism :). import random import re import cPickle class Markov: def __init__(self, filenames): self.filenames = filenames self.cache = self.train(self.readfiles()) picklefd = open("dump", "w") cPickle.dump(self.cache, picklefd) picklefd.close() def train(self, text): splitted = re.findall(r"(\w+|[.!?',])", text) print "Total of %d splitted words" % (len(splitted)) cache = {} for i in xrange(len(splitted)-2): pair = (splitted[i], splitted[i+1]) followup = splitted[i+2] if pair in cache: if followup not in cache[pair]: cache[pair][followup] = 1 else: cache[pair][followup] += 1 else: cache[pair] = {followup: 1} return cache def readfiles(self): data = "" for filename in self.filenames: fd = open(filename) data += fd.read() fd.close() return data def concat(self, words): sentence = "" for word in words: if word in "'\",?!:;.": sentence = sentence[0:-1] + word + " " else: sentence += word + " " return sentence def pickword(self, words): temp = [(k, words[k]) for k in words] results = [] for (word, n) in temp: results.append(word) if n > 1: for i in xrange(n-1): results.append(word) return random.choice(results) def gentext(self, words): allwords = [k for k in self.cache] (first, second) = random.choice(filter(lambda (a,b): a.istitle(), [k for k in self.cache])) sentence = [first, second] while len(sentence) < words or sentence[-1] is not ".": current = (sentence[-2], sentence[-1]) if current in self.cache: followup = self.pickword(self.cache[current]) sentence.append(followup) else: print "Wasn't able to. Breaking" break print self.concat(sentence) Markov(["76.txt"]) -- module Markov ( train , fox ) where import Debug.Trace import qualified Data.Map as M import qualified System.Random as R import qualified Data.ByteString.Char8 as B type Database = M.Map (B.ByteString, B.ByteString) (M.Map B.ByteString Int) train :: [B.ByteString] -> Database train (x:y:[]) = M.empty train (x:y:z:xs) = let l = train (y:z:xs) in M.insertWith' (\new old -> M.insertWith' (+) z 1 old) (x, y) (M.singleton z 1) `seq` l main = do contents <- B.readFile "76.txt" print $ train $ B.words contents fox="The quick brown fox jumps over the brown fox who is slow jumps over the brown fox who is dead."

    Read the article

  • Lazy loading the addthis script? (or lazy loading external js content dependent on already fired eve

    - by Keith Bentrup
    I want to have the addthis widget available for my users, but I want to lazy load it so that my page loads as quickly as possible. However, after trying it via a script tag and then via my lazy loading method, it appears to only work via the script tag. In the obfuscated code, I see something that looks like it's dependent on the DOMContentLoaded event (at least for firefox). Since the DOMContentLoaded event has already fired, the widget doesn't render properly. What to do? I could just use a script tag (slower)... or could I fire (in a cross browser way) the DOMContentLoaded (or equivalent) event? I have a feeling this may not be possible b/c I believe that (like jQuery) there are multiple tests of the content ready event, and so multiple simulated events would have to occur. Nonetheless, this is an interesting problem b/c I have seen a couple widgets now assume that you are including their stuff via static script tags. It would be nice if they wrote code that was more useful to developers concerned about speed, but until then, is there a work around?? And/or are any of my assumptions wrong? Edit: Because the 1st answer to the question seemed to miss the point of my problem, I wanted to clarify the situation. This is about a specific problem. I'm not looking for yet another lazy load script or check if some dependencies are loaded script. Specifically this problem deals with external widgets that you do not have control over and may or may not be obfuscated delaying the load of the external widgets until they are needed or at least, til substantially after everything else has been loaded including other deferred elements b/c of the how the widget was written, precludes existing, typical lazy loading paradigms While it's esoteric, I have seen it happen with a couple widgets - where the widget developers assume that you're just willing to throw in another script tag at the bottom of the page. I'm looking to save those 500-1000 ms** though as numerous studies by yahoo, google, and amazon show it to be important to your user's experience. **My testing with hammerhead and personal experience indicates that this will be my savings in this case.

    Read the article

  • Is there a way to tell JVM to optimize my code before processing?

    - by Rogach
    I have a method, which takes much time to execute first time. But after several invocations, it takes about 30 times less time. So, to make my application respond to user interaction faster, I "warm-up" this method (5 times) with some sample data on initialization of application. But this increases app start-up time. I read, that JVM's can optimize and compile my java code to native, thus speeding things up. I wanted to know - maybe there is some way to explicitly tell JVM that I want this method to be compiled on startup of application?

    Read the article

  • WPF, how can I optimize lines and circles drawing ?

    - by Aurélien Ribon
    Hello ! I am developping an application where I need to draw a graph on the screen. For this purpose, I use a Canvas and I put Controls on it. An example of such a draw as shown in the app can be found here : http://free0.hiboox.com/images/1610/d82e0b7cc3521071ede601d3542c7bc5.png It works fine for simple graphs, but I also want to be able to draw very large graphs (hundreds of nodes). And when I try to draw a very large graph, it takes a LOT of time to render. My problem is that the code is not optimized at all, I just wanted it to work. Until now, I have a Canvas on the one hand, and multiple Controls on the other hands. Actually, circles and lines are listed in collections, and for each item of these collections, I use a ControlTemplate, defining a red circle, a black circle, a line, etc. Here is an example, the definition of a graph circle : <!-- STYLE : DISPLAY DATA NODE --> <Style TargetType="{x:Type flow.elements:DisplayNode}"> <Setter Property="Canvas.Left" Value="{Binding X, RelativeSource={RelativeSource Self}}" /> <Setter Property="Canvas.Top" Value="{Binding Y, RelativeSource={RelativeSource Self}}" /> <Setter Property="Template"> <Setter.Value> <ControlTemplate TargetType="{x:Type flow.elements:DisplayNode}"> <!--TEMPLATE--> <Grid x:Name="grid" Margin="-30,-30,0,0"> <Ellipse x:Name="selectionEllipse" StrokeThickness="0" Width="60" Height="60" Opacity="0" IsHitTestVisible="False"> <Ellipse.Fill> <RadialGradientBrush> <GradientStop Color="Black" Offset="0.398" /> <GradientStop Offset="1" /> </RadialGradientBrush> </Ellipse.Fill> </Ellipse> <Ellipse Stroke="Black" Width="30" Height="30" x:Name="ellipse"> <Ellipse.Fill> <LinearGradientBrush EndPoint="0,1"> <GradientStop Offset="0" Color="White" /> <GradientStop Offset="1.5" Color="LightGray" /> </LinearGradientBrush> </Ellipse.Fill> </Ellipse> <TextBlock x:Name="tblock" Text="{Binding NodeName, RelativeSource={RelativeSource Mode=TemplatedParent}}" Foreground="Black" VerticalAlignment="Center" HorizontalAlignment="Center" FontSize="10.667" /> </Grid> <!--TRIGGERS--> <ControlTemplate.Triggers> <!--DATAINPUT--> <MultiTrigger> <MultiTrigger.Conditions> <Condition Property="SkinMode" Value="NODETYPE" /> <Condition Property="NodeType" Value="DATAINPUT" /> </MultiTrigger.Conditions> <Setter TargetName="tblock" Property="Foreground" Value="White" /> <Setter TargetName="ellipse" Property="Fill"> <Setter.Value> <LinearGradientBrush EndPoint="0,1"> <GradientStop Offset="-0.5" Color="White" /> <GradientStop Offset="1" Color="Black" /> </LinearGradientBrush> </Setter.Value> </Setter> </MultiTrigger> <!--DATAOUTPUT--> <MultiTrigger> <MultiTrigger.Conditions> <Condition Property="SkinMode" Value="NODETYPE" /> <Condition Property="NodeType" Value="DATAOUTPUT" /> </MultiTrigger.Conditions> <Setter TargetName="tblock" Property="Foreground" Value="White" /> <Setter TargetName="ellipse" Property="Fill"> <Setter.Value> <LinearGradientBrush EndPoint="0,1"> <GradientStop Offset="-0.5" Color="White" /> <GradientStop Offset="1" Color="Black" /> </LinearGradientBrush> </Setter.Value> </Setter> </MultiTrigger> ....... THERE IS A TOTAL OF 7 MULTITRIGGERS ....... </ControlTemplate.Triggers> </ControlTemplate> </Setter.Value> </Setter> </Style> Also, the lines are drawn using the Line Control. <!-- STYLE : DISPLAY LINK --> <Style TargetType="{x:Type flow.elements:DisplayLink}"> <Setter Property="Template"> <Setter.Value> <ControlTemplate TargetType="{x:Type flow.elements:DisplayLink}"> <!--TEMPLATE--> <Line X1="{Binding X1, RelativeSource={RelativeSource TemplatedParent}}" X2="{Binding X2, RelativeSource={RelativeSource TemplatedParent}}" Y1="{Binding Y1, RelativeSource={RelativeSource TemplatedParent}}" Y2="{Binding Y2, RelativeSource={RelativeSource TemplatedParent}}" Stroke="Gray" StrokeThickness="2" x:Name="line" /> <!--TRIGGERS--> <ControlTemplate.Triggers> <!--BRANCH : ASSERTION--> <MultiTrigger> <MultiTrigger.Conditions> <Condition Property="SkinMode" Value="BRANCHTYPE" /> <Condition Property="BranchType" Value="ASSERTION" /> </MultiTrigger.Conditions> <Setter TargetName="line" Property="Stroke" Value="#E0E0E0" /> </MultiTrigger> </ControlTemplate.Triggers> </ControlTemplate> </Setter.Value> </Setter> </Style> So, I need your advices. How can I drastically improve the rendering performances ? Should I define each MultiTrigger circle rendering possibility in its own ControlTemplate instead ? Is there a better line drawing technique ? Should I open a DrawingContext and draw everything in one control, instead of having hundreds of controls ?

    Read the article

  • Improving the speed of php

    - by cast01
    I'm currently working on a website in PHP, and I'm wondering what the best practices/methods are to reduce the time requests take. I've build the site in a modular way, so a page would consist of a number of modules, and each of these would need to request information. For example, I have a cart module, that (if a cart is set) will fetch the cart with the id (stored in a session variable) from the database and return its contents. I have another module that lists categories and this needs to fetch the categories from the database. My system is built with models, and each model might also make a request, for example a category model will make a request to get products in that category.

    Read the article

  • How can I implement the Gale-Shapley stable marriage algorithm in Perl?

    - by srk
    Problem : We have equal number of men and women.each men has a preference score toward each woman. So do the woman for each man. each of the men and women have certain interests. Based on the interest we calculate the preference scores. So initially we have an input in a file having x columns. First column is the person(men/woman) id. id are nothing but 0.. n numbers.(first half are men and next half woman) the remaining x-1 columns will have the interests. these are integers too. now using this n by x-1 matrix... we have come up with a n by n/2 matrix. the new matrix has all men and woman as their rows and scores for opposite sex in columns. We have to sort the scores in descending order, also we need to know the id of person related to the scores after sorting. So here i wanted to use hash table. once we get the scores we need to make up pairs.. for which we need to follow some rules. My trouble is with the second matrix of n by n/2 that needs to give information of which man/woman has how much preference on a woman/man. I need these scores sorted so that i know who is the first preferred woman/man, 2nd preferred and so on for a man/woman. I hope to get good suggestions on the data structures i use.. I prefer php or perl. Thank you in advance Hey guys this is not an home work. This a little modified version of stable marriage algorithm. I have working solution. I am only working on optimizing my code. more info: It is very similar to stable marriage problem but here we need to calculate the scores based on the interests they share. So i have implemented it as the way you see in the wiki page http://en.wikipedia.org/wiki/Stable_marriage_problem. my problem is not solving the problem. i solved it and can run it. I am just trying to have a better solution. so i am asking suggestions on the type of data structure to use. Conceptually I tried using an array of hashes. where the array index give the person id and the hash in it gives the id's <= score's in sorted manner. I initially start with an array of hashes. now i sort the hashes on values, but i could not store the sorted hashes back in an array.So just stored the keys after sorting and used these to get the values from my initial unsorted hashes. Can we store the hashes after sorting ? Can you suggest a better structure ?

    Read the article

  • Custom View - Avoid redrawing when non-interactive

    - by MasterGaurav
    I have a complex custom view - photo collage. What is observed is whenever any UI interaction happens, the view is redrawn. How can I avoid complete redrawing (for example, use a cached UI) of the view specially when I click the "back" button to go back to previous activity because that also causes redrawing of the view. While exploring the API and web, I found a method - getDrawingCache() - but don't know how to use it effectively. How do I use it effectively? I've had other issues with Custom Views that I outline here.

    Read the article

  • Faster integer division when denominator is known?

    - by aaa
    hi I am working on GPU device which has very high division integer latency, several hundred cycles. I am looking to optimize divisions. All divisions by denominator which is in a set { 1,3,6,10 }, however numerator is a runtime positive value, roughly 32000 or less. due to memory constraints, lookup table is not option. Can you think of alternatives? I have thought of computing float point inverses, and using those to multiply numerator. Thanks

    Read the article

  • Is there a way to optimize this mysql query...?

    - by SpikETidE
    Hi Everyone... Say, I got these two tables.... Table 1 : Hotels hotel_id hotel_name 1 abc 2 xyz 3 efg Table 2 : Payments payment_id payment_date hotel_id total_amt comission p1 23-03-2010 1 100 10 p2 23-03-2010 2 50 5 p3 23-03-2010 2 200 25 p4 23-03-2010 1 40 2 Now, I need to get the following details from the two tables Given a particular date (say, 23-03-2010), the sum of the total_amt for each of the hotel for which a payment has been made on that particular date. All the rows that has the date 23-03-2010 ordered according to the hotel name A sample output is as follows... +------------+------------+------------+---------------+ | hotel_name | date | total_amt | commission | +------------+------------+------------+---------------+ | * abc | 23-03-2010 | 140 | 12 | +------------+------------+------------+---------------+ |+-----------+------------+------------+--------------+| || paymt_id | date | total_amt | commission || |+-----------+------------+------------+--------------+| || p1 | 23-03-2010 | 100 | 10 || |+-----------+------------+------------+--------------+| || p4 | 23-03-2010 | 40 | 2 || |+-----------+------------+------------+--------------+| +------------+------------+------------+---------------+ | * xyz | 23-03-2010 | 250 | 30 | +------------+------------+------------+---------------+ |+-----------+------------+------------+--------------+| || paymt_id | date | total_amt | commission || |+-----------+------------+------------+--------------+| || p2 | 23-03-2010 | 50 | 5 || |+-----------+------------+------------+--------------+| || p3 | 23-03-2010 | 200 | 25 || |+-----------+------------+------------+--------------+| +------------------------------------------------------+ Above the sample of the table that has to be printed... The idea is first to show the consolidated detail of each hotel, and when the '*' next to the hotel name is clicked the breakdown of the payment details will become visible... But that can be done by some jquery..!!! The table itself can be generated with php... Right now i am using two separate queries : One to get the sum of the amount and commission grouped by the hotel name. The next is to get the individual row for each entry having that date in the table. This is, of course, because grouping the records for calculating sum() returns only one row for each of the hotel with the sum of the amounts... Is there a way to combine these two queries into a single one and do the operation in a more optimized way...?? Hope i am being clear.. Thanks for your time and replies...

    Read the article

  • will a mysql query run slower if one of the tables involved has no index defined??

    - by lock
    there's this already populated database which came from another dev im not sure what went on that dev's mind when he created the tables, but on one of our scripts there is this query involving 4 tables and it runs super slow SELECT a.col_1, a.col_2, a.col_3, a.col_4, a.col_5, a.col_6, a.col_7 FROM a, b, c, d WHERE a.id = b.id AND b.c_id = c.id AND c.id = d.c_id AND a.col_8 = '$col_8' AND d.g_id = '$g_id' AND c.private = '1' NOTE: $col_8 and $g_id are variables from a form its only my theory that it's due to tables b and c not having an index, although im guessing that the dev didnt think that it was necessary since those tables only tell relations between a and d, where b tells that the data in a belongs to a certain user, and c tells that the user belongs to a group in d as you can see, there's not even a join or other extensive query functions used but this query which returns only around 100 rows takes 2 minutes to execute. anyway my question is simply this post's title. will a mysql query run slower if one of the tables involved has no index defined??

    Read the article

  • cheapest way to draw a fullscreen quad

    - by Soubok
    I wondering if there is a faster way to draw a full-screen quad in OpenGL: NewList(); PushMatrix(); LoadIdentity(); MatrixMode(PROJECTION); PushMatrix(); LoadIdentity(); Begin(QUADS); Vertex(-1,-1,0); Vertex(1,-1,0); Vertex(1,1,0); Vertex(-1,1,0); End(); PopMatrix(); MatrixMode(MODELVIEW); PopMatrix(); EndList();

    Read the article

  • Enforcing a query in MySql to use a specific index

    - by Hossein
    Hi, I have large table. consisting of only 3 columns (id(INT),bookmarkID(INT),tagID(INT)).I have two BTREE indexes one for each bookmarkID and tagID columns.This table has about 21 Million records. I am trying to run this query: SELECT bookmarkID,COUNT(bookmarkID) AS count FROM bookmark_tag_map GROUP BY tagID,bookmarkID HAVING tagID IN (-----"tagIDList"-----) AND count >= N which takes ages to return the results.I read somewhere that if make an index in which it has tagID,bookmarkID together, i will get a much faster result. I created the index after some time. Tried the query again, but it seems that this query is not using the new index that I have made.I ran EXPLAIN and saw that it is actually true. My question now is that how I can enforce a query to use a specific index? also comments on other ways to make the query faster are welcome. Thanks

    Read the article

  • When optimizing database queries, what exactly is the relationship between number of queries and siz

    - by williamjones
    To optimize application speed, everyone always advises to minimize the number of queries an application makes to the database, consolidating them into fewer queries that retrieve more wherever possible. However, this also always comes with the caution that data transferred is still data transferred, and just because you are making fewer queries doesn't make the data transferred free. I'm in a situation where I can over-include on the query in order to cut down the number of queries, and simply remove the unwanted data in the application code. Is there any type of a rule of thumb on how much of a cost there is to each query, to know when to optimize number of queries versus size of queries? I've tried to Google for objective performance analysis data, but surprisingly haven't been able to find anything like that. Clearly this relationship will change for factors such as when the database grows in size, making this somewhat individualized, but surely this is not so individualized that a broad sense of the landscape can't be drawn out? I'm looking for general answers, but for what it's worth, I'm running an application on Heroku.com, which means Ruby on Rails with a Postgres database.

    Read the article

  • PHP Hashtable array optimisation.

    - by hiprakhar
    I made a PHP app which was taking about ~0.0070sec for execution. Now, I added a hashtable array with about 2000 values. Suddenly the time for execution has gone up to ~0.0700 secs. Almost 10 times the previous value. I tried commenting out the part where I was searching inside the hashtable array (but array was still left defined). Still, the execution time remains about ~0.0500secs. Array is something like: $subjectinfo = array( 'TPT753' => 'Industrial Training', 'TPT801' => 'High Polymeric Engineering', 'TPT802' => 'Corrosion Engineering', 'TPT803' => 'Decorative ,Industrial And High Performance Coatings', 'TPT851' => 'Project'); Is there any way to optimize this part? I cannot use Database as I am running this app on Google app engine which is still not supporting JDO database for php. Some more code from the app: function getsubjectinfo($name) { $subjectinfo = array( 'TPT753' => 'Industrial Training', 'TPT801' => 'High Polymeric Engineering', 'TPT802' => 'Corrosion Engineering', 'TPT803' => 'Decorative ,Industrial And High Performance Coatings', 'TPT851' => 'Project'); $name = str_replace("-", "", $name); $name = str_replace(" ", "", $name); if (isset($subjectinfo["$name"])) return "(".$subjectinfo["$name"].")"; else return ""; } Then I am using the following statement 2-3 times in the app: echo $key." ".$this->getsubjectinfo($key)

    Read the article

  • Why is MySQL with InnoDB doing a table scan when key exists and choosing to examine 70 times more ro

    - by andysk
    Hello, I'm troubleshooting a query performance problem. Here's an expected query plan from explain: mysql> explain select * from table1 where tdcol between '2010-04-13:00:00' and '2010-04-14 03:16'; +----+-------------+--------------------+-------+---------------+--------------+---------+------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------------------+-------+---------------+--------------+---------+------+---------+-------------+ | 1 | SIMPLE | table1 | range | tdcol | tdcol | 8 | NULL | 5437848 | Using where | +----+-------------+--------------------+-------+---------------+--------------+---------+------+---------+-------------+ 1 row in set (0.00 sec) That makes sense, since the index named tdcol (KEY tdcol (tdcol)) is used, and about 5M rows should be selected from this query. However, if I query for just one more minute of data, we get this query plan: mysql> explain select * from table1 where tdcol between '2010-04-13 00:00' and '2010-04-14 03:17'; +----+-------------+--------------------+------+---------------+------+---------+------+-----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------------------+------+---------------+------+---------+------+-----------+-------------+ | 1 | SIMPLE | table1 | ALL | tdcol | NULL | NULL | NULL | 381601300 | Using where | +----+-------------+--------------------+------+---------------+------+---------+------+-----------+-------------+ 1 row in set (0.00 sec) The optimizer believes that the scan will be better, but it's over 70x more rows to examine, so I have a hard time believing that the table scan is better. Also, the 'USE KEY tdcol' syntax does not change the query plan. Thanks in advance for any help, and I'm more than happy to provide more info/answer questions.

    Read the article

  • Movies recommendation engine conceptual database design

    - by Supyxy
    I am working at an movie recommendations engine and i'm facing a DB design issue. My actual database looks like this: MOVIES [ID,TITLE] KEYWORDS_TABLE [ID,KEY_ID] - where ID is Foreign Key for MOVIES.id and KEY_ID is a key for a text keywords table This is not the entire DB, but i showed here what's important for my problem. I have about 50,000 movies and about 1,3 milion keywords correlations, and basically my algorithm consists in extracting all the who have the same keywords with a given movie, then ordering them by the number of keywords correlations. For example i looked for a movie similar to 'Cast away' and it returned 'Six days and six nights' because it had the most keywords correlations (4 keywords): Island Airplane crash Stranded Pilot The algorithm is based on more factors, but this one is the most important and the most difficult for the approach. Basically what i do now is getting all the movies that have at least one keyword similar to the given movie and then ordering them by other factors which are not important for a moment. There wouldn't be any problem if there weren't so many records, a query lasts in many cases up to 10-20 seconds and some of them return even over 5000 movies. Someone already helped me on here (thanks Mark Byers) with optimizing the query but that's not enough because it takes too longer SELECT DISTINCT M.title FROM keywords_table K1 JOIN keywords_table K2 ON K2.key_id = K1.key_id JOIN movies M ON K2.id = M.id WHERE K1.id = 4 So i thought it would be better if i pre-made those lists with movies recommendations for each movie, but i'm not sure how to design the tables.. whatever is it a good idea or how would you take this approach?

    Read the article

  • Strange: Planner takes decision with lower cost, but (very) query long runtime

    - by S38
    Facts: PGSQL 8.4.2, Linux I make use of table inheritance Each Table contains 3 million rows Indexes on joining columns are set Table statistics (analyze, vacuum analyze) are up-to-date Only used table is "node" with varios partitioned sub-tables Recursive query (pg = 8.4) Now here is the explained query: WITH RECURSIVE rows AS ( SELECT * FROM ( SELECT r.id, r.set, r.parent, r.masterid FROM d_storage.node_dataset r WHERE masterid = 3533933 ) q UNION ALL SELECT * FROM ( SELECT c.id, c.set, c.parent, r.masterid FROM rows r JOIN a_storage.node c ON c.parent = r.id ) q ) SELECT r.masterid, r.id AS nodeid FROM rows r QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------- CTE Scan on rows r (cost=2742105.92..2862119.94 rows=6000701 width=16) (actual time=0.033..172111.204 rows=4 loops=1) CTE rows -> Recursive Union (cost=0.00..2742105.92 rows=6000701 width=28) (actual time=0.029..172111.183 rows=4 loops=1) -> Index Scan using node_dataset_masterid on node_dataset r (cost=0.00..8.60 rows=1 width=28) (actual time=0.025..0.027 rows=1 loops=1) Index Cond: (masterid = 3533933) -> Hash Join (cost=0.33..262208.33 rows=600070 width=28) (actual time=40628.371..57370.361 rows=1 loops=3) Hash Cond: (c.parent = r.id) -> Append (cost=0.00..211202.04 rows=12001404 width=20) (actual time=0.011..46365.669 rows=12000004 loops=3) -> Seq Scan on node c (cost=0.00..24.00 rows=1400 width=20) (actual time=0.002..0.002 rows=0 loops=3) -> Seq Scan on node_dataset c (cost=0.00..55001.01 rows=3000001 width=20) (actual time=0.007..3426.593 rows=3000001 loops=3) -> Seq Scan on node_stammdaten c (cost=0.00..52059.01 rows=3000001 width=20) (actual time=0.008..9049.189 rows=3000001 loops=3) -> Seq Scan on node_stammdaten_adresse c (cost=0.00..52059.01 rows=3000001 width=20) (actual time=3.455..8381.725 rows=3000001 loops=3) -> Seq Scan on node_testdaten c (cost=0.00..52059.01 rows=3000001 width=20) (actual time=1.810..5259.178 rows=3000001 loops=3) -> Hash (cost=0.20..0.20 rows=10 width=16) (actual time=0.010..0.010 rows=1 loops=3) -> WorkTable Scan on rows r (cost=0.00..0.20 rows=10 width=16) (actual time=0.002..0.004 rows=1 loops=3) Total runtime: 172111.371 ms (16 rows) (END) So far so bad, the planner decides to choose hash joins (good) but no indexes (bad). Now after doing the following: SET enable_hashjoins TO false; The explained query looks like that: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- CTE Scan on rows r (cost=15198247.00..15318261.02 rows=6000701 width=16) (actual time=0.038..49.221 rows=4 loops=1) CTE rows -> Recursive Union (cost=0.00..15198247.00 rows=6000701 width=28) (actual time=0.032..49.201 rows=4 loops=1) -> Index Scan using node_dataset_masterid on node_dataset r (cost=0.00..8.60 rows=1 width=28) (actual time=0.028..0.031 rows=1 loops=1) Index Cond: (masterid = 3533933) -> Nested Loop (cost=0.00..1507822.44 rows=600070 width=28) (actual time=10.384..16.382 rows=1 loops=3) Join Filter: (r.id = c.parent) -> WorkTable Scan on rows r (cost=0.00..0.20 rows=10 width=16) (actual time=0.001..0.003 rows=1 loops=3) -> Append (cost=0.00..113264.67 rows=3001404 width=20) (actual time=8.546..12.268 rows=1 loops=4) -> Seq Scan on node c (cost=0.00..24.00 rows=1400 width=20) (actual time=0.001..0.001 rows=0 loops=4) -> Bitmap Heap Scan on node_dataset c (cost=58213.87..113214.88 rows=3000001 width=20) (actual time=1.906..1.906 rows=0 loops=4) Recheck Cond: (c.parent = r.id) -> Bitmap Index Scan on node_dataset_parent (cost=0.00..57463.87 rows=3000001 width=0) (actual time=1.903..1.903 rows=0 loops=4) Index Cond: (c.parent = r.id) -> Index Scan using node_stammdaten_parent on node_stammdaten c (cost=0.00..8.60 rows=1 width=20) (actual time=3.272..3.273 rows=0 loops=4) Index Cond: (c.parent = r.id) -> Index Scan using node_stammdaten_adresse_parent on node_stammdaten_adresse c (cost=0.00..8.60 rows=1 width=20) (actual time=4.333..4.333 rows=0 loops=4) Index Cond: (c.parent = r.id) -> Index Scan using node_testdaten_parent on node_testdaten c (cost=0.00..8.60 rows=1 width=20) (actual time=2.745..2.746 rows=0 loops=4) Index Cond: (c.parent = r.id) Total runtime: 49.349 ms (21 rows) (END) - incredibly faster, because indexes were used. Notice: Cost of the second query ist somewhat higher than for the first query. So the main question is: Why does the planner make the first decision, instead of the second? Also interesing: Via SET enable_seqscan TO false; i temp. disabled seq scans. Than the planner used indexes and hash joins, and the query still was slow. So the problem seems to be the hash join. Maybe someone can help in this confusing situation? thx, R.

    Read the article

  • Passing variables to functions

    - by Faken
    A quick question: When i pass a variable to a function, dose the program make a copy of that variable to use in the function? If it dose and I knew that the function would only read the variable and never write to it, is it possible to pass a variable to the function without creating a copy of that variable or should I just leave that up to the compiler optimizations to do that automatically for me?

    Read the article

  • Function-Local Static Const variable Initialization semantics.

    - by Hassan Syed
    The questions are in bold, for those that cannot be bothered reading a question in depth. This is a followup to this question. It is to do with the initialization semantics of static variables in functions. Static variables should be initialized once, and their internal state might be altered later - as I (currently) do in the linked question. However, the code in question does not require the feature to change the state of the variable later. Let me clarrify my position, since I don't require the string object's internal state to change. The code is for a trait class for meta programming, and as such would would benifit from a const char * const ptr -- thus Ideally a local cost static const variable is needed. My educated guess is that in this case the string in question will be optimally placed in memory by the link-loader, and that the code is more secure and maps to the intended semantics. This leads to the semantics of such a variable "The C++ Programming language Third Edition -- Stroustrup" does not have anything (that I could find) to say about this matter. All that is said is that the variable is initialized once when the flow of control of the thread first reaches the code. This leads me to ponder if the following code would be sensible, and if not what are the intended semantics ?. #include <iostream> const char * const GetString(const char * x_in) { static const char * const x = x_in; return x; } int main() { const char * const temp = GetString("yahoo"); std::cout << temp << std::endl; const char * const temp2 = GetString("yahoo2"); std::cout << temp2 << std::endl; } The following compiles on GCC and prints "yahoo" twice. Which is what I want -- However it might not be standards compliant (which is why I post this question). It might be more elegant to have two functions, "SetString" and "String" where the latter forwards to the first. If it is standards compliant does someone know of a templates implementation in boost (or elsewhere) ?

    Read the article

  • Optimizing spacing of mesh containing a given set of points

    - by Feynman
    I tried to summarize the this as best as possible in the title. I am writing an initial value problem solver in the most general way possible. I start with an arbitrary number of initial values at arbitrary locations (inside a boundary.) The first part of my program creates a mesh/grid (I am not sure which is the correct nuance), with N points total, that contains all the initial values. My goal is to optimize the mesh such that the spacing is as uniform as possible. My solver seems to work half decently (it needs some more obscure debugging that is not relevant here.) I am starting with one dimension. I intend to generalize the algorithm to an arbitrary number of dimensions once I get it working consistently. I am writing my code in fortran, but feel free to reply with pseudocode or the language of your choice. Allow me to elaborate with an example: Say I am working on a closed interval [1,10] xmin=1 xmax=10 Say I have 3 initial points: xmin, 5 and xmax num_ivc=3 known(num_ivc)=[xmin,5,xmax] //my arrays start at 1. Assume "known" starts sorted I store my mesh/grid points in an array called coord. Say I want 10 points total in my mesh/grid. N=10 coord(10) Remember, all this is arbitrary--except the variable names of course. The algorithm should set coord to {1,2,3,4,5,6,7,8,9,10} Now for a less trivial example: num_ivc=3 known(num_ivc)=[xmin,5.5,xmax or just num_ivc=1 known(num_ivc)=[5.5] Now, would you have 5 evenly spaced points on the interval [1, 5.5] and 5 evenly spaced points on the interval (5.5, 10]? But there is more space between 1 and 5.5 than between 5.5 and 10. So would you have 6 points on [1, 5.5] followed by 4 on (5.5 to 10]. The key is to minimize the difference in spacing. I have been working on this for 2 days straight and I can assure you it is a lot trickier than it sounds. I have written code that only works if N is large only works if N is small only works if it the known points are close together only works if it the known points are far apart only works if at least one of the known points is near a boundary only works if none of the known points are near a boundary So as you can see, I have coded the gamut of almost-solutions. I cannot figure out a way to get it to perform equally well in all possible scenarios (that is, create the optimum spacing.)

    Read the article

  • Optimizing WordWrap Algorithm

    - by Milo
    I have a word-wrap algorithm that basically generates lines of text that fit the width of the text. Unfortunately, it gets slow when I add too much text. I was wondering if I oversaw any major optimizations that could be made. Also, if anyone has a design that would still allow strings of lines or string pointers of lines that is better I'd be open to rewriting the algorithm. Thanks void AguiTextBox::makeLinesFromWordWrap() { textRows.clear(); textRows.push_back(""); std::string curStr; std::string curWord; int curWordWidth = 0; int curLetterWidth = 0; int curLineWidth = 0; bool isVscroll = isVScrollNeeded(); int voffset = 0; if(isVscroll) { voffset = pChildVScroll->getWidth(); } int AdjWidthMinusVoffset = getAdjustedWidth() - voffset; int len = getTextLength(); int bytesSkipped = 0; int letterLength = 0; size_t ind = 0; for(int i = 0; i < len; ++i) { //get the unicode character letterLength = _unicodeFunctions.bringToNextUnichar(ind,getText()); curStr = getText().substr(bytesSkipped,letterLength); bytesSkipped += letterLength; curLetterWidth = getFont().getTextWidth(curStr); //push a new line if(curStr[0] == '\n') { textRows.back() += curWord; curWord = ""; curLetterWidth = 0; curWordWidth = 0; curLineWidth = 0; textRows.push_back(""); continue; } //ensure word is not longer than the width if(curWordWidth + curLetterWidth >= AdjWidthMinusVoffset && curWord.length() >= 1) { textRows.back() += curWord; textRows.push_back(""); curWord = ""; curWordWidth = 0; curLineWidth = 0; } //add letter to word curWord += curStr; curWordWidth += curLetterWidth; //if we need a Vscroll bar start over if(!isVscroll && isVScrollNeeded()) { isVscroll = true; voffset = pChildVScroll->getWidth(); AdjWidthMinusVoffset = getAdjustedWidth() - voffset; i = -1; curWord = ""; curStr = ""; textRows.clear(); textRows.push_back(""); ind = 0; curWordWidth = 0; curLetterWidth = 0; curLineWidth = 0; bytesSkipped = 0; continue; } if(curLineWidth + curWordWidth >= AdjWidthMinusVoffset && textRows.back().length() >= 1) { textRows.push_back(""); curLineWidth = 0; } if(curStr[0] == ' ' || curStr[0] == '-') { textRows.back() += curWord; curLineWidth += curWordWidth; curWord = ""; curWordWidth = 0; } } if(curWord != "") { textRows.back() += curWord; } updateWidestLine(); }

    Read the article

< Previous Page | 52 53 54 55 56 57 58 59 60 61 62 63  | Next Page >