Search Results

Search found 1631 results on 66 pages for 'optimize'.

  • Which technology is best suited to store and query a huge readonly graph?

    - by asmaier
    I have a huge directed graph: 1.6 million nodes and 30 million edges. I want users to be able to find all the shortest connections (following both incoming and outgoing edges) between two nodes of the graph, via a web interface. At the moment the graph is stored in a PostgreSQL database, but that solution is neither efficient nor elegant; I basically need to store every edge twice (see my question "PostgreSQL: How to optimize my database for storing and querying a huge graph"). It was suggested that I use a graph database like Neo4j or AllegroGraph. However, the free version of AllegroGraph is limited to 50 million nodes and exposes a very high-level API (RDF) that seems too powerful and complex for my problem, while Neo4j only offers a very low-level API (and its Python interface is not mature yet). Both seem better suited to problems where nodes and edges are frequently added to or removed from the graph; for a simple search over a static graph they look like overkill. One idea I had was to "misuse" a search engine like Lucene for the job, since I'm basically only searching for connections in a graph. Another idea is to have a server process that holds the whole graph (500 MB to 1 GB) in memory; clients could then query that process and traverse the graph very quickly. Is there an easy way to write such a server (preferably in Python) using some existing framework? Which technology would you use to store and query such a huge read-only graph?
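
    A minimal sketch of the in-memory-server idea, assuming the graph fits in RAM as a plain adjacency dict (the edge-list file format and all names here are made up for illustration): load the edges once, then answer each query with a breadth-first search. Indexing each edge in both directions covers the "incoming and outgoing" requirement.

        from collections import deque, defaultdict

        def load_graph(path):
            """Build an adjacency dict from an edge list with one 'src dst' pair per line."""
            adj = defaultdict(list)
            with open(path) as f:
                for line in f:
                    src, dst = line.split()
                    adj[src].append(dst)   # outgoing edge
                    adj[dst].append(src)   # make the incoming edge followable too
            return adj

        def shortest_path(adj, start, goal):
            """Plain BFS; returns one shortest node sequence, or None if unreachable."""
            if start == goal:
                return [start]
            prev = {start: None}
            queue = deque([start])
            while queue:
                node = queue.popleft()
                for nxt in adj[node]:
                    if nxt not in prev:
                        prev[nxt] = node
                        if nxt == goal:
                            path = [nxt]
                            while prev[path[-1]] is not None:
                                path.append(prev[path[-1]])
                            return path[::-1]
                        queue.append(nxt)
            return None

    Wrapping this in a small HTTP service (with the standard library's wsgiref, or any Python micro-framework) would give the web interface; returning all shortest paths rather than one needs a slightly extended BFS that records every predecessor found at the minimal distance.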

  • Query optimization (OR based)

    - by john194
    I have googled but I can't find answers to these questions; your advice is appreciated. Environment: CentOS on a VPS with 512 MB RAM, nginx, PHP5 (FastCGI), MySQL 5 (MyISAM, not InnoDB). I need to optimize an app created by an ex-employee. The app works, but it's slow.

    The table is t1(id [bigint(20)], c1 [mediumtext], c2 [mediumtext], c3 [mediumtext], c4 [mediumtext]), where id is some random big number and is the primary key. The mediumtext columns look like this:

        c1 = "|box-002877|"
        c2 = "|ct-2348|rd-11124854|hw-3949|wd-8872|hw-119037736|...etc.. "
        c3 = "|fg-2448|wd-11172|hw-1656|...etc.. "
        c4 = "|hg-2448|qd-16667|...etc."

    (some columns contain a lot of data, around 900 KiB; the database is around 300 MiB). Yes, mediumtext "is bad" and (20) is too big... but I didn't create this. A given code can appear in any of those 4 mediumtext columns. He needs all the columns of the row containing $code, so he wrote this:

        function f1($code) {
            SELECT * FROM t1
            WHERE c1 LIKE '%$code%'
               OR c2 LIKE '%$code%'
               OR c3 LIKE '%$code%'
               OR c4 LIKE '%$code%';

    Questions:
    Q1. If $code is found in c1, does MySQL stop checking and return the row (id+c1+c2+c3+c4), or does it continue (wasting time) checking c2, c3 and c4?
    Q2. MySQL is working with this table on disk (not in RAM) because of the mediumtext columns, right? Is this the primary cause of the slowness?
    Q3. Can that query be cached by MySQL (with a big query_cache_size=128M value in my.cnf), or is it not cacheable because of the mediumtexts, or because of the "OR LIKE"?
    Q4. Do you recommend rewriting this with MySQL's INSTR() / LOCATE() / MATCH..AGAINST [FULLTEXT]?

  • How to implement Excel Solver functionality in C#?

    - by Vic
    Hi, I have a C# application in which I need to do some optimization calculations, like the Excel Solver add-in does. One option is certainly to write my own solver implementation, but I'm short of time, so I'm looking for existing libraries that can help. I've been trying the Microsoft Solver Foundation, which seems pretty neat, but it doesn't seem to handle the kind of calculations I need (details below). So my question is whether you know of another library I can use for this purpose, any tutorial that would help me write my own solver, or any idea that gives me a lead. Thanks.
    Additional info: I have 7 variables, call them var1, var2, ..., var7. The constraints are: each variable must satisfy 0 <= varN <= 0.5, and the sum of all the variables must equal 1. The objective is to maximize the target formula, which in Excel looks like this: (MMULT(TRANSPOSE(L26:L32),M14:M20)) / (SQRT(MMULT(MMULT(TRANSPOSE(L26:L32),M4:S10),L26:L32))). The range L26:L32 holds the variables var1...var7; M14:M20 and M4:S10 are ranges of data I get from other sources, most likely decimal values. With Microsoft Solver Foundation I modeled pretty much everything, writing functions that implement the target formula, but solving the model always fails; I think it is because of the complexity of the operations. In any case, I just wanted to show this data so you have an idea of the kind of calculations I need to implement.
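
    For what it's worth, the target is the ratio of a weighted sum to the square root of a quadratic form (w·m / sqrt(w'·S·w)), so the shape of the problem is easy to prototype outside C#. A rough sketch in Python/SciPy, purely to illustrate the formulation (the vector m and matrix S below are random placeholders, not the asker's spreadsheet ranges):

        import numpy as np
        from scipy.optimize import minimize

        m = np.random.rand(7)     # stands in for M14:M20
        S = np.random.rand(7, 7)
        S = S @ S.T               # stands in for M4:S10, made symmetric positive semidefinite

        def neg_objective(w):
            # maximize (w . m) / sqrt(w' S w)  ==  minimize its negative
            return -(w @ m) / np.sqrt(w @ S @ w)

        result = minimize(
            neg_objective,
            x0=np.full(7, 1.0 / 7.0),   # equal weights: satisfies both constraints
            method="SLSQP",
            bounds=[(0.0, 0.5)] * 7,    # 0 <= var_n <= 0.5
            constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],  # sum of variables = 1
        )
        print(result.x, -result.fun)

    Any solver that supports simple bounds plus one linear equality constraint should cope with this; the objective is smooth as long as w'Sw stays positive.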

  • php: showing my country based on my IP, mysql optimized

    - by andufo
    I downloaded WIPmania's worldip table from http://www.wipmania.com/en/base/ -- the table has 3 fields and around 79k rows:

        startip   // example: 3363110912
        endip     // example: 3363112063
        country   // example: AR (Argentina)

    So, let's suppose I'm in Argentina and my IP address is 200.117.248.17.

    1) I use this function to convert my IP to a long:

        function ip_address_to_number($ip) {
            if (!$ip) {
                return false;
            } else {
                $ip = split('\.', $ip);
                return ($ip[0]*16777216 + $ip[1]*65536 + $ip[2]*256 + $ip[3]);
            }
        }

    2) I look up the country code by matching the converted IP:

        $sql = 'SELECT * FROM worldip WHERE '.ip_address_to_number($_SERVER['REMOTE_ADDR']).' BETWEEN startip AND endip';

    which is equivalent to:

        SELECT country FROM worldip WHERE 3363174417 BETWEEN startip AND endip

    (benchmark: Showing rows 0 - 0 (1 total, Query took 0.2109 sec))

    Now comes the real question. What if a bunch of other Argentinian visitors open the website with these IP addresses: 200.117.248.17, 200.117.233.10, 200.117.241.88, 200.117.159.24? Since I'm caching all the SQL queries, instead of running a separate query for each IP, would it be better (and correct) to match only the first two octets of the IP by modifying the function like this?

        function ip_address_to_number($ip) {
            if (!$ip) {
                return false;
            } else {
                $ip = split('\.', $ip);
                return ($ip[0]*16777216 + $ip[1]*65536);
            }
        }

    (notice that the 3rd and 4th octets of the IP have been dropped). That way, instead of querying these 4 values: 3363174417, 3363170570, 3363172696, 3363151640, all I have to query is 3363110912 (which is 200.117.0.0 converted to long). Is this right? Any other ideas to optimize this process? Thanks!
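
    As a quick sanity check of the conversion arithmetic (not part of the asker's code), the same calculation in Python with the standard ipaddress module reproduces the numbers quoted above:

        import ipaddress

        print(int(ipaddress.IPv4Address("200.117.248.17")))   # 3363174417
        print(int(ipaddress.IPv4Address("200.117.0.0")))      # 3363110912, i.e. 200*2**24 + 117*2**16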

  • Javascript scope chain

    - by Geromey
    Hi, I am trying to optimize my program. I think I understand the basics of closures, but I am confused about the scope chain. I know that, in general, you want variables to sit low in the scope chain so they can be resolved quickly. Say I have the following object:

        var my_object = (function () {
            // private variables
            var a_private = 0;
            return {
                // public variables
                a_public: 1,
                // public methods
                some_public: function () {
                    debugger;
                    alert(this.a_public);
                    alert(a_private);
                }
            };
        })();

    My understanding is that inside the some_public method I can access the private variables faster than the public ones. Is this correct? My confusion comes from the scope level of this: when the code is stopped at the debugger statement, Firebug shows the public variable inside the this keyword, but this itself is not listed at any scope level. How fast is accessing this? Right now I am storing any this.* properties in local variables to avoid accessing them multiple times. Thanks very much!

  • Codechef practice question help needed - find trailing zeros in a factorial

    - by manugupt1
    I have been working on this for 24 hours now, trying to optimize it. The task is to find the number of trailing zeros in the factorial of a number, where the numbers can be up to 10,000,000 and there are up to 10 million test cases, all within about 8 seconds. The code is as follows:

        #include <iostream>
        using namespace std;

        int count5(int a) {
            int b = 0;
            for (int i = a; i > 0; i = i / 5) {
                if (i % 15625 == 0) { b = b + 6; i = i / 15625; }
                if (i % 3125 == 0)  { b = b + 5; i = i / 3125; }
                if (i % 625 == 0)   { b = b + 4; i = i / 625; }
                if (i % 125 == 0)   { b = b + 3; i = i / 125; }
                if (i % 25 == 0)    { b = b + 2; i = i / 25; }
                if (i % 5 == 0)     { b++; }
                else break;
            }
            return b;
        }

        int main() {
            int l;
            int n = 0;
            cin >> l;                      // number of test cases taken as input
            int *T = new int[l];
            for (int i = 0; i < l; i++)
                cin >> T[i];               // one number per test case
            for (int i = 0; i < l; i++) {
                n = 0;
                for (int j = 5; j <= T[i]; j = j + 5) {
                    n += count5(j);        // number of trailing zeros calculated
                }
                cout << n << endl;         // print the count for each test case
            }
            delete[] T;
        }

    Please help me by suggesting a new approach, or some modifications to this one.
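
    A sketch of the standard shortcut (in Python rather than C++, purely to show the idea): the number of trailing zeros of n! is the number of factors of 5 in 1..n, which Legendre's formula gives directly as n/5 + n/25 + n/125 + ..., so each test case becomes O(log n) with no inner loop over every multiple of 5.

        def trailing_zeros(n):
            """Count trailing zeros of n! by summing n // 5**k (Legendre's formula)."""
            count = 0
            power = 5
            while power <= n:
                count += n // power
                power *= 5
            return count

        assert trailing_zeros(25) == 6     # 25! ends in six zeros
        assert trailing_zeros(100) == 24

    Combined with fast input routines, 10 million queries of this form should fit comfortably in the time limit.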

  • What computer science topic am I trying to describe?

    - by ItzWarty
    I've been programming for around... 6-8 years, and I've begun to realize that I don't really know what really happens at the low-ish level when I do something like int i = j%348 The thing is, I know what j%348 does, it divides j by 348 and finds the remainder. What I don't know is HOW the computer does this. Similarly, I know that try { blah(); }catch(Exception e){ blah2(); } will invoke blah and if blah throws, it will invoke blah2... however, I have no idea how the computer does this instead of err... crashing or ending execution. And I figure that in order for me to get "better" at programming, I should probably know what my code is really doing. [This would probably also help me optimize and... err... not do stupid things] I figure that what I'm asking for is probably something huge taught in universities or something, but to be honest, if I could learn a little, I would be happy. The point of the question is: What topic/computer-science-course am I asking about? Because in all honesty, I don't know. Since I don't know what the topic is called, I'm unable to actually find a book or online resource to learn about the topic, so I'm sort of stuck. I'd be eternally thankful if someone helped me =/

  • Fast JSON serialization (and comparison with Pickle) for cluster computing in Python?

    - by user248237
    I have a set of data points, each described by a dictionary. The processing of each data point is independent, and I submit each one as a separate job to a cluster. Each data point has a unique name, and my cluster submission wrapper simply calls a script that takes a data point's name and a file describing all the data points. That script then retrieves the data point from the file and performs the computation. Since each job has to load the set of all points only to retrieve the single point to be run, I wanted to optimize this step by serializing the file describing the set of points into an easily retrievable format. I tried using jsonpickle, via the following method, to serialize a dictionary describing all the data points to a file:

        def json_serialize(obj, filename, use_jsonpickle=True):
            f = open(filename, 'w')
            if use_jsonpickle:
                import jsonpickle
                json_obj = jsonpickle.encode(obj)
                f.write(json_obj)
            else:
                simplejson.dump(obj, f, indent=1)
            f.close()

    The dictionary contains very simple objects (lists, strings, floats, etc.) and has a total of 54,000 keys. The JSON file is ~20 megabytes in size and takes ~20 seconds to load into memory, which seems very slow to me. I switched to pickle with the exact same object and found that it generates a file about 7.8 megabytes in size that can be loaded in ~1-2 seconds. That is a significant improvement, but it still seems like loading such a small object (fewer than 100,000 entries) should be faster. Aside from that, pickle is not human readable, which was the big advantage of JSON for me. Is there a way to use JSON to get similar or better speedups? If not, do you have other ideas for structuring this? (Is the right solution simply to "slice" the file describing each event into a separate file and pass that on to the script that runs a data point in a cluster job? It seems like that could lead to a proliferation of files.) Thanks.
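
    One knob worth checking before abandoning either format (a generic sketch, not specific to the asker's data): the pickle module defaults to a slow, text-based protocol unless you ask for a binary one, and on Python 2 the C implementation (cPickle) is much faster than the pure-Python module.

        try:
            import cPickle as pickle   # Python 2: C implementation
        except ImportError:
            import pickle              # Python 3: the C accelerator is used automatically

        def save(obj, filename):
            with open(filename, 'wb') as f:
                # binary protocols are both smaller and faster to read back than protocol 0
                pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)

        def load(filename):
            with open(filename, 'rb') as f:
                return pickle.load(f)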

  • Any suggestions for good automated web load testing tool?

    - by fmunkert
    What are some good automated tools for load testing (stress testing) web applications, that do not use record and replay of HTTP network packets? I am aware that there are numerous load testing tools on the market that record and replay HTTP network packets. But these are unsuitable for my purpose, because of this: The HTTP packet format changes very often in our application (e.g. when we optimize an AJAX call). We do not want to adapt all test scripts just because there is a slight change in HTTP packet format. Our test team shall not need to know any internals about our application to write their test scripts. A tool that replays HTTP packets, however, requires the team to know the format of HTTP requests and responses, such that they can adapt details of the replayed HTTP packets (e.g. user name). The automated load testing tool I am looking for should be able to let the test team write "black box" test scripts such as: Invoke web page at URL http://... . First, enter XXX into text field XXX. Then, press button XXX. Wait until response has been received from web server. Verify that text field XXX now contains the text XXX. The tool should be able to simulate up to several 1000 users, and it should be compatible with web applications using ASP.NET and AJAX.

  • Should I move big data blobs in JSON or in separate binary connection?

    - by Amagrammer
    QUESTION: Is it better to send large data blobs in JSON for simplicity, or to send them as binary data over a separate connection? If the former, can you offer tips on how to optimize the JSON to minimize size? If the latter, is it worth it to logically connect the JSON data to the binary data using an identifier that appears in both, e.g., "data" : "<unique identifier>" in the JSON, with the first bytes of the data blob being <unique identifier>?
    CONTEXT: My iPhone application needs to receive JSON data over the 3G network, so I need to think seriously about the efficiency of data transfer as well as the load on the CPU. Most of the transfers will be relatively small packets of text data for which JSON is a natural format and for which there is no point in worrying much about efficiency. However, some of the most critical transfers will be big blobs of binary data -- definitely at least 100 kilobytes, and possibly closer to 1 megabyte as customers accumulate a longer history with the product. (Note: I will be caching what I can on the iPhone itself, but the data still has to be transferred at least once.) It is NOT streaming data. I will probably use a third-party JSON SDK -- the one I am using during development is here. Thanks

  • compact Number formatting behavior in Java (automatically switch between decimal and scientific notation)

    - by kostmo
    I am looking for a way to format a floating point number dynamically in either standard decimal format or scientific notation, depending on the value of the number. For moderate magnitudes, the number should be formatted as a decimal with trailing zeros suppressed; if the floating point number is equal to an integral value, the decimal point should also be suppressed. For extreme magnitudes (very small or very large), the number should be expressed in scientific notation. Alternately stated: if the number of characters in the standard decimal representation exceeds a certain threshold, switch to scientific notation. I should have control over the maximum number of digits of precision, but I don't want trailing zeros appended to express a minimum precision; all trailing zeros should be suppressed. Basically, it should optimize for compactness and readability.

        2.80000            -> 2.8
        765.000000         -> 765
        0.0073943162953    -> 0.00739432   (limit digits of precision -- to 6 in this case)
        0.0000073943162953 -> 7.39432E-6   (switch to scientific notation if the magnitude is small enough -- less than 1E-5 in this case)
        7394316295300000   -> 7.39432E+15  (switch to scientific notation if the magnitude is large enough -- for example, greater than 1E+10)
        0.0000073900000000 -> 7.39E-6      (strip trailing zeros from the significand in scientific notation)
        0.000007299998344  -> 7.3E-6       (rounding at the 6-digit precision limit produces trailing zeros, which are stripped)

    Here's what I've found so far: the .toString() method of the Number class does most of what I want, except that it doesn't convert to an integer representation when possible and will not express large integral magnitudes in scientific notation; I'm also not sure how to adjust its precision. The "%G" format string to String.format(...) lets me express numbers in scientific notation with adjustable precision, but does not strip trailing zeros. I'm wondering if there's already a library function out there that meets these criteria. I guess the only stumbling block to writing this myself is having to strip the trailing zeros from the significand in the scientific notation produced by %G.
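
    Not Java, but the desired behavior is easy to prototype with Python's general-purpose 'g' format, which already limits significant digits, strips trailing zeros, and switches to scientific notation for extreme magnitudes (its switch-over thresholds differ from the ones above, so a real Java version would still need its own threshold logic):

        def compact(x, sig=6):
            """At most `sig` significant digits; 'g' drops trailing zeros and redundant decimal points."""
            return "{:.{p}g}".format(x, p=sig)

        for x in (2.80000, 765.000000, 0.0073943162953, 0.0000073943162953, 7394316295300000.0):
            print(compact(x))
        # 2.8   765   0.00739432   7.39432e-06   7.39432e+15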

  • Datastore performance, my code or the datastore latency

    - by fredrik
    For the last month I've had a problem with a quite basic datastore query. It involves two db.Models, one referring to the other with a db.ReferenceProperty. The problem is that, according to the admin logs, the request takes about 2-4 seconds to complete. I stripped it down to a bare form and a list to display the results; the put works fine, but the get accumulates (in my opinion) way too much CPU time.

        # The get looks like this:
        outputData['items'] = {}
        labelsData = Label.all()
        for label in labelsData:
            labelItem = label.item.name
            if labelItem not in outputData['items']:
                outputData['items'][labelItem] = { 'item' : labelItem, 'labels' : [] }
            outputData['items'][labelItem]['labels'].append(label.text)
        path = os.path.join(os.path.dirname(__file__), 'index.html')
        self.response.out.write(template.render(path, outputData))

        # And the models:
        class Item(db.Model):
            name = db.StringProperty()

        class Label(db.Model):
            text = db.StringProperty()
            lang = db.StringProperty()
            item = db.ReferenceProperty(Item)

    I've tried a number of different approaches, e.g. storing all Label keys in the Item model as a db.ListProperty instead of using a ReferenceProperty. My test data is just 10 rows in Item and 40 in Label. So my question: is it a fool's errand to try to optimize this because the high CPU usage is inherent to the datastore, or have I just screwed up somewhere in the code? ..fredrik
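
    One thing worth checking (a sketch against the old google.appengine.ext.db API, assuming that is what is in use): label.item.name dereferences the ReferenceProperty separately for every Label, so the loop above issues one extra datastore get per row. Prefetching the referenced Items in a single batch get avoids that:

        from google.appengine.ext import db

        labels = list(Label.all())
        # Read the raw reference keys without triggering a fetch, then get all Items in one call.
        item_keys = [Label.item.get_value_for_datastore(label) for label in labels]
        items_by_key = dict(zip(item_keys, db.get(item_keys)))

        outputData['items'] = {}
        for label, key in zip(labels, item_keys):
            name = items_by_key[key].name
            entry = outputData['items'].setdefault(name, {'item': name, 'labels': []})
            entry['labels'].append(label.text)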

  • Shouldn't prepared statements be much faster?

    - by silversky
        $s = explode(" ", microtime());
        $s = $s[0] + $s[1];
        $con = mysqli_connect('localhost', 'test', 'pass', 'db') or die('Err');

        for ($i = 0; $i < 1000; $i++) {
            $stmt = $con->prepare(" SELECT MAX(id) AS max_id , MIN(id) AS min_id FROM tb ");
            $stmt->execute();
            $stmt->bind_result($M, $m);
            $stmt->free_result();

            $rand = mt_rand($m, $M).'<br/>';

            $res = $con->prepare(" SELECT * FROM tb WHERE id >= ? LIMIT 0,1 ");
            $res->bind_param("s", $rand);
            $res->execute();
            $res->free_result();
        }

        $e = explode(" ", microtime());
        $e = $e[0] + $e[1];
        echo number_format($e - $s, 4, '.', '');

    and:

        $link = mysql_connect("localhost", "test", "pass") or die();
        mysql_select_db("db") or die("Unable to select database".mysql_error());

        for ($i = 0; $i < 1000; $i++) {
            $range_result = mysql_query(" SELECT MAX(`id`) AS max_id , MIN(`id`) AS min_id FROM tb ");
            $range_row = mysql_fetch_object($range_result);
            $random = mt_rand($range_row->min_id, $range_row->max_id);
            $result = mysql_query(" SELECT * FROM tb WHERE id >= $random LIMIT 0,1 ");
        }

    Definitely prepared statements are much safer, and everywhere it says they are also much faster, BUT in my test of the above code I have:

        - 2.45 sec for prepared statements
        - 5.05 sec for the second example

    What do you think I'm doing wrong? Should I use the second solution, or should I try to optimize the prepared statements?

  • SQL: Speed Improvement - Cluttered union query

    - by vol7ron
        SELECT *
        FROM (
            SELECT a.user_id, a.f_name, a.l_name, b.user_id, b.f_name, b.l_name
            FROM current_tbl a
            INNER JOIN import_tbl b ON ( a.user_id = b.user_id )
            UNION
            SELECT a.user_id, a.f_name, a.l_name, b.user_id, b.f_name, b.l_name
            FROM current_tbl a
            INNER JOIN import_tbl b ON ( lower(a.f_name) = lower(b.f_name)
                                     AND lower(a.l_name) = lower(b.l_name) )
        ) foo

        UNION

        SELECT a.user_id, a.f_name, a.l_name, '', '', ''
        FROM current_tbl a
        WHERE a.user_id NOT IN (
            SELECT user_id FROM (
                SELECT a.user_id, a.f_name, a.l_name, b.user_id, b.f_name, b.l_name
                FROM current_tbl a
                INNER JOIN import_tbl b ON ( a.user_id = b.user_id )
                UNION
                SELECT a.user_id, a.f_name, a.l_name, b.user_id, b.f_name, b.l_name
                FROM current_tbl a
                INNER JOIN import_tbl b ON ( lower(a.f_name) = lower(b.f_name)
                                         AND lower(a.l_name) = lower(b.l_name) )
            ) bar
        )
        ORDER BY user_id

    Example of table population:

        current_tbl:
         user_id | f_name | l_name
        ---------+--------+---------
         A1      | Adam   | Acorn
         A2      | Beth   | Berry
         A3      | Calv   | Chard

        import_tbl:
         user_id | f_name | l_name
        ---------+--------+---------
         A1      | Adam   | Acorn
         A2      | Beth   | Butcher   <- last name different

    Expected output:

         user_id1 | f_name1 | l_name1 | user_id2 | f_name2 | l_name2
        ----------+---------+---------+----------+---------+----------
         A1       | Adam    | Acorn   | A1       | Adam    | Acorn
         A2       | Beth    | Berry   | A2       | Beth    | Butcher
         A3       | Calv    | Chard   |          |         |

    Doing it this way gets rid of cases where the row would be:

        A2 | Beth | Berry | A2 | Beth | Butcher

    but it keeps the A3 row. I hope this makes sense and I haven't overly simplified it. This is a continuation question from my other question. The succession of these improvements has dropped the query from ~32000 ms down to ~1200 ms now -- quite an improvement. I suspect I can optimize by using UNION ALL in the subquery, and of course the usual index optimizations, but I'm looking for the best SQL optimization. FYI, this particular case is for PostgreSQL.

  • Design suggestion for expression tree evaluation with time-series data

    - by Lirik
    I have a (C#) genetic program that uses financial time-series data. It currently works, but I want to redesign the architecture to be more robust. My main goals are:

    - sequentially present the time-series data to the expression trees;
    - allow expression trees to access previous data rows when needed;
    - optimize the performance of data access while evaluating the expression trees;
    - keep a common interface so various types of data can be used.

    Here are the possible approaches I've thought about:

    1. Evaluate the expression tree by passing a data row into the root node and letting each child node use the same data row.
    2. Evaluate the expression tree by passing in the data row index and letting each node get the data row from a shared DataSet (currently I'm passing the row index and going to multiple synchronized arrays to get the data).
    3. Hybrid: an immutable data set is accessible by all of the expression trees, and each expression tree is evaluated by passing in a data row.

    The benefit of the first approach is that the data row is passed into the expression tree and no further query is made against the data set (which should improve performance in a multithreaded environment). The drawback is that the expression tree has no access to the rest of the data (in case some functions need to do calculations using previous data rows). The benefit of the second approach is that the expression trees can access any data up to the latest data row, but unless I specify which row that is, I'll have to iterate through the rows and figure out which one is the last one. The benefit of the hybrid is that it should generally perform better and still provide access to the earlier data; it supports two basic "views" of the data: the latest row and the previous rows. Do you know of any design patterns, or do you have any tips, that can help me build this type of system? Should I use a DataSet to hold and present the data, or are there more efficient ways to present rows of data while maintaining a simple interface? FYI: All of my code is written in C#.
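
    A tiny sketch of the hybrid idea (in Python for brevity rather than C#, with made-up names): the full history is held once in an immutable sequence, and each evaluation receives a lightweight view that exposes both "the current row" and "N rows back", so tree nodes never have to search for the latest index.

        class SeriesView:
            """Read-only window over an immutable row sequence, ending at `index`."""
            def __init__(self, rows, index):
                self._rows = rows          # shared, never mutated
                self._index = index

            @property
            def current(self):
                return self._rows[self._index]

            def back(self, n):
                """Row n steps before the current one (raises if before the start)."""
                if n > self._index:
                    raise IndexError("not enough history")
                return self._rows[self._index - n]

        rows = tuple({"close": p} for p in (10.0, 10.5, 10.2, 10.8))
        for i in range(len(rows)):
            view = SeriesView(rows, i)
            # an expression-tree leaf might compute a 1-step momentum, for example
            momentum = view.current["close"] - view.back(1)["close"] if i else 0.0

    The same shape translates directly to C#: an immutable array plus a small struct holding the index, passed into the tree at evaluation time.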

  • Most efficient way to check for DBNull and then assign to a variable?

    - by ilitirit
    This question comes up occasionally, but I haven't seen a satisfactory answer. A typical pattern is (row is a DataRow):

        if (row["value"] != DBNull.Value)
        {
            someObject.Member = row["value"];
        }

    My first question is which is more efficient (I've flipped the condition):

        row["value"] == DBNull.Value;              // Or
        row["value"] is DBNull;                    // Or
        row["value"].GetType() == typeof(DBNull)   // Or... any suggestions?

    This indicates that .GetType() should be faster, but maybe the compiler knows a few tricks I don't? Second question: is it worth caching the value of row["value"], or does the compiler optimize the indexer away anyway? E.g.:

        object valueHolder;
        if (DBNull.Value == (valueHolder = row["value"])) {}

    Disclaimers: row["value"] exists; I don't know the column index of the column (hence the lookup by column name); and I'm asking specifically about checking for DBNull and then assigning (not about premature optimization, etc.).

    Edit: I benchmarked a few scenarios (time in seconds, 10,000,000 trials):

        row["value"] == DBNull.Value:             00:00:01.5478995
        row["value"] is DBNull:                   00:00:01.6306578
        row["value"].GetType() == typeof(DBNull): 00:00:02.0138757

    Object.ReferenceEquals has the same performance as "==". The most interesting result? If you mismatch the case of the column name (e.g. "Value" instead of "value"), it takes roughly ten times longer (for a string):

        row["Value"] == DBNull.Value:             00:00:12.2792374

    The moral of the story seems to be that if you can't look up a column by its index, then make sure the column name you feed to the indexer matches the DataColumn's name exactly. Caching the value also appears to be nearly twice as fast:

        No Caching:   00:00:03.0996622
        With Caching: 00:00:01.5659920

    So the most efficient method seems to be:

        object temp;
        string variable;
        if (DBNull.Value != (temp = row["value"]))
        {
            variable = temp.ToString();
        }

    This was a good learning experience.

  • Optimizing MySql query to avoid using "Using filesort"

    - by usef_ksa
    I need your help to optimize a query so that it avoids "Using filesort". The job of the query is to select all the articles that belong to a specific tag. The query is:

        SELECT title FROM tag, article
        WHERE tag = 'Riyad' AND tag.article_id = article.id
        ORDER BY tag.article_id

    The table structures are the following:

        -- Tag table
        CREATE TABLE `tag` (
            `tag` VARCHAR( 30 ) NOT NULL ,
            `article_id` INT NOT NULL ,
            INDEX ( `tag` )
        ) ENGINE = MYISAM ;

        -- Article table
        CREATE TABLE `article` (
            `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY ,
            `title` VARCHAR( 60 ) NOT NULL
        ) ENGINE = MYISAM ;

    Sample data:

        INSERT INTO `article` VALUES (1, 'About Riyad');
        INSERT INTO `article` VALUES (2, 'About Newyork');
        INSERT INTO `article` VALUES (3, 'About Paris');
        INSERT INTO `article` VALUES (4, 'About London');

        INSERT INTO `tag` VALUES ('Riyad', 1);
        INSERT INTO `tag` VALUES ('Saudia', 1);
        INSERT INTO `tag` VALUES ('Newyork', 2);
        INSERT INTO `tag` VALUES ('USA', 2);
        INSERT INTO `tag` VALUES ('Paris', 3);
        INSERT INTO `tag` VALUES ('France', 3);

  • Remove undesired indexed keywords from Sql Server FTS Index

    - by Scott
    Could anyone tell me whether SQL Server 2008 has a way to prevent keywords from being indexed that aren't really relevant to the types of searches that will be performed? For example, we have the IFilters for PDF and Word hooked in, and our documents are being indexed properly as far as I can tell. These documents, however, contain lots of numeric values that people won't really search for and that wouldn't bring back meaningful results, yet they are still being indexed and creating lots of entries in the full-text catalog. Basically, we are trying to optimize our search engine in any way we can, and we assume all these unnecessary entries can't be helping performance. I want my catalog to consist of alphabetic keywords only. The current IFilters work better than anything I could write in the time I have, but they index more than I need. Here is an example of some of the terms from sys.dm_fts_index_keywords_by_document that I want out: $1,000, $100, $250, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 129, 13.1, 14, 14.12, 145, 15, 16.2, 16.4, 18, 18.1, 18.2, 18.3, 18.4, 18.5. These are some examples from the same management view that I think are worth keeping and searching on: above, accordingly, accounts, add, addition, additional, additive. Any help would be greatly appreciated!

  • Lucene and Special Characters

    - by Brandon
    I am using Lucene.Net 2.0 to index some fields from a database table. One of the fields is a 'Name' field which allows special characters. When I perform a search, it does not find my document that contains a term with special characters. I index my field as such: Directory DALDirectory = FSDirectory.GetDirectory(@"C:\Indexes\Name", false); Analyzer analyzer = new StandardAnalyzer(); IndexWriter indexWriter = new IndexWriter(DALDirectory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED); Document doc = new Document(); doc.Add(new Field("Name", "Test (Test)", Field.Store.YES, Field.Index.TOKENIZED)); indexWriter.AddDocument(doc); indexWriter.Optimize(); indexWriter.Close(); And I search doing the following: value = value.Trim().ToLower(); value = QueryParser.Escape(value); Query searchQuery = new TermQuery(new Term(field, value)); Searcher searcher = new IndexSearcher(DALDirectory); TopDocCollector collector = new TopDocCollector(searcher.MaxDoc()); searcher.Search(searchQuery, collector); ScoreDoc[] hits = collector.TopDocs().scoreDocs; If I perform a search for field as 'Name' and value as 'Test', it finds the document. If I perform the same search as 'Name' and value as 'Test (Test)', then it does not find the document. Even more strange, if I remove the QueryParser.Escape line do a search for a GUID (which, of course, contains hyphens) it finds documents where the GUID value matches, but performing the same search with the value as 'Test (Test)' still yields no results. I am unsure what I am doing wrong. I am using the QueryParser.Escape method to escape the special characters and am storing the field and searching by the Lucene.Net's examples. Any thoughts?

  • UIView: how to do non-destructive drawing?

    - by Caffeine Coma
    My original question: I'm creating a simple drawing application and need to be able to draw over existing, previously drawn content in my drawRect. What is the proper way to draw on top of existing content without entirely replacing it? Based on answers received here and elsewhere, here is the deal. You should be prepared to redraw the entire rectangle whenever drawRect is called. You cannot prevent the contents from being erased by doing the following: [self setClearsContextBeforeDrawing: NO]; This is merely a hint to the graphics engine that there is no point in having it pre-clear the view for you, since you will likely need to re-draw the whole area anyway. It may prevent your view from being automatically erased, but you cannot depend on it. To draw on top of your view without erasing, do your drawing to an off-screen bitmap context (which is never cleared by the system.) Then in your drawRect, copy from this off-screen buffer to the view. Example: - (id) initWithCoder: (NSCoder*) coder { if (self = [super initWithCoder: coder]) { self.backgroundColor = [UIColor clearColor]; CGSize size = self.frame.size; drawingContext = [self createDrawingBufferContext: size]; } return self; } - (CGContextRef) createOffscreenContext: (CGSize) size { CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB(); CGContextRef context = CGBitmapContextCreate(NULL, size.width, size.height, 8, size.width*4, colorSpace, kCGImageAlphaPremultipliedLast); CGColorSpaceRelease(colorSpace); CGContextTranslateCTM(context, 0, size.height); CGContextScaleCTM(context, 1.0, -1.0); return context; } - (void)drawRect:(CGRect) rect { UIGraphicsPushContext(drawingContext); CGImageRef cgImage = CGBitmapContextCreateImage(drawingContext); UIImage *uiImage = [[UIImage alloc] initWithCGImage:cgImage]; UIGraphicsPopContext(); CGImageRelease(cgImage); [uiImage drawInRect: rect]; [uiImage release]; } TODO: can anyone optimize the drawRect so that only the (usually tiny) modified rectangle region is used for the copy?

  • Best Practice - Removing item from generic collection in C#

    - by Matt Davis
    I'm using C# in Visual Studio 2008 with .NET 3.5. I have a generic dictionary that maps types of events to a generic list of subscribers. A subscriber can be subscribed to more than one event. private static Dictionary<EventType, List<ISubscriber>> _subscriptions; To remove a subscriber from the subscription list, I can use either of these two options. Option 1: ISubscriber subscriber; // defined elsewhere foreach (EventType event in _subscriptions.Keys) { if (_subscriptions[event].Contains(subscriber)) { _subscriptions[event].Remove(subscriber); } } Option 2: ISubscriber subscriber; // defined elsewhere foreach (EventType event in _subscriptions.Keys) { _subscriptions[event].Remove(subscriber); } I have two questions. First, notice that Option 1 checks for existence before removing the item, while Option 2 uses a brute force removal since Remove() does not throw an exception. Of these two, which is the preferred, "best-practice" way to do this? Second, is there another, "cleaner," more elegant way to do this, perhaps with a lambda expression or using a LINQ extension? I'm still getting acclimated to these two features. Thanks. EDIT Just to clarify, I realize that the choice between Options 1 and 2 is a choice of speed (Option 2) versus maintainability (Option 1). In this particular case, I'm not necessarily trying to optimize the code, although that is certainly a worthy consideration. What I'm trying to understand is if there is a generally well-established practice for doing this. If not, which option would you use in your own code?

  • In languages which create a new scope each time in a loop block, is a new local copy of the loop variable created?

    - by Jian Lin
    It seems that in languages like C, Java, and Ruby (as opposed to JavaScript), a new scope is created for each iteration of a loop block, and the variable defined for the loop becomes a fresh local variable recorded in that new scope each time. For example, in Ruby:

        p RUBY_VERSION
        $foo = []
        (1..5).each do |i|
          $foo[i] = lambda { p i }
        end
        (1..5).each do |j|
          $foo[j].call()
        end

    the printout is:

        [MacBook01:~] $ ruby scope.rb
        "1.8.6"
        1
        2
        3
        4
        5
        [MacBook01:~] $

    So it looks like when a new scope is created, a new local copy of i is also created and recorded in that scope, so that when the lambda is executed later, the i found in those scope chains is 1, 2, 3, 4, 5 respectively. Is this true? (It sounds like a heavy operation.) Contrast that with:

        p RUBY_VERSION
        $foo = []
        i = 0
        (1..5).each do |i|
          $foo[i] = lambda { p i }
        end
        (1..5).each do |j|
          $foo[j].call()
        end

    This time i is defined before entering the loop, so Ruby 1.8.6 does not put it in the new scope created for the loop block; when i is looked up in the scope chain it always refers to the i in the outer scope and prints 5 every time:

        [MacBook01:~] $ ruby scope2.rb
        "1.8.6"
        5
        5
        5
        5
        5
        [MacBook01:~] $

    I heard that in Ruby 1.9, i is treated as local to the loop even when there is an i defined earlier. Creating a new scope and a new local copy of i on every pass through the loop seems heavy, and it wouldn't matter at all if the lambdas were never invoked later. So when the closures don't need to be invoked at a later time, can the interpreter (or the compiler, for C / Java) optimize away the per-iteration copy?
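
    For comparison (a side note, not part of the original question): Python behaves like the second Ruby example. The loop variable is a single binding in the enclosing scope, so every closure sees its final value unless the value is captured explicitly, e.g. via a default argument:

        funcs = [lambda: i for i in range(1, 6)]
        print([f() for f in funcs])           # [5, 5, 5, 5, 5] -- one shared binding of i

        funcs = [lambda i=i: i for i in range(1, 6)]
        print([f() for f in funcs])           # [1, 2, 3, 4, 5] -- value captured per iteration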

  • HTML5 optimization for IE6.0

    - by marharépa
    Hello! Do you know any method to make this HTML code work in IE6 or 7 (or 8) without adding any HTML elements, or does IE simply skip all the HTML5 elements?

        <!DOCTYPE HTML>
        <head>
          <meta charset="UTF-8">
          <title>title</title>
          <link type="text/css" rel="stylesheet" href="reset.css">
          <link type="text/css" rel="stylesheet" href="style.css">
        </head>
        <body>
          <header>code of header</header>
          <nav> code of nav </nav>
          <section> code of gallery </section>
          <article> code of article </article>
          <footer>code of footer</footer>
        </body>
        </html>

    Thank you.

  • Oracle: Insertion on an indexed table, avoiding duplicates. Looking for tips and advice.

    - by Tom
    Hi everyone, I'm looking for the best solution (performance-wise) to achieve this: I have to insert records into a table while avoiding duplicates. For example, take table A:

        INSERT INTO A (
            SELECT DISTINCT [FIELDS] FROM B, C, D ..
            WHERE (JOIN CONDITIONS ON B, C, D ..)
            AND NOT EXISTS (
                SELECT * FROM A ATMP WHERE ATMP.SOMEKEY = A.SOMEKEY
            )
        );

    I have an index over A.SOMEKEY just to speed up the NOT EXISTS query, but I realize that inserting into an indexed table is a performance hit. So I was thinking of duplicating table A in a global temporary table, where I would keep the index, then removing the index from table A and executing the modified query:

        INSERT INTO A (
            SELECT DISTINCT [FIELDS] FROM B, C, D ..
            WHERE (JOIN CONDITIONS ON B, C, D ..)
            AND NOT EXISTS (
                SELECT * FROM GLOBAL_TEMPORARY_TABLE_A ATMP WHERE ATMP.SOMEKEY = A.SOMEKEY
            )
        );

    This would avoid inserting into an indexed table, but I would have to update the global temporary copy of A with each insertion I make. I'm kind of lost here; is there a better way to achieve this? Thanks in advance,

  • How to perform spatial partitioning in n-dimensions?

    - by kevin42
    I'm trying to design an implementation of Vector Quantization as a c++ template class that can handle different types and dimensions of vectors (e.g. 16 dimension vectors of bytes, or 4d vectors of doubles, etc). I've been reading up on the algorithms, and I understand most of it: here and here I want to implement the Linde-Buzo-Gray (LBG) Algorithm, but I'm having difficulty figuring out the general algorithm for partitioning the clusters. I think I need to define a plane (hyperplane?) that splits the vectors in a cluster so there is an equal number on each side of the plane. [edit to add more info] This is an iterative process, but I think I start by finding the centroid of all the vectors, then use that centroid to define the splitting plane, get the centroid of each of the sides of the plane, continuing until I have the number of clusters needed for the VQ algorithm (iterating to optimize for less distortion along the way). The animation in the first link above shows it nicely. My questions are: What is an algorithm to find the plane once I have the centroid? How can I test a vector to see if it is on either side of that plane?
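
    A small illustration of the side test (a generic NumPy sketch under the usual convention, not tied to any particular LBG implementation): a hyperplane through a point c with normal n splits space by the sign of (v - c)·n, so classifying a vector is a single dot product in any number of dimensions.

        import numpy as np

        def side_of_plane(v, c, n):
            """+1, -1 or 0 depending on which side of the hyperplane through c (normal n) the vector v lies."""
            return int(np.sign(np.dot(v - c, n)))

        # toy 4-d example: split a cluster at its centroid, along its direction of largest variance
        cluster = np.random.rand(100, 4)
        centroid = cluster.mean(axis=0)
        eigvals, eigvecs = np.linalg.eigh(np.cov(cluster, rowvar=False))
        normal = eigvecs[:, -1]                  # principal direction of the cluster
        left  = cluster[[side_of_plane(v, centroid, normal) < 0 for v in cluster]]
        right = cluster[[side_of_plane(v, centroid, normal) >= 0 for v in cluster]]

    Splitting along the largest-variance direction is only one choice; classic LBG instead perturbs the centroid into two nearby seeds and lets the next k-means-style assignment pass form the two halves.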
