Search Results

Search found 13300 results on 532 pages for 'exalytics performance tuning'.


  • How to handle very frequent updates to a Lucene index

    - by fsm
    I am trying to prototype an indexing/search application that uses very volatile data sources (forums, social networks, etc.). Here are my performance requirements:
    1. Very fast turnaround: any new data (such as a new message on a forum) should be available in the search results very soon, within a minute.
    2. Old documents must be discarded on a fairly regular basis so that the search results are not dated.
    3. The search application must stay responsive, with latency on the order of 100 milliseconds, and should support at least 10 queries per second.
    All of my current requirements could be met without Lucene (and that would satisfy 1, 2 and 3), but I anticipate other requirements in the future (such as search relevance) that Lucene makes easier to implement. However, since Lucene is designed for use cases far more complex than the one I'm currently working on, I'm having a hard time satisfying my performance requirements. My questions:
    a. I have read that the optimize() method in the IndexWriter class is expensive and should not be used by applications that do frequent updates; what are the alternatives?
    b. To do incremental updates, I need to keep committing new data and keep refreshing the index reader so the new data is visible. This hurts requirements 1 and 3 above. Should I try duplicate indices? What are the common approaches to solving this problem?
    c. Lucene provides a delete method that removes all documents matching a query. In my case I need to delete all documents older than a certain age; one option is to add a date field to every document and use it to delete documents later. Could I instead run range queries on document ids (I can create my own id field, since the one Lucene assigns keeps changing) to delete documents, and would that be any faster than comparing dates represented as strings?
    I know these are very open questions, so I am not looking for a detailed answer; I will treat all of your answers as suggestions and use them to inform my design. Thanks! Please let me know if you need any other information.
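
    For questions (a) and (b), the pattern usually suggested is Lucene's near-real-time search: keep a single IndexWriter open, avoid optimize() entirely, and refresh a lightweight reader against the writer instead of committing for every change. The sketch below is only a minimal illustration assuming a modern Lucene API (SearcherManager, TextField, LongPoint); class names differ in the 3.x line that was current when this was asked, and the index path and field names are made up for the example.

        import java.nio.file.Paths;
        import org.apache.lucene.analysis.standard.StandardAnalyzer;
        import org.apache.lucene.document.Document;
        import org.apache.lucene.document.Field;
        import org.apache.lucene.document.LongPoint;
        import org.apache.lucene.document.StringField;
        import org.apache.lucene.document.TextField;
        import org.apache.lucene.index.IndexWriter;
        import org.apache.lucene.index.IndexWriterConfig;
        import org.apache.lucene.search.IndexSearcher;
        import org.apache.lucene.search.SearcherManager;
        import org.apache.lucene.store.FSDirectory;

        public class NearRealTimeIndex {
            public static void main(String[] args) throws Exception {
                FSDirectory dir = FSDirectory.open(Paths.get("/tmp/forum-index")); // illustrative path
                IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));

                // A SearcherManager hands out searchers that can see not-yet-committed segments.
                SearcherManager searchers = new SearcherManager(writer, null);

                // Indexing side: add a new forum message, then make it visible without optimize().
                Document doc = new Document();
                doc.add(new StringField("msgId", "forum-12345", Field.Store.YES)); // illustrative fields
                doc.add(new TextField("body", "text of the new forum message", Field.Store.NO));
                doc.add(new LongPoint("createdAt", System.currentTimeMillis()));
                writer.addDocument(doc);
                searchers.maybeRefresh(); // much cheaper than commit(); run every few seconds

                // Search side: borrow a searcher per query and always give it back.
                IndexSearcher searcher = searchers.acquire();
                try {
                    // run queries against `searcher` here
                } finally {
                    searchers.release(searcher);
                }

                // Question (c): expire old documents with a range delete on the date field.
                long cutoff = System.currentTimeMillis() - 7L * 24 * 60 * 60 * 1000;
                writer.deleteDocuments(LongPoint.newRangeQuery("createdAt", Long.MIN_VALUE, cutoff));

                writer.commit(); // durable commits can run on a slower schedule than maybeRefresh()
                searchers.close();
                writer.close();
                dir.close();
            }
        }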

    Read the article

  • Is JDBC or LDAP faster for basic read operations?

    - by Brandon
    I have a set of user data which I am trying to access. Because of the way our company's employee data is set up, the information is available both through LDAP and through a table in our DB. I was curious: for standard read operations, which would generally give better query performance?
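
    The honest answer is usually "measure it against your own directory and database", since the result depends on indexing, network hops and driver overhead on each side. A rough, illustrative Java harness for timing the same lookup both ways might look like the sketch below; the JDBC URL, credentials, table and column names, LDAP host, search base and filter are all placeholders.

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.PreparedStatement;
        import java.sql.ResultSet;
        import java.util.Hashtable;
        import javax.naming.Context;
        import javax.naming.NamingEnumeration;
        import javax.naming.directory.DirContext;
        import javax.naming.directory.InitialDirContext;
        import javax.naming.directory.SearchControls;
        import javax.naming.directory.SearchResult;

        public class ReadLatencyComparison {
            public static void main(String[] args) throws Exception {
                // --- JDBC lookup (connection and statement reused across iterations) ---
                try (Connection db = DriverManager.getConnection(
                        "jdbc:oracle:thin:@//dbhost:1521/HR", "user", "pass");      // placeholder URL/credentials
                     PreparedStatement ps = db.prepareStatement(
                        "SELECT email FROM employees WHERE login = ?")) {           // placeholder table/columns
                    long start = System.nanoTime();
                    for (int i = 0; i < 1000; i++) {
                        ps.setString(1, "jdoe");
                        try (ResultSet rs = ps.executeQuery()) {
                            while (rs.next()) rs.getString(1);
                        }
                    }
                    System.out.println("JDBC: " + (System.nanoTime() - start) / 1_000_000 + " ms");
                }

                // --- LDAP lookup via JNDI (context reused across iterations) ---
                Hashtable<String, String> env = new Hashtable<>();
                env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
                env.put(Context.PROVIDER_URL, "ldap://ldaphost:389");               // placeholder host
                DirContext ldap = new InitialDirContext(env);
                SearchControls controls = new SearchControls();
                controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
                controls.setReturningAttributes(new String[] {"mail"});
                long start = System.nanoTime();
                for (int i = 0; i < 1000; i++) {
                    NamingEnumeration<SearchResult> results =
                        ldap.search("ou=people,dc=example,dc=com", "(uid=jdoe)", controls); // placeholder base/filter
                    while (results.hasMore()) results.next();
                }
                System.out.println("LDAP: " + (System.nanoTime() - start) / 1_000_000 + " ms");
                ldap.close();
            }
        }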

    Read the article

  • Resources for Performance testing

    - by munna
    Our small firm has been entrusted with creating an ASP.NET application using a client-server model. Now that we are almost done with development, we are putting together a small team for performance testing. I have searched the web on the topic but without much help. If any of you can share the 'how, what and why' of performance testing, it would be a great help.

    Read the article

  • Microsoft Access query speed...

    - by V.S.
    Hello everyone! I am writing a report about MS Access and I can't find any information about its performance compared to alternatives such as Microsoft SQL Server, MySQL, Oracle, etc. It's obvious that MS Access is going to be the slowest of the group, but there are no solid documents confirming this other than forum threads, and I don't have the time and resources to do the research myself :( Hoping for your help, V.S.

    Read the article

  • C# - Fast and simple multi dimensional data structures?

    - by Jeremy Rudd
    I need to store multi-dimensional data consisting of numbers in a manner that's easy to work with. I'm capturing data in real time, and once processed, older data would be destroyed and garbage collected. This data structure must be fast so it won't hurt my overall app performance; the faster the better. What are my choices in terms of platform-supported data structures? I'm using VS 2010 and .NET 4.

    Read the article

  • Is Mono fast enough for Mac OS X?

    - by prosseek
    I have to use .NET/C# for the next company project. Since I develop on a Mac, I looked into Mono as the development environment/toolchain. Is Mono on Mac OS X fast enough? That is, how does the performance of running an assembly compare to running the same code on .NET under Windows? In practical terms, do I have to buy a PC laptop for C#/.NET development?

    Read the article

  • Mod_rewrite on all website images

    - by Esteve Camps
    I'm designing an image repository. I want to decouple the filename in the HTML from the image file on disk. For instance, the image in the filesystem is called images/items/12543.jpg while the HTML is <img src="images/car.jpg" />. Would anyone strongly discourage rewriting all image requests through PHP, so that when images/car.jpg is requested, Apache actually serves the content of images/items/12543.jpg? I don't know whether this may cause performance problems.

    Read the article

  • What's the fastest lookup algorithm for a key-value pair data structure (i.e., a map)?

    - by truncheon
    In the following example a std::map is filled with 26 entries, keys 'a'–'z' and values 0–25. The time taken (on my system) to look up the last entry 10,000,000 times is roughly 250 ms for the vector and 125 ms for the map. (I compiled in release mode, with -O3 turned on, for g++ 4.4.) But if for some odd reason I wanted better performance than std::map, what data structures and functions would I need to consider using? I apologize if the answer seems obvious to you, but I haven't had much experience with the performance-critical aspects of C++ programming.

        #include <ctime>
        #include <map>
        #include <vector>
        #include <iostream>

        struct mystruct
        {
            char key;
            int value;
            mystruct(char k = 0, int v = 0) : key(k), value(v) { }
        };

        int find(const std::vector<mystruct>& ref, char key)
        {
            for (std::vector<mystruct>::const_iterator i = ref.begin(); i != ref.end(); ++i)
                if (i->key == key)
                    return i->value;
            return -1;
        }

        int main()
        {
            std::map<char, int> mymap;
            std::vector<mystruct> myvec;
            for (int i = 'a'; i < 'a' + 26; ++i)
            {
                mymap[i] = i - 'a';
                myvec.push_back(mystruct(i, i - 'a'));
            }

            int pre = clock();
            for (int i = 0; i < 10000000; ++i)
                find(myvec, 'z');
            std::cout << "linear scan: milli " << clock() - pre << "\n";

            pre = clock();
            for (int i = 0; i < 10000000; ++i)
                mymap['z'];
            std::cout << "map scan: milli " << clock() - pre << "\n";

            return 0;
        }

    Read the article

  • Maximum capabilities of MySQL

    - by cdated
    How do I know when a project is just too big for MySQL and I should use something with a better reputation for scalability? Is there a maximum database size for MySQL before performance starts to degrade? What factors make MySQL no longer a viable option compared to a commercial DBMS like Oracle or SQL Server?

    Read the article

  • What does "performant" software actually mean?

    - by Roddy
    I see it used a lot, but haven't seen a definition that makes complete sense. Wiktionary says "characterized by an adequate or excellent level of performance or efficiency", which isn't much help. Initially I thought performant just meant "fast", but others seem to think it's also about stability, code quality, memory use/footprint, or some combination of all of those. I think this is a "real" question, but if enough people reckon it is subjective, that's an answer in itself.

    Read the article

  • What would be better: (1 database + 4 tables) or (2 databases + 2 tables each)?

    - by griseldas
    Hi there, I would like advice on which would perform better: (A) one database with 4 tables, or (B) two databases on the same server, each with 2 tables. Table size and usage are more or less similar, so the two tables in database 1 would have similar usage/size to the two tables in database 2. The tables could hold 500,000+ records, and the two tables in each database are not related (no join queries etc. between them). Thanks in advance for your comments.

    Read the article

  • Regex vs. string:find() for simple word boundary

    - by user576267
    Say I only need to find out whether a line read from a file contains a word from a finite set of words. One way of doing this is to use a regex like:

        .*\y(good|better|best)\y.*

    Another way of accomplishing this is pseudocode like this:

        if ( (readLine.find("good") != string::npos) ||
             (readLine.find("better") != string::npos) ||
             (readLine.find("best") != string::npos) )
        {
            // line contains a word from a finite set of words.
        }

    Which way will have better performance (i.e. speed and CPU utilization)?

    Read the article

  • Improving performance for WRITE operation on Oracle DB in Java

    - by Lucky
    I have a typical scenario and need to understand the best possible way to handle it, so here it goes. I am developing a solution that retrieves data from a remote SOAP-based web service and then pushes that data to an Oracle database on the network, as a scheduled task that executes every 15 minutes. The remote service keeps event queues containing the INSERT/UPDATE/DELETE operations done since the last retrieval; once I retrieve the events for the last 15 minutes, it starts accumulating events for the next retrieval. Since I am only pushing data to Oracle, all my interactions are INSERT and UPDATE statements. There are around 60 tables in Oracle, some of them with 100+ columns. Moreover, each 15-minute cycle produces around 60-70 inserts, 100+ updates and 10-20 deletes. The application is an executable jar that terminates after each run and starts again on the next 15-minute cycle. So, how should I handle the WRITE operations (best practices) to improve performance for this application as a whole? Current test code (on every cycle):
    1. Connect to the remote service to get the events.
    2. Create a connection to the DB (a single connection object).
    3. Identify the type of operation (INSERT/UPDATE/DELETE) and the table it applies to.
    4. Call the respective method based on the operation type and table.
    5. Use a PreparedStatement with positional parameters, retrieve each column value from the remote service and assign it to the corresponding statement parameter.
    6. Execute and commit the statement, then return to the event-retrieval class to process the next event.
    The above repeats until all the retrieved events are processed, after which the program closes and everything starts again on the next cycle. Thanks for the help!
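
    A common first improvement for this kind of workload is to stop committing per statement and batch rows that hit the same PreparedStatement instead: disable auto-commit, queue rows with addBatch(), and flush with executeBatch() plus one commit per chunk. The sketch below is only an illustration against a plain Oracle JDBC connection; the table, columns, event-holder class and batch size of 100 are made-up placeholders.

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.PreparedStatement;
        import java.util.Arrays;
        import java.util.List;

        public class BatchedWriter {

            // Placeholder event holder; in the real application the values come from the SOAP service.
            static class InsertEvent {
                final String id;
                final String payload;
                InsertEvent(String id, String payload) { this.id = id; this.payload = payload; }
            }

            static void writeInserts(Connection conn, List<InsertEvent> events) throws Exception {
                conn.setAutoCommit(false); // one commit per batch instead of one per row
                String sql = "INSERT INTO item_events (event_id, payload) VALUES (?, ?)"; // placeholder table
                try (PreparedStatement ps = conn.prepareStatement(sql)) {
                    int pending = 0;
                    for (InsertEvent e : events) {
                        ps.setString(1, e.id);
                        ps.setString(2, e.payload);
                        ps.addBatch();                  // queue the row instead of executing it immediately
                        if (++pending % 100 == 0) {     // flush in chunks of 100 (tunable)
                            ps.executeBatch();
                            conn.commit();
                        }
                    }
                    ps.executeBatch();                  // flush whatever is left
                    conn.commit();
                } catch (Exception ex) {
                    conn.rollback();
                    throw ex;
                }
            }

            public static void main(String[] args) throws Exception {
                try (Connection conn = DriverManager.getConnection(
                        "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "pass")) { // placeholder URL
                    writeInserts(conn, Arrays.asList(
                            new InsertEvent("1", "first event"),
                            new InsertEvent("2", "second event")));
                }
            }
        }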

    Read the article

  • How much slower is a try/catch block? [closed]

    - by Euclid
    Possible Duplicate: What is the real overhead of try/catch in C#?
    How much slower is a try/catch block than a conditional? For example:

        try { v = someArray[10]; } catch { v = defaultValue; }

    or

        if (null != someArray) { v = someArray[10]; } else { v = defaultValue; }

    Is there much in it, or isn't there a definitive performance difference?

    Read the article

  • Improving the performance of XSL

    - by Rachel
    In the XSL below, the variable "insert-data" takes an input param with this structure:

        <insert-data>
            <data compareIndex="4" nodeName="d1e1">
                <a/>
            </data>
            <data compareIndex="5" nodeName="d1e1">
                <b/>
            </data>
            <data compareIndex="7" nodeName="d1e2">
                <a/>
            </data>
            <data compareIndex="9" nodeName="d1e2">
                <b/>
            </data>
        </insert-data>

    where "nodeName" is the id of a node and "compareIndex" is the position of the text content relative to the node having id "$nodeName". I am using the XSL below to select all the text nodes (via generate-id) that satisfy the above condition and construct a data XML. The implementation works correctly, but the execution time runs into minutes. Is there a better way of implementing this, or is an inefficient operation being used somewhere? From my observation, the code that calculates the length of the preceding text consumes most of the time. Please share your thoughts on improving the performance of this XSL. I am using the Java Saxon XSLT transformer.

        <xsl:variable name="insert-data" as="element()*">
            <xsl:for-each select="$insert-file/insert-data/data">
                <xsl:sort select="xsd:integer(@index)"/>
                <xsl:variable name="compareIndex" select="xsd:integer(@compareIndex)"/>
                <xsl:variable name="nodeName" select="@nodeName"/>
                <xsl:variable name="nodeContent" as="node()">
                    <xsl:copy-of select="node()"/>
                </xsl:variable>
                <xsl:for-each select="$main-root/*//text()[ancestor::*[@id = $nodeName]]">
                    <xsl:variable name="preTextLength" as="xsd:integer"
                        select="sum((preceding::text())[. ancestor::*[@id = $nodeName]]/string-length(.))"/>
                    <xsl:variable name="currentTextLength" as="xsd:integer" select="string-length(.)"/>
                    <xsl:variable name="sum" select="$preTextLength + $currentTextLength" as="xsd:integer"/>
                    <xsl:variable name="split-index" select="$compareIndex - $preTextLength" as="xsd:integer"/>
                    <xsl:if test="($sum ge $compareIndex) and ($compareIndex gt $preTextLength)">
                        <data split-index="{$split-index}" text-id="{generate-id(.)}">
                            <xsl:copy-of select="$nodeContent"/>
                        </data>
                    </xsl:if>
                </xsl:for-each>
            </xsl:for-each>
        </xsl:variable>

    Read the article

  • HttpClient multithread performance

    - by pepper
    I have an application which downloads more than 4500 HTML pages from 62 target hosts using HttpClient (4.1.3 or 4.2-beta). It runs on Windows 7 64-bit. Processor: Core i7 2600K. Network bandwidth: 54 Mb/s. At the moment it uses these parameters:
    - DefaultHttpClient and PoolingClientConnectionManager;
    - an IdleConnectionMonitorThread from http://hc.apache.org/httpcomponents-client-ga/tutorial/html/connmgmt.html;
    - maximum total connections = 80;
    - default maximum connections per route = 5;
    - for thread management, a ForkJoinPool with parallelism level = 5 (do I understand correctly that this is the number of worker threads?).
    In this configuration my network usage (in Windows Task Manager) does not rise above 2.5%, and downloading the 4500 pages takes 70 minutes. In the HttpClient logs I see entries like:

        DEBUG ForkJoinPool-2-worker-1 [org.apache.http.impl.conn.PoolingClientConnectionManager]: Connection released: [id: 209][route: {}-http://stackoverflow.com][total kept alive: 6; route allocated: 1 of 5; total allocated: 10 of 80]

    Total allocated connections never rise above 10-12, even though I have configured 80 connections. If I raise the parallelism level to 20 or 80, network usage remains the same but a lot of connection timeouts are generated. I have read the tutorials on hc.apache.org (HttpClient Performance Optimization Guide and HttpClient Threading Guide) but they do not help. The task's code looks like this:

        public class ContentDownloader extends RecursiveAction {
            private final HttpClient httpClient;
            private final HttpContext context;
            private List<Entry> entries;

            public ContentDownloader(HttpClient httpClient, List<Entry> entries) {
                this.httpClient = httpClient;
                context = new BasicHttpContext();
                this.entries = entries;
            }

            private void computeDirectly(Entry entry) {
                final HttpGet get = new HttpGet(entry.getLink());
                try {
                    HttpResponse response = httpClient.execute(get, context);
                    int statusCode = response.getStatusLine().getStatusCode();
                    if ((statusCode >= 400) && (statusCode <= 600)) {
                        logger.error("Couldn't get content from " + get.getURI().toString()
                                + "\n" + response.toString());
                    } else {
                        HttpEntity entity = response.getEntity();
                        if (entity != null) {
                            String htmlContent = EntityUtils.toString(entity).trim();
                            entry.setHtml(htmlContent);
                            EntityUtils.consumeQuietly(entity);
                        }
                    }
                } catch (Exception e) {
                } finally {
                    get.releaseConnection();
                }
            }

            @Override
            protected void compute() {
                if (entries.size() <= 1) {
                    computeDirectly(entries.get(0));
                    return;
                }
                int split = entries.size() / 2;
                invokeAll(new ContentDownloader(httpClient, entries.subList(0, split)),
                          new ContentDownloader(httpClient, entries.subList(split, entries.size())));
            }
        }

    So the question is: what is the best practice for using HttpClient from multiple threads? Are there rules for setting up the ConnectionManager and HttpClient? How can I use all 80 connections and raise network usage? If necessary, I will provide more code.
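
    For reference, a minimal sketch of wiring the pool limits discussed above into the HttpClient 4.2-style classes the question's logs show might look like this. The host name and the exact numbers are illustrative; the point is that the per-route limit (not just the total) has to be at least as large as the number of worker threads hitting any single host, otherwise threads queue on the pool no matter how high the total is.

        import org.apache.http.HttpHost;
        import org.apache.http.client.HttpClient;
        import org.apache.http.conn.routing.HttpRoute;
        import org.apache.http.impl.client.DefaultHttpClient;
        import org.apache.http.impl.conn.PoolingClientConnectionManager;

        public class PooledClientFactory {
            public static HttpClient create() {
                PoolingClientConnectionManager cm = new PoolingClientConnectionManager();
                cm.setMaxTotal(80);              // pool-wide ceiling
                cm.setDefaultMaxPerRoute(20);    // raise from 5 so threads on the same host are not serialized

                // Optionally give a particularly busy host its own, higher limit (illustrative host).
                HttpRoute heavyRoute = new HttpRoute(new HttpHost("stackoverflow.com", 80));
                cm.setMaxPerRoute(heavyRoute, 30);

                // DefaultHttpClient is safe to share across threads when backed by a pooling manager;
                // hand this single instance to all ForkJoin tasks.
                return new DefaultHttpClient(cm);
            }
        }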

    Read the article

  • ListView slow performance

    - by Mohamed Hemdan
    I've created a list of recipes using a ListView with a custom CursorAdapter. The custom row layout includes a photo for the recipe. I now have problems with the performance of viewing and scrolling the ListView even though it has only 10 records (the target is 150). Sometimes I get the error java.lang.OutOfMemoryError: bitmap size exceeds VM budget. I've tried to implement an AsyncTask but failed to get it working. Is there any way I can overcome this problem? Your help is highly appreciated! Here is my getView method:

        public View getView(int position, View convertView, ViewGroup parent) {
            View row = super.getView(position, convertView, parent);
            Cursor cursbbn = getCursor();
            if (row == null) {
                LayoutInflater inflater = (LayoutInflater) localContext.getSystemService(Context.LAYOUT_INFLATER_SERVICE);
                row = inflater.inflate(R.layout.listtype, null);
            }
            String Title = cursbbn.getString(2);
            String SandID = cursbbn.getString(1);
            String Readyin = cursbbn.getString(4);
            String Faovoites = cursbbn.getString(8);
            TextView titler = (TextView) row.findViewById(R.id.listmaintitle);
            TextView readyinr = (TextView) row.findViewById(R.id.listreadyin);
            int colorPos = position % colors.length;
            row.setBackgroundColor(colors[colorPos]);
            titler.setText(Title);
            readyinr.setText(Readyin);
            ImageView picture = (ImageView) row.findViewById(R.id.imageView1);
            Bitmap bitImg1 = BitmapFactory.decodeResource(localContext.getResources(), R.drawable.rec0001);
            Bitmap bitImg2 = BitmapFactory.decodeResource(localContext.getResources(), R.drawable.rec0002);
            Bitmap bitImg3 = BitmapFactory.decodeResource(localContext.getResources(), R.drawable.rec0003);
            Bitmap bitImg4 = BitmapFactory.decodeResource(localContext.getResources(), R.drawable.rec0004);
            Bitmap bitImg5 = BitmapFactory.decodeResource(localContext.getResources(), R.drawable.rec0005);
            Bitmap bitImg6 = BitmapFactory.decodeResource(localContext.getResources(), R.drawable.rec0006);
            Bitmap bitImg7 = BitmapFactory.decodeResource(localContext.getResources(), R.drawable.rec0007);
            Bitmap bitImg8 = BitmapFactory.decodeResource(localContext.getResources(), R.drawable.rec0008);
            Bitmap bitImg9 = BitmapFactory.decodeResource(localContext.getResources(), R.drawable.rec0009);
            Bitmap bitImg10 = BitmapFactory.decodeResource(localContext.getResources(), R.drawable.rec0010);
            if (SandID.contentEquals("0001")) picture.setImageBitmap(getRoundedCornerImage(bitImg1));
            if (SandID.contentEquals("0002")) picture.setImageBitmap(getRoundedCornerImage(bitImg2));
            if (SandID.contentEquals("0003")) picture.setImageBitmap(getRoundedCornerImage(bitImg3));
            if (SandID.contentEquals("0004")) picture.setImageBitmap(getRoundedCornerImage(bitImg4));
            if (SandID.contentEquals("0005")) picture.setImageBitmap(getRoundedCornerImage(bitImg5));
            if (SandID.contentEquals("0006")) picture.setImageBitmap(getRoundedCornerImage(bitImg6));
            if (SandID.contentEquals("0007")) picture.setImageBitmap(getRoundedCornerImage(bitImg7));
            if (SandID.contentEquals("0008")) picture.setImageBitmap(getRoundedCornerImage(bitImg8));
            if (SandID.contentEquals("0009")) picture.setImageBitmap(getRoundedCornerImage(bitImg9));
            if (SandID.contentEquals("0010")) picture.setImageBitmap(getRoundedCornerImage(bitImg10));
            return row;
        }

    And this is the error:

        05-02 03:11:55.898: E/AndroidRuntime(376): FATAL EXCEPTION: main
        05-02 03:11:55.898: E/AndroidRuntime(376): java.lang.OutOfMemoryError: bitmap size exceeds VM budget
        05-02 03:11:55.898: E/AndroidRuntime(376): at android.graphics.BitmapFactory.nativeDecodeAsset(Native Method)
        05-02 03:11:55.898: E/AndroidRuntime(376): at android.graphics.BitmapFactory.decodeStream(BitmapFactory.java:460)
        05-02 03:11:55.898: E/AndroidRuntime(376): at android.graphics.BitmapFactory.decodeResourceStream(BitmapFactory.java:336)
        05-02 03:11:55.898: E/AndroidRuntime(376): at android.graphics.BitmapFactory.decodeResource(BitmapFactory.java:359)
        05-02 03:11:55.898: E/AndroidRuntime(376): at android.graphics.BitmapFactory.decodeResource(BitmapFactory.java:385)
        05-02 03:11:55.898: E/AndroidRuntime(376): at master.chef.mediamaster.AlternateRowCursorAdapter.getView(AlternateRowCursorAdapter.java:83)
        05-02 03:11:55.898: E/AndroidRuntime(376): at android.widget.AbsListView.obtainView(AbsListView.java:1409)
        05-02 03:11:55.898: E/AndroidRuntime(376): at android.widget.ListView.makeAndAddView(ListView.java:1745)
        05-02 03:11:55.898: E/AndroidRuntime(376): at android.widget.ListView.fillUp(ListView.java:700)
        05-02 03:11:55.898: E/AndroidRuntime(376): at android.widget.ListView.fillGap(ListView.java:646)
        05-02 03:11:55.898: E/AndroidRuntime(376): at android.widget.AbsListView.trackMotionScroll(AbsListView.java:3399)
        05-02 03:11:55.898: E/AndroidRuntime(376): at android.widget.AbsListView.onTouchEvent(AbsListView.java:2233)
        05-02 03:11:55.898: E/AndroidRuntime(376): at android.widget.ListView.onTouchEvent(ListView.java:3446)
        05-02 03:11:55.898: E/AndroidRuntime(376): at android.view.View.dispatchTouchEvent(View.java:3885)
        05-02 03:11:55.898: E/AndroidRuntime(376): at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:903)
        05-02 03:11:55.898: E/AndroidRuntime(376): at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:942)
        05-02 03:11:55.898: E/AndroidRuntime(376): at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:942)
        05-02 03:11:55.898: E/AndroidRuntime(376): at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:942)
        05-02 03:11:55.898: E/AndroidRuntime(376): at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:942)
        05-02 03:11:55.898: E/AndroidRuntime(376): at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:942)
        05-02 03:11:55.898: E/AndroidRuntime(376): at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:942)
        05-02 03:11:55.898: E/AndroidRuntime(376): at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:942)
        05-02 03:11:55.898: E/AndroidRuntime(376): at com.android.internal.policy.impl.PhoneWindow$DecorView.superDispatchTouchEvent(PhoneWindow.java:1691)
        05-02 03:11:55.898: E/AndroidRuntime(376): at com.android.internal.policy.impl.PhoneWindow.superDispatchTouchEvent(PhoneWindow.java:1125)
        05-02 03:11:55.898: E/AndroidRuntime(376): at android.app.Activity.dispatchTouchEvent(Activity.java:2096)
        05-02 03:11:55.898: E/AndroidRuntime(376): at com.android.internal.policy.impl.PhoneWindow$DecorView.dispatchTouchEvent(PhoneWindow.java:1675)
        05-02 03:11:55.898: E/AndroidRuntime(376): at android.view.ViewRoot.deliverPointerEvent(ViewRoot.java:2194)
        05-02 03:11:55.898: E/AndroidRuntime(376): at android.view.ViewRoot.handleMessage(ViewRoot.java:1878)
        05-02 03:11:55.898: E/AndroidRuntime(376): at android.os.Handler.dispatchMessage(Handler.java:99)
        05-02 03:11:55.898: E/AndroidRuntime(376): at android.os.Looper.loop(Looper.java:123)
        05-02 03:11:55.898: E/AndroidRuntime(376): at android.app.ActivityThread.main(ActivityThread.java:3683)
        05-02 03:11:55.898: E/AndroidRuntime(376): at java.lang.reflect.Method.invokeNative(Native Method)
        05-02 03:11:55.898: E/AndroidRuntime(376): at java.lang.reflect.Method.invoke(Method.java:507)
        05-02 03:11:55.898: E/AndroidRuntime(376): at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:839)
        05-02 03:11:55.898: E/AndroidRuntime(376): at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:597)
        05-02 03:11:55.898: E/AndroidRuntime(376): at dalvik.system.NativeStart.main(Native Method)
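
    One likely culprit is that every getView() call decodes all ten full-size drawables, so the heap fills with bitmaps the adapter never reuses. A hedged sketch of the usual fix, decoding each drawable once at a reduced size and caching it by resource id, is shown below; the target dimensions are illustrative, the getRoundedCornerImage() call refers back to the question's own code, and everything else is an assumption rather than the asker's actual class names.

        import java.util.HashMap;
        import java.util.Map;
        import android.content.res.Resources;
        import android.graphics.Bitmap;
        import android.graphics.BitmapFactory;

        public class RecipeImageCache {
            private final Map<Integer, Bitmap> cache = new HashMap<Integer, Bitmap>();
            private final Resources resources;

            public RecipeImageCache(Resources resources) {
                this.resources = resources;
            }

            // Decode each resource at most once, downsampled to roughly the row's thumbnail size.
            public Bitmap get(int resId, int targetWidth, int targetHeight) {
                Bitmap cached = cache.get(resId);
                if (cached != null) {
                    return cached;
                }
                // First pass: read only the image bounds so we can pick a sample size.
                BitmapFactory.Options options = new BitmapFactory.Options();
                options.inJustDecodeBounds = true;
                BitmapFactory.decodeResource(resources, resId, options);

                int sampleSize = 1;
                while (options.outWidth / (sampleSize * 2) >= targetWidth
                        && options.outHeight / (sampleSize * 2) >= targetHeight) {
                    sampleSize *= 2;
                }

                // Second pass: decode for real at the reduced size and remember it.
                options = new BitmapFactory.Options();
                options.inSampleSize = sampleSize;
                Bitmap bitmap = BitmapFactory.decodeResource(resources, resId, options);
                cache.put(resId, bitmap);
                return bitmap;
            }
        }

    In getView() the adapter would then map SandID to the matching drawable id once (for example with a small lookup table) and call something like picture.setImageBitmap(getRoundedCornerImage(cache.get(resId, 96, 96))), instead of decoding all ten resources for every row.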

    Read the article

  • Optimizing python code performance when importing zipped csv to a mongo collection

    - by mark
    I need to import a zipped csv into a mongo collection, but there is a catch: every record contains a timestamp in Pacific Time, which must be converted to the local time corresponding to the (longitude, latitude) pair found in the same record. The code looks like so:

        def read_csv_zip(path, timezones):
            with ZipFile(path) as z, z.open(z.namelist()[0]) as input:
                csv_rows = csv.reader(input)
                header = csv_rows.next()
                check, converters = get_aux_stuff(header)
                for csv_row in csv_rows:
                    if check(csv_row):
                        row = {
                            converter[0]: converter[1](value)
                            for converter, value in zip(converters, csv_row)
                            if allow_field(converter)
                        }
                        ts = row['ts']
                        lng, lat = row['loc']
                        found_tz_entry = timezones.find_one(SON({'loc': {'$within': {'$box': [[lng-tz_lookup_radius, lat-tz_lookup_radius], [lng+tz_lookup_radius, lat+tz_lookup_radius]]}}}))
                        if found_tz_entry:
                            tz_name = found_tz_entry['tz']
                            local_ts = ts.astimezone(timezone(tz_name)).replace(tzinfo=None)
                            row['tz'] = tz_name
                        else:
                            local_ts = (ts.astimezone(utc) + timedelta(hours=int(lng/15))).replace(tzinfo=None)
                        row['local_ts'] = local_ts
                        yield row

        def insert_documents(collection, source, batch_size):
            while True:
                items = list(itertools.islice(source, batch_size))
                if len(items) == 0:
                    break
                try:
                    collection.insert(items)
                except:
                    for item in items:
                        try:
                            collection.insert(item)
                        except Exception as exc:
                            print("Failed to insert record {0} - {1}".format(item['_id'], exc))

        def main(zip_path):
            with Connection() as connection:
                data = connection.mydb.data
                timezones = connection.timezones.data
                insert_documents(data, read_csv_zip(zip_path, timezones), 1000)

    The code proceeds as follows:
    1. Every record read from the csv is checked and converted to a dictionary, where some fields may be skipped, some fields renamed (from those appearing in the csv header), and some values converted (to datetime, to integers, to floats, etc.).
    2. For each record read from the csv, a lookup is made into the timezones collection to map the record location to the respective time zone.
    3. If the mapping is successful, that timezone is used to convert the record timestamp (Pacific Time) to the respective local timestamp. If no mapping is found, a rough approximation is calculated.
    The timezones collection is appropriately indexed, of course; calling explain() confirms it. The process is slow. Naturally, having to query the timezones collection for every record kills the performance. I am looking for advice on how to improve it. Thanks.
    EDIT: The timezones collection contains 8176040 records, each containing four values:

        > db.data.findOne()
        { "_id" : 3038814, "loc" : [ 1.48333, 42.5 ], "tz" : "Europe/Andorra" }

    EDIT2: OK, I have compiled a release build of http://toblerity.github.com/rtree/ and configured the rtree package. Then I created an rtree dat/idx pair of files corresponding to my timezones collection. So, instead of calling collection.find_one I call index.intersection. Surprisingly, not only is there no improvement, it now works even more slowly! Maybe rtree could be fine-tuned to load the entire dat/idx pair into RAM (704M), but I do not know how to do it. Until then, it is not an alternative. In general, I think the solution should involve parallelization of the task.
    EDIT3: Profile output when using collection.find_one:

        >>> p.sort_stats('cumulative').print_stats(10)
        Tue Apr 10 14:28:39 2012    ImportDataIntoMongo.profile

                 64549590 function calls (64549180 primitive calls) in 1231.257 seconds

           Ordered by: cumulative time
           List reduced from 730 to 10 due to restriction <10>

           ncalls  tottime  percall  cumtime  percall filename:lineno(function)
                1    0.012    0.012 1231.257 1231.257 ImportDataIntoMongo.py:1(<module>)
                1    0.001    0.001 1230.959 1230.959 ImportDataIntoMongo.py:187(main)
                1  853.558  853.558  853.558  853.558 {raw_input}
                1    0.598    0.598  370.510  370.510 ImportDataIntoMongo.py:165(insert_documents)
           343407    9.965    0.000  359.034    0.001 ImportDataIntoMongo.py:137(read_csv_zip)
           343408    2.927    0.000  287.035    0.001 c:\python27\lib\site-packages\pymongo\collection.py:489(find_one)
           343408    1.842    0.000  274.803    0.001 c:\python27\lib\site-packages\pymongo\cursor.py:699(next)
           343408    2.542    0.000  271.212    0.001 c:\python27\lib\site-packages\pymongo\cursor.py:644(_refresh)
           343408    4.512    0.000  253.673    0.001 c:\python27\lib\site-packages\pymongo\cursor.py:605(__send_message)
           343408    0.971    0.000  242.078    0.001 c:\python27\lib\site-packages\pymongo\connection.py:871(_send_message_with_response)

    Profile output when using index.intersection:

        >>> p.sort_stats('cumulative').print_stats(10)
        Wed Apr 11 16:21:31 2012    ImportDataIntoMongo.profile

                 41542960 function calls (41542536 primitive calls) in 2889.164 seconds

           Ordered by: cumulative time
           List reduced from 778 to 10 due to restriction <10>

           ncalls  tottime  percall  cumtime  percall filename:lineno(function)
                1    0.028    0.028 2889.164 2889.164 ImportDataIntoMongo.py:1(<module>)
                1    0.017    0.017 2888.679 2888.679 ImportDataIntoMongo.py:202(main)
                1 2365.526 2365.526 2365.526 2365.526 {raw_input}
                1    0.766    0.766  502.817  502.817 ImportDataIntoMongo.py:180(insert_documents)
           343407    9.147    0.000  491.433    0.001 ImportDataIntoMongo.py:152(read_csv_zip)
           343406    0.571    0.000  391.394    0.001 c:\python27\lib\site-packages\rtree-0.7.0-py2.7.egg\rtree\index.py:384(intersection)
           343406  379.957    0.001  390.824    0.001 c:\python27\lib\site-packages\rtree-0.7.0-py2.7.egg\rtree\index.py:435(_intersection_obj)
           686513   22.616    0.000   38.705    0.000 c:\python27\lib\site-packages\rtree-0.7.0-py2.7.egg\rtree\index.py:451(_get_objects)
           343406    6.134    0.000   33.326    0.000 ImportDataIntoMongo.py:162(<dictcomp>)
              346    0.396    0.001   30.665    0.089 c:\python27\lib\site-packages\pymongo\collection.py:240(insert)

    EDIT4: I have parallelized the code, but the results are still not very encouraging. I am convinced it could be done better. See my own answer to this question for details.

    Read the article
