Search Results

Search found 22040 results on 882 pages for 'process improvement'.

Page 398/882 | < Previous Page | 394 395 396 397 398 399 400 401 402 403 404 405  | Next Page >

  • Architecture for database analytics

    - by David Cournapeau
    Hi, We have an architecture where we provide each customer Business Intelligence-like services for their website (internet merchant). Now, I need to analyze those data internally (for algorithmic improvement, performance tracking, etc...) and those are potentially quite heavy: we have up to millions of rows / customer / day, and I may want to know how many queries we had in the last month, weekly compared, etc... that is the order of billions entries if not more. The way it is currently done is quite standard: daily scripts which scan the databases, and generate big CSV files. I don't like this solutions for several reasons: as typical with those kinds of scripts, they fall into the write-once and never-touched-again category tracking things in "real-time" is necessary (we have separate toolset to query the last few hours ATM). this is slow and non-"agile" Although I have some experience in dealing with huge datasets for scientific usage, I am a complete beginner as far as traditional RDBM go. It seems that using column-oriented database for analytics could be a solution (the analytics don't need most of the data we have in the app database), but I would like to know what other options are available for this kind of issues.

    Read the article

  • MS SQL - Multi-Column substring matching

    - by hamlin11
    One of my clients is hooked on multi-column substring matching. I understand that Contains and FreeText search for words (and at least in the case of Contains, word prefixes). However, based upon my understanding of this MSDN book, neither of these nor their variants are capable of searching substrings. I have used LIKE rather extensively (Select * from A where A.B Like '%substr%') Sample table A: ID | Col1 | Col2 | Col3 | ------------------------------------- 1 | oklahoma | colorado | Utah | 2 | arkansas | colorado | oklahoma | 3 | florida | michigan | florida | ------------------------------------- The following code will give us row 1 and row 2: select * from A where Col1 like '%klah%' or Col2 like '%klah%' or Col3 like '%klah%' This is rather ugly, probably slow, and I just don't like it very much. Probably because the implementations that I'm dealing with have 10+ columns that need searched. The following may be a slight improvement as code readability goes, but as far as performance, we're still in the same ball park. select * from A where (Col1 + ' ' + Col2 + ' ' + Col3) like '%klah%' I have thought about simply adding insert, update, and delete triggers that simply add the concatenated version of the above columns into a separate table that shadows this table. Sample Shadow_Table: ID | searchtext | --------------------------------- 1 | oklahoma colorado Utah | 2 | arkansas colorado oklahoma | 3 | florida michigan florida | --------------------------------- This would allow us to perform the following query to search for '%klah%' select * from Shadow_Table where searchtext like '%klah%' I really don't like having to remember that this shadow table exists and that I'm supposed to use it when I am performing multi-column substring matching, but it probably yields pretty quick reads at the expense of write and storage space. My gut feeling tells me there there is an existing solution built into SQL Server 2008. However, I don't seem to be able to find anything other than research papers on the subject. Any help would be appreciated.

    Read the article

  • IE7 Problem with sIFR when <br> is inside an H3

    - by David Fox
    I have a problem I just discovered when viewing certain pages in IE7. If I have a very long header that wraps to a second line, or worse, if I put a BR in the middle, that throws off the spacing. One page to look at: broken example1 You'll notice that the margin at the top of the page gets offset as the headings are rendered, throwing everything off. I'm using code like this: <h3 style="margin:0"><a href="../books/msc1.html">Middle School Confidential™<br> Book 1: Be Confident in Who You Are</a></h3> but repeated many times to exaggerate the problem. I tried another test where I removed the BR and let the lines wrap naturally. This is an improvement in terms of the spacing, but it doesn't fix the problem. (Same URL but make it m1.html) In the third example, each heading takes up only one line (m2.html) One option would be to just split up the heading onto two lines, each with its on H tags. But since these are links, then it will appear that the first line might go to one place, and the second to another, since they wouldn't change color simultaneously as you roll over them. So, any solutions to this? I believe I have the current version of sIFR 3. I don't want to upgrade to IE8 until I know this is resolved. Thanks!

    Read the article

  • Queries within queries: Is there a better way?

    - by mririgo
    As I build bigger, more advanced web applications, I'm finding myself writing extremely long and complex queries. I tend to write queries within queries a lot because I feel making one call to the database from PHP is better than making several and correlating the data. However, anyone who knows anything about SQL knows about JOINs. Personally, I've used a JOIN or two before, but quickly stopped when I discovered using subqueries because it felt easier and quicker for me to write and maintain. Commonly, I'll do subqueries that may contain one or more subqueries from relative tables. Consider this example: SELECT (SELECT username FROM users WHERE records.user_id = user_id) AS username, (SELECT last_name||', '||first_name FROM users WHERE records.user_id = user_id) AS name, in_timestamp, out_timestamp FROM records ORDER BY in_timestamp Rarely, I'll do subqueries after the WHERE clause. Consider this example: SELECT user_id, (SELECT name FROM organizations WHERE (SELECT organization FROM locations WHERE records.location = location_id) = organization_id) AS organization_name FROM records ORDER BY in_timestamp In these two cases, would I see any sort of improvement if I decided to rewrite the queries using a JOIN? As more of a blanket question, what are the advantages/disadvantages of using subqueries or a JOIN? Is one way more correct or accepted than the other?

    Read the article

  • vs2008, opencv2.1,compile error in cxcore.hpp

    - by Long Gu
    Hi, gurus: I installed opencv2.1, made a new project (proj_A) using vs2008, used it for my computer vision tasks, it works fine. I copied an old project (proj_B, also made using vs2008) from other PC, compile it with ".h" and ".lib" files copied from opencv1.0 (which I did not install onto my PC), it compiles fine. I re-directed ".h" and ".lib" files in proj_B to opencv2.1 folders instead, compiled the proj_B, and then I got these errors from cxcore.hpp: class CV_EXPORTS RNG { public: enum { A=4164903690U, UNIFORM=0, NORMAL=1 }; // errors here, line 936 errors are: 5c:\opencv2.1\include\opencv\cxcore.hpp(936) : error C2143: syntax error : missing '}' before 'constant' 5c:\opencv2.1\include\opencv\cxcore.hpp(936) : error C2059: syntax error : 'constant' 5c:\opencv2.1\include\opencv\cxcore.hpp(936) : error C2143: syntax error : missing ';' before '}' 5c:\opencv2.1\include\opencv\cxcore.hpp(936) : error C2238: unexpected token(s) preceding ';' (400+ similar errors, but I believe the answer should be the same, so only list 1 set here) I compared setting for proj_A and proj_B, made them identical, and find no improvement. proj_A works well, proj_B refuse to compile. May I know what's wrong? Urgent, need to get it solved ASAP! Thanks a lot!

    Read the article

  • Rewriting jQuery to plain old javascript - are the performance gains worth it?

    - by Swader
    Since jQuery is an incredibly easy and banal library, I've developed a rather complex project fairly quickly with it. The entire interface is jQuery based, and memory is cleaned regularly to maintain optimum performance. Everything works very well in Firefox, and exceptionally so in Chrome (other browsers are of no concern for me as this is not a commercial or publicly available product). What I'm wondering now is - since pure plain old javascript is really not a complicated language to master, would it be performance enhancing to rewrite the whole thing in plain old JS, and if so, how much of a boost would you expect to get from it? If the answers prove positive enough, I'll go ahead and do it, run a benchmark and report back with the precise findings. Cheers Edit: Thanks guys, valuable insight. The purpose was not to "re-invent the wheel" - it was just for experience and personal improvement. Just because something exists, doesn't mean you shouldn't explore it into greater detail, know how it works or try to recreate it. This is the same reason I seldom use frameworks, I would much rather use my own code and iron it out and gain massive experience doing it, than start off by using someone else's code, regardless of how ironed out it is. Anyway, won't be doing it, thanks for saving me the effort :)

    Read the article

  • jQuery - Multiple setInterval Conflict

    - by Chris Bowyer
    I am a jQuery novice and each of the following work fine on their own, but get out of time when working together. What am I doing wrong? Any improvement on the code would be appreciated too... It is to be used to rotate advertising. <!--- Header Rotator ---> <script type="text/javascript"> $(document).ready(function() { $("#header").load("header.cfm"); var refreshHeader = setInterval(function() { $("#header").load("header.cfm"); }, 10000); }); </script> <!--- Main Rotator ---> <script type="text/javascript"> $(document).ready(function() { $("#main").load("main.cfm"); var refreshMain = setInterval(function() { $("#main").load("main.cfm"); }, 5000); }); </script> <!--- Footer Rotator ---> <script type="text/javascript"> $(document).ready(function() { $("#footer").load("footer.cfm"); var refreshFooter = setInterval(function() { $("#footer").load("footer.cfm"); }, 2000); }); </script>

    Read the article

  • Give the mount point of a path

    - by Charles Stewart
    The following, very non-robust shell code will give the mount point of $path: (for i in $(df|cut -c 63-99); do case $path in $i*) echo $i;; esac; done) | tail -n 1 Is there a better way to do this? Postscript This script is really awful, but has the redeeming quality that it Works On My Systems. Note that several mount points may be prefixes of $path. Examples On a Linux system: cas@txtproof:~$ path=/sys/block/hda1 cas@txtproof:~$ for i in $(df -a|cut -c 57-99); do case $path in $i*) echo $i;; esac; done| tail -1 /sys On a Mac osx system cas local$ path=/dev/fd/0 cas local$ for i in $(df -a|cut -c 63-99); do case $path in $i*) echo $i;; esac; done| tail -1 /dev Note the need to vary cut's parameters, because of the way df's output differs: indeed, awk is better. Answer It looks like munging tabular output is the only way within the shell, but df /dev/fd/impossible | tail -1 | awk '{ print $NF}' is a big improvement on what I had. Note two differences in semantics: firstly, df $path insists that $path names an existing file, the script I had above doesn't care; secondly, there are no worries about dereferncing symlinks. It's not difficult to write Python code to do the job.

    Read the article

  • Effective communication in a component-based system

    - by Tesserex
    Yes, this is another question about my game engine, which is coming along very nicely, with much thanks to you guys. So, if you watched the video (or didn't), the objects in the game are composed of various components for things like position, sprites, movement, collision, sounds, health, etc. I have several message types defined for "tell" type communication between entities and components, but this only goes so far. There are plenty of times when I just need to ask for something, for example an entity's position. There are dozens of lines in my code that look like this: SomeComponent comp = (SomeComponent)entity.GetComponent(typeof(SomeComponent)); if (comp != null) comp.GetSomething(); I know this is very ugly, and I know that casting smells of improper OO design. But as complex as things are, there doesn't seem to be a better way. I could of course "hard-code" my component types and just have SomeComponent comp = entity.GetSomeComponent(); but that seems like a cop-out, and a bad one. I literally JUST REALIZED, while writing this, after having my code this way for months with no solution, that a generic will help me. SomeComponent comp = entity.GetComponent<SomeComponent>(); Amazing how that works. Anyway, this is still only a semantic improvement. My questions remain. Is this actually that bad? What's a better alternative?

    Read the article

  • How much does an InnoDB table benefit from having fixed-length rows?

    - by Philip Eve
    I know that dependent on the database storage engine in use, a performance benefit can be found if all of the rows in the table can be guaranteed to be the same length (by avoiding nullable columns and not using any VARCHAR, TEXT or BLOB columns). I'm not clear on how far this applies to InnoDB, with its funny table arrangements. Let's give an example: I have the following table CREATE TABLE `PlayerGameRcd` ( `User` SMALLINT UNSIGNED NOT NULL, `Game` MEDIUMINT UNSIGNED NOT NULL, `GameResult` ENUM('Quit', 'Kicked by Vote', 'Kicked by Admin', 'Kicked by System', 'Finished 5th', 'Finished 4th', 'Finished 3rd', 'Finished 2nd', 'Finished 1st', 'Game Aborted', 'Playing', 'Hide' ) NOT NULL DEFAULT 'Playing', `Inherited` TINYINT NOT NULL, `GameCounts` TINYINT NOT NULL, `Colour` TINYINT UNSIGNED NOT NULL, `Score` SMALLINT UNSIGNED NOT NULL DEFAULT 0, `NumLongTurns` TINYINT UNSIGNED NOT NULL DEFAULT 0, `Notes` MEDIUMTEXT, `CurrentOccupant` TINYINT UNSIGNED NOT NULL DEFAULT 0, PRIMARY KEY (`Game`, `User`), UNIQUE KEY `PGR_multi_uk` (`Game`, `CurrentOccupant`, `Colour`), INDEX `Stats_ind_PGR` (`GameCounts`, `GameResult`, `Score`, `User`), INDEX `GameList_ind_PGR` (`User`, `CurrentOccupant`, `Game`, `Colour`), CONSTRAINT `Constr_PlayerGameRcd_User_fk` FOREIGN KEY `User_fk` (`User`) REFERENCES `User` (`UserID`) ON DELETE CASCADE ON UPDATE CASCADE, CONSTRAINT `Constr_PlayerGameRcd_Game_fk` FOREIGN KEY `Game_fk` (`Game`) REFERENCES `Game` (`GameID`) ON DELETE CASCADE ON UPDATE CASCADE ) ENGINE=INNODB CHARACTER SET utf8 COLLATE utf8_general_ci The only column that is nullable is Notes, which is MEDIUMTEXT. This table presently has 33097 rows (which I appreciate is small as yet). Of these rows, only 61 have values in Notes. How much of an improvement might I see from, say, adding a new table to store the Notes column in and performing LEFT JOINs when necessary?

    Read the article

  • std::ifstream buffer caching

    - by ledokol
    Hello everybody, In my application I'm trying to merge sorted files (keeping them sorted of course), so I have to iterate through each element in both files to write the minimal to the third one. This works pretty much slow on big files, as far as I don't see any other choice (the iteration has to be done) I'm trying to optimize file loading. I can use some amount of RAM, which I can use for buffering. I mean instead of reading 4 bytes from both files every time I can read once something like 100Mb and work with that buffer after that, until there will be no element in buffer, then I'll refill the buffer again. But I guess ifstream is already doing that, will it give me more performance and is there any reason? If fstream does, maybe I can change size of that buffer? added My current code looks like that (pseudocode) // this is done in loop int i1 = input1.read_integer(); int i2 = input2.read_integer(); if (!input1.eof() && !input2.eof()) { if (i1 < i2) { output.write(i1); input2.seek_back(sizeof(int)); } else input1.seek_back(sizeof(int)); output.write(i2); } } else { if (input1.eof()) output.write(i2); else if (input2.eof()) output.write(i1); } What I don't like here is seek_back - I have to seek back to previous position as there is no way to peek 4 bytes too much reading from file if one of the streams is in EOF it still continues to check that stream instead of putting contents of another stream directly to output, but this is not a big issue, because chunk sizes are almost always equal. Can you suggest improvement for that? Thanks.

    Read the article

  • MySQL Config File for Large System

    - by Jonathon
    We are running MySQL on a Windows 2003 Server Enterpise Edition box. MySQL is about the only program running on the box. We have approx. 8 slaves replicated to it, but my understanding is that having multiple slaves connecting to the same master does not significantly slow down performance, if at all. The master server has 16G RAM, 10 Terabyte drives in RAID 10, and four dual-core processors. From what I have seen from other sites, we have a really robust machine as our master db server. We just upgraded from a machine with only 4G RAM, but with similar hard drives, RAID, etc. It also ran Apache on it, so it was our db server and our application server. It was getting a little slow, so we split the db server onto this new machine and kept the application server on the first machine. We also distributed the application load amongst a few of our other slave servers, which also run the application. The problem is the new db server has mysqld.exe consuming 95-100% of CPU almost all the time and is really causing the app to run slowly. I know we have several queries and table structures that could be better optimized, but since they worked okay on the older, smaller server, I assume that our my.ini (MySQL config) file is not properly configured. Most of what I see on the net is for setting config files on small machines, so can anyone help me get the my.ini file correct for a large dedicated machine like ours? I just don't see how mysqld could get so bogged down! FYI: We have about 100 queries per second. We only use MyISAM tables, so skip-innodb is set in the ini file. And yes, I know it is reading the ini file correctly because I can change some settings (like the server-id and it will kill the server at startup). Here is the my.ini file: #MySQL Server Instance Configuration File # ---------------------------------------------------------------------- # Generated by the MySQL Server Instance Configuration Wizard # # # Installation Instructions # ---------------------------------------------------------------------- # # On Linux you can copy this file to /etc/my.cnf to set global options, # mysql-data-dir/my.cnf to set server-specific options # (@localstatedir@ for this installation) or to # ~/.my.cnf to set user-specific options. # # On Windows you should keep this file in the installation directory # of your server (e.g. C:\Program Files\MySQL\MySQL Server X.Y). To # make sure the server reads the config file use the startup option # "--defaults-file". # # To run run the server from the command line, execute this in a # command line shell, e.g. # mysqld --defaults-file="C:\Program Files\MySQL\MySQL Server X.Y\my.ini" # # To install the server as a Windows service manually, execute this in a # command line shell, e.g. # mysqld --install MySQLXY --defaults-file="C:\Program Files\MySQL\MySQL Server X.Y\my.ini" # # And then execute this in a command line shell to start the server, e.g. # net start MySQLXY # # # Guildlines for editing this file # ---------------------------------------------------------------------- # # In this file, you can use all long options that the program supports. # If you want to know the options a program supports, start the program # with the "--help" option. # # More detailed information about the individual options can also be # found in the manual. # # # CLIENT SECTION # ---------------------------------------------------------------------- # # The following options will be read by MySQL client applications. # Note that only client applications shipped by MySQL are guaranteed # to read this section. If you want your own MySQL client program to # honor these values, you need to specify it as an option during the # MySQL client library initialization. # [client] port=3306 [mysql] default-character-set=latin1 # SERVER SECTION # ---------------------------------------------------------------------- # # The following options will be read by the MySQL Server. Make sure that # you have installed the server correctly (see above) so it reads this # file. # [mysqld] # The TCP/IP Port the MySQL Server will listen on port=3306 #Path to installation directory. All paths are usually resolved relative to this. basedir="D:/MySQL/" #Path to the database root datadir="D:/MySQL/data" # The default character set that will be used when a new schema or table is # created and no character set is defined default-character-set=latin1 # The default storage engine that will be used when create new tables when default-storage-engine=MYISAM # Set the SQL mode to strict #sql-mode="STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION" # we changed this because there are a couple of queries that can get blocked otherwise sql-mode="" #performance configs skip-locking max_allowed_packet = 1M table_open_cache = 512 # The maximum amount of concurrent sessions the MySQL server will # allow. One of these connections will be reserved for a user with # SUPER privileges to allow the administrator to login even if the # connection limit has been reached. max_connections=1510 # Query cache is used to cache SELECT results and later return them # without actual executing the same query once again. Having the query # cache enabled may result in significant speed improvements, if your # have a lot of identical queries and rarely changing tables. See the # "Qcache_lowmem_prunes" status variable to check if the current value # is high enough for your load. # Note: In case your tables change very often or if your queries are # textually different every time, the query cache may result in a # slowdown instead of a performance improvement. query_cache_size=168M # The number of open tables for all threads. Increasing this value # increases the number of file descriptors that mysqld requires. # Therefore you have to make sure to set the amount of open files # allowed to at least 4096 in the variable "open-files-limit" in # section [mysqld_safe] table_cache=3020 # Maximum size for internal (in-memory) temporary tables. If a table # grows larger than this value, it is automatically converted to disk # based table This limitation is for a single table. There can be many # of them. tmp_table_size=30M # How many threads we should keep in a cache for reuse. When a client # disconnects, the client's threads are put in the cache if there aren't # more than thread_cache_size threads from before. This greatly reduces # the amount of thread creations needed if you have a lot of new # connections. (Normally this doesn't give a notable performance # improvement if you have a good thread implementation.) thread_cache_size=64 #*** MyISAM Specific options # The maximum size of the temporary file MySQL is allowed to use while # recreating the index (during REPAIR, ALTER TABLE or LOAD DATA INFILE. # If the file-size would be bigger than this, the index will be created # through the key cache (which is slower). myisam_max_sort_file_size=100G # If the temporary file used for fast index creation would be bigger # than using the key cache by the amount specified here, then prefer the # key cache method. This is mainly used to force long character keys in # large tables to use the slower key cache method to create the index. myisam_sort_buffer_size=64M # Size of the Key Buffer, used to cache index blocks for MyISAM tables. # Do not set it larger than 30% of your available memory, as some memory # is also required by the OS to cache rows. Even if you're not using # MyISAM tables, you should still set it to 8-64M as it will also be # used for internal temporary disk tables. key_buffer_size=3072M # Size of the buffer used for doing full table scans of MyISAM tables. # Allocated per thread, if a full scan is needed. read_buffer_size=2M read_rnd_buffer_size=8M # This buffer is allocated when MySQL needs to rebuild the index in # REPAIR, OPTIMZE, ALTER table statements as well as in LOAD DATA INFILE # into an empty table. It is allocated per thread so be careful with # large settings. sort_buffer_size=2M #*** INNODB Specific options *** innodb_data_home_dir="D:/MySQL InnoDB Datafiles/" # Use this option if you have a MySQL server with InnoDB support enabled # but you do not plan to use it. This will save memory and disk space # and speed up some things. skip-innodb # Additional memory pool that is used by InnoDB to store metadata # information. If InnoDB requires more memory for this purpose it will # start to allocate it from the OS. As this is fast enough on most # recent operating systems, you normally do not need to change this # value. SHOW INNODB STATUS will display the current amount used. innodb_additional_mem_pool_size=11M # If set to 1, InnoDB will flush (fsync) the transaction logs to the # disk at each commit, which offers full ACID behavior. If you are # willing to compromise this safety, and you are running small # transactions, you may set this to 0 or 2 to reduce disk I/O to the # logs. Value 0 means that the log is only written to the log file and # the log file flushed to disk approximately once per second. Value 2 # means the log is written to the log file at each commit, but the log # file is only flushed to disk approximately once per second. innodb_flush_log_at_trx_commit=1 # The size of the buffer InnoDB uses for buffering log data. As soon as # it is full, InnoDB will have to flush it to disk. As it is flushed # once per second anyway, it does not make sense to have it very large # (even with long transactions). innodb_log_buffer_size=6M # InnoDB, unlike MyISAM, uses a buffer pool to cache both indexes and # row data. The bigger you set this the less disk I/O is needed to # access data in tables. On a dedicated database server you may set this # parameter up to 80% of the machine physical memory size. Do not set it # too large, though, because competition of the physical memory may # cause paging in the operating system. Note that on 32bit systems you # might be limited to 2-3.5G of user level memory per process, so do not # set it too high. innodb_buffer_pool_size=500M # Size of each log file in a log group. You should set the combined size # of log files to about 25%-100% of your buffer pool size to avoid # unneeded buffer pool flush activity on log file overwrite. However, # note that a larger logfile size will increase the time needed for the # recovery process. innodb_log_file_size=100M # Number of threads allowed inside the InnoDB kernel. The optimal value # depends highly on the application, hardware as well as the OS # scheduler properties. A too high value may lead to thread thrashing. innodb_thread_concurrency=10 #replication settings (this is the master) log-bin=log server-id = 1 Thanks for all the help. It is greatly appreciated.

    Read the article

  • Indexing table with duplicates MySQL/MSSQL with millions of records

    - by Tesnep
    I need help in indexing in MySQL. I have a table in MySQL with following rows: ID Store_ID Feature_ID Order_ID Viewed_Date Deal_ID IsTrial The ID is auto generated. Store_ID goes from 1 - 8. Feature_ID from 1 - let's say 100. Viewed Date is Date and time on which the data is inserted. IsTrial is either 0 or 1. You can ignore Order_ID and Deal_ID from this discussion. There are millions of data in the table and we have a reporting backend that needs to view the number of views in a certain period or overall where trial is 0 for a particular store id and for a particular feature. The query takes the form of: select count(viewed_date) from theTable where viewed_date between '2009-12-01' and '2010-12-31' and store_id = '2' and feature_id = '12' and Istrial = 0 In MSSQL you can have a filtered index to use for Istrial. Is there anything similar to this in MySQL? Also, Store_ID and Feature_ID have a lot of duplicate data. I created an index using Store_ID and Feature_ID. Although this seems to have decreased the search period, I need better improvement than this. Right now I have more than 4 million rows. To search for a particular query like the one above, it looks at 3.5 million rows in order to give me the count of 500k rows. PS. I forgot to add view_date filter in the query. Now I have done this.

    Read the article

  • How do I efficiently parse a CSV file in Perl?

    - by Mike
    I'm working on a project that involves parsing a large csv formatted file in Perl and am looking to make things more efficient. My approach has been to split() the file by lines first, and then split() each line again by commas to get the fields. But this suboptimal since at least two passes on the data are required. (once to split by lines, then once again for each line). This is a very large file, so cutting processing in half would be a significant improvement to the entire application. My question is, what is the most time efficient means of parsing a large CSV file using only built in tools? note: Each line has a varying number of tokens, so we can't just ignore lines and split by commas only. Also we can assume fields will contain only alphanumeric ascii data (no special characters or other tricks). Also, i don't want to get into parallel processing, although it might work effectively. edit It can only involve built-in tools that ship with Perl 5.8. For bureaucratic reasons, I cannot use any third party modules (even if hosted on cpan) another edit Let's assume that our solution is only allowed to deal with the file data once it is entirely loaded into memory. yet another edit I just grasped how stupid this question is. Sorry for wasting your time. Voting to close.

    Read the article

  • Parallelizing L2S Entity Retrieval

    - by MarkB
    Assuming a typical domain entity approach with SQL Server and a dbml/L2S DAL with a logic layer on top of that: In situations where lazy loading is not an option, I have settled on a convention where getting a list of entities does not also get each item's child entities (no loading), but getting a single entity does (eager loading). Since getting a single entity also gets children, it causes a cascading effect in which each child then gets its children too. This sounds bad, but as long as the model is not too deep, I usually don't see performance problems that outweigh the benefits of the ease of use. So if I want to get a list in which each of the items is fully hydrated with children, I combine the GetList and GetItem methods. So I'll get a list and then loop through it getting each item with the full cascade. Even this is generally acceptable in many of the projects I've worked on - but I have recently encountered situations with larger models and/or more data in which it needs to be more efficient. I've found that partitioning the loop and executing it on multiple threads yields excellent results. In my first experiment with a list of 50 items from one particular project, I did 5 threads of 10 items each and got a 3X improvement in time. Of course, the mileage will vary depending on the project but all else being equal this is clearly a big opportunity. However, before I go further, I was wondering what others have done that have already been through this. What are some good approaches to parallelizing this type of thing?

    Read the article

  • Indexing table with duplicates MySQL/SQL Server with millions of records

    - by Tesnep
    I need help in indexing in MySQL. I have a table in MySQL with following rows: ID Store_ID Feature_ID Order_ID Viewed_Date Deal_ID IsTrial The ID is auto generated. Store_ID goes from 1 - 8. Feature_ID from 1 - let's say 100. Viewed Date is Date and time on which the data is inserted. IsTrial is either 0 or 1. You can ignore Order_ID and Deal_ID from this discussion. There are millions of data in the table and we have a reporting backend that needs to view the number of views in a certain period or overall where trial is 0 for a particular store id and for a particular feature. The query takes the form of: select count(viewed_date) from theTable where viewed_date between '2009-12-01' and '2010-12-31' and store_id = '2' and feature_id = '12' and Istrial = 0 In SQL Server you can have a filtered index to use for Istrial. Is there anything similar to this in MySQL? Also, Store_ID and Feature_ID have a lot of duplicate data. I created an index using Store_ID and Feature_ID. Although this seems to have decreased the search period, I need better improvement than this. Right now I have more than 4 million rows. To search for a particular query like the one above, it looks at 3.5 million rows in order to give me the count of 500k rows. PS. I forgot to add view_date filter in the query. Now I have done this.

    Read the article

  • Please tell me what is wrong with my threading!!!

    - by kiddo
    I have a function where I will compress a bunch of files into a single compressed file..it is taking a long time(to compress),so I tried implementing threading in my application..Say if I have 20 files for compression,I separated that as 5*4=20,inorder to do that I have separate variables(which are used for compression) for all 4 threads in order to avoid locks and I will wait until the 4 thread finishes..Now..the threads are working but i see no improvement in their performance..normally it will take 1 min for 20 files(for example) after implementing threading ...there is only 5 or 3 sec difference., sometimes the same. here i will show the code for 1 thread(so it is for other3 threads) //main thread myClassObject->thread1 = AfxBeginThread((AFX_THREADPROC)MyThreadFunction1,myClassObject); .... HANDLE threadHandles[4]; threadHandles[0] = myClassObject->thread1->m_hThread; .... WaitForSingleObject(myClassObject->thread1->m_hThread,INFINITE); UINT MyThreadFunction(LPARAM lparam) { CMerger* myClassObject = (CMerger*)lparam; CString outputPath = myClassObject->compressedFilePath.GetAt(0);//contains the o/p path wchar_t* compressInputData[] = {myClassObject->thread1outPath, COMPRESS,(wchar_t*)(LPCTSTR)(outputPath)}; HINSTANCE loadmyDll; loadmydll = LoadLibrary(myClassObject->thread1outPath); fp_Decompress callCompressAction = NULL; int getCompressResult=0; myClassObject->MyCompressFunction(compressInputData,loadClient7zdll,callCompressAction,myClassObject->thread1outPath, getCompressResult,minIndex,myClassObject->firstThread,myClassObject); return 0; }

    Read the article

  • Looking for an appropriate design pattern

    - by user1066015
    I have a game that tracks user stats after every match, such as how far they travelled, how many times they attacked, how far they fell, etc, and my current implementations looks somewhat as follows (simplified version): Class Player{ int id; public Player(){ int id = Math.random()*100000; PlayerData.players.put(id,new PlayerData()); } public void jump(){ //Logic to make the user jump //... //call the playerManager PlayerManager.jump(this); } public void attack(Player target){ //logic to attack the player //... //call the player manager PlayerManager.attack(this,target); } } Class PlayerData{ public static HashMap<int, PlayerData> players = new HashMap<int,PlayerData>(); int id; int timesJumped; int timesAttacked; } public void incrementJumped(){ timesJumped++; } public void incrementAttacked(){ timesAttacked++; } } Class PlayerManager{ public static void jump(Player player){ players.get(player.getId()).incrementJumped(); } public void incrementAttacked(Player player, Player target){ players.get(player.getId()).incrementAttacked(); } } So I have a PlayerData class which holds all of the statistics, and brings it out of the player class because it isn't part of the player logic. Then I have PlayerManager, which would be on the server, and that controls the interactions between players (a lot of the logic that does that is excluded so I could keep this simple). I put the calls to the PlayerData class in the Manager class because sometimes you have to do certain checks between players, for instance if the attack actually hits, then you increment "attackHits". The main problem (in my opinion, correct me if I'm wrong) is that this is not very extensible. I will have to touch the PlayerData class if I want to keep track of a new stat, by adding methods and fields, and then I have to potentially add more methods to my PlayerManager, so it isn't very modulized. If there is an improvement to this that you would recommend, I would be very appreciative. Thanks.

    Read the article

  • Thread-Safe lazy instantiating using MEF

    - by Xaqron
    // Member Variable private static readonly object _syncLock = new object(); // Now inside a static method foreach (var lazyObject in plugins) { if ((string)lazyObject.Metadata["key"] = "something") { lock (_syncLock) { // It seems the `IsValueCreated` is not up-to-date if (!lazyObject.IsValueCreated) lazyObject.value.DoSomething(); } return lazyObject.value; } } Here I need synchronized access per loop. There are many threads iterating this loop and based on the key they are looking for, a lazy instance is created and returned. lazyObject should not be created more that one time. Although Lazy class is for doing so and despite of the used lock, under high threading I have more than one instance created (I track this with a Interlocked.Increment on a volatile static int and log it somewhere). The problem is I don't have access to definition of Lazy and MEF defines how the Lazy class create objects. I should notice the CompositionContainer has a thread-safe option in constructor which is already used. My questions: 1) Why the lock doesn't work ? 2) Should I use an array of locks instead of one lock for performance improvement ?

    Read the article

  • Oracle Query Optimization: Why is My Second Query Faster?

    - by Patrick Cuff
    I was having some performance issues with an Oracle query, so I downloaded a trial of the Quest SQL Optimizer for Oracle, which made some changes that dramatically improved the query's performance. I'm not exactly sure why the recommended query had such an improvement; can anyone provide an explanation? Before: SELECT t1.version_id, t1.id, t2.field1, t3.person_id, t2.id FROM table1 t1, table2 t2, table3 t3 WHERE t1.id = t2.id AND t1.version_id = t2.version_id AND t2.id = 123 AND t1.version_id = t3.version_id AND t1.VERSION_NAME <> 'AA' order by t1.id Plan Cost: 831 Elapsed Time: 00:00:21.40 Number of Records: 40,717 After: SELECT /*+ USE_NL_WITH_INDEX(t1) */ t1.version_id, t1.id, t2.field1, t3.person_id, t2.id FROM table2 t2, table3 t3, table1 t1 WHERE t1.id = t2.id + 0 AND t1.version_id = t2.version_id + 0 AND t2.id = 123 AND t1.version_id = t3.version_id + 0 AND t1.VERSION_NAME || '' <> 'AA' AND t3.version_id = t2.version_id + 0 order by t1.id Plan Cost: 686 Elapsed Time: 00:00:00.95 Number of Records: 40,717 Questions: Why does re-arranging the order of the tables in the FROM clause help? Why does adding + 0 to the WHERE clause comparisons help? Why does || '' <> 'AA' in the WHERE clause VERSION_NAME comparison help? Is this a more efficient way of handling possible nulls on this column?

    Read the article

  • Skip subdirectory in python import

    - by jstaab
    Ok, so I'm trying to change this: app/ - lib.py - models.py - blah.py Into this: app/ - __init__.py - lib.py - models/ - __init__.py - user.py - account.py - banana.py - blah.py And still be able to import my models using from app.models import User rather than having to change it to from app.models.user import User all over the place. Basically, I want everything to treat the package as a single module, but be able to navigate the code in separate files for development ease. The reason I can't do something like add for file in __all__: from file import * into init.py is I have circular references between the model files. A fix I don't want is to import those models from within the functions that use them. But that's super ugly. Let me give you an example: user.py ... from app.models import Banana ... banana.py ... from app.models import User ... I wrote a quick pre-processing script that grabs all the files, re-writes them to put imports at the top, and puts it into models.py, but that's hardly an improvement, since now my stack traces don't show the line number I actually need to change. Any ideas? I always though init was probably magical but now that I dig into it, I can't find anything that lets me provide myself this really simple convenience.

    Read the article

  • (php) regexto remove comments but ignore occurances within strings

    - by David
    Hi there, I am writing a comment-stripper and trying to accommodate for all needs here. I have the below stack of code which removes pretty much all comments, but it actually goes too far. A lot of time was spent trying and testing and researching the regex patterns to match, but I don't claim that they are the best at each. My problem is that I also have situation where I have 'PHP comments' (that aren't really comments' in standard code, or even in PHP strings, that I don't actually want to have removed. Example: <?php $Var = "Blah blah //this must not comment"; // this must comment. ?> What ends up happening is that it strips out religiously, which is fine, but it leaves certain problems: <?php $Var = "Blah blah ?> Also: will also cause problems, as the comment removes the rest of the line, including the ending ? See the problem? So this is what I need... Comment characters within '' or "" need to be ignored PHP Comments on the same line, that use double-slashes, should remove perhaps only the comment itself, or should remove the entire php codeblock. Here's the patterns I use at the moment, feel free to tell me if there's improvement I can make in my existing patterns? :) $CompressedData = $OriginalData; $CompressedData = preg_replace('!/\*.*?\*/!s', '', $CompressedData); // removes /* comments */ $CompressedData = preg_replace('!//.*?\n!', '', $CompressedData); // removes //comments $CompressedData = preg_replace('!#.*?\n!', '', $CompressedData); // removes # comments $CompressedData = preg_replace('/<!--(.*?)-->/', '', $CompressedData); // removes HTML comments Any help that you can give me would be greatly appreciated! :)

    Read the article

  • Find existence of number in a sorted list in constant time? (Interview question)

    - by Rich
    I'm studying for upcoming interviews and have encountered this question several times (written verbatim) Find or determine non existence of a number in a sorted list of N numbers where the numbers range over M, M N and N large enough to span multiple disks. Algorithm to beat O(log n); bonus points for constant time algorithm. First of all, I'm not sure if this is a question with a real solution. My colleagues and I have mused over this problem for weeks and it seems ill formed (of course, just because we can't think of a solution doesn't mean there isn't one). A few questions I would have asked the interviewer are: Are there repeats in the sorted list? What's the relationship to the number of disks and N? One approach I considered was to binary search the min/max of each disk to determine the disk that should hold that number, if it exists, then binary search on the disk itself. Of course this is only an order of magnitude speedup if the number of disks is large and you also have a sorted list of disks. I think this would yield some sort of O(log log n) time. As for the M N hint, perhaps if you know how many numbers are on a disk and what the range is, you could use the pigeonhole principle to rule out some cases some of the time, but I can't figure out an order of magnitude improvement. Also, "bonus points for constant time algorithm" makes me a bit suspicious. Any thoughts, solutions, or relevant history of this problem?

    Read the article

  • Best available technology for layered disk cache in linux

    - by SpliFF
    I've just bought a 6-core Phenom with 16G of RAM. I use it primarily for compiling and video encoding (and occassional web/db). I'm finding all activities get disk-bound and I just can't keep all 6 cores fed. I'm buying an SSD raid to sit between the HDD and tmpfs. I want to setup a "layered" filesystem where reads are cached on tmpfs but writes safely go through to the SSD. I want files (or blocks) that haven't been read lately on the SSD to then be written back to a HDD using a compressed FS or block layer. So basically reads: - Check tmpfs - Check SSD - Check HD And writes: - Straight to SSD (for safety), then tmpfs (for speed) And periodically, or when space gets low: - Move least frequently accessed files down one layer. I've seen a few projects of interest. CacheFS, cachefsd, bcache seem pretty close but I'm having trouble determining which are practical. bcache seems a little risky (early adoption), cachefs seems tied to specific network filesystems. There are "union" projects unionfs and aufs that let you mount filesystems over each other (USB device over a DVD usually) but both are distributed as a patch and I get the impression this sort of "transparent" mounting was going to become a kernel feature rather than a FS. I know the kernel has a built-in disk cache but it doesn't seem to work well with compiling. I see a 20x speed improvement when I move my source files to tmpfs. I think it's because the standard buffers are dedicated to a specific process and compiling creates and destroys thousands of processes during a build (just guessing there). It looks like I really want those files precached. I've read tmpfs can use virtual memory. In that case is it practical to create a giant tmpfs with swap on the SSD? I don't need to boot off the resulting layered filesystem. I can load grub, kernel and initrd from elsewhere if needed. So that's the background. The question has several components I guess: Recommended FS and/or block layer for the SSD and compressed HDD. Recommended mkfs parameters (block size, options etc...) Recommended cache/mount technology to bind the layers transparently Required mount parameters Required kernel options / patches, etc..

    Read the article

  • New Computer Build Questions

    - by MJ
    I'm in the process of gathering parts and specs for a new machine. I wear many hats, so the machine needs to do a lot. I need at least 2 monitor support, if not three. I also play many online MMOs (wow, aion, war hammer, etc), along with some freelance programming projects. I already have a case which is very large, so it will fit anything. I have 2 other SATA HDs. They are more for storage and basic programs. I feel that the best improvement could be done with a solid state HD, true or not? I'm more of a software/programming guy, so ANY input at all on improving this system build would be appreciated. I have a few questions with this list. AMD or Intel? I don't know enough about either to choose what would best fit me. Thanks! **EDIT: Thanks for the input everyone! Here are some answers: I do a lot of programming and gaming, so I do need things for both. The newer video card covers the gaming aspect, as well as allowing me to have many monitors. (hopefully upgrade to dual 30' or more) I don't need any additional HDs at this time. I have a SATA 160g and 120g from my previous computer, and a NAS system with over 2TB of storage on the homenetwork. I just wanted a fast HD for OS/programs/games. With the memory. I have used G.SKILL before in 2 system builds. It's done excellent for me in them. Very stable. **EDIT2: Made some additional changes. Lowered the power supply down to 750, which saves me more $$. Also changed the SSD to 2 WD 650G HDs. Thinking of doing a CPU upgrade to the 3.4GHZ AMD Phenom II X4 965 Black Edition Deneb 3.4GHz System Specs - Budget:$1500 CPU: AMD Phenom II X4 955 Black Edition Deneb 3.2GHz MB: GIGABYTE GA-MA790GPT-UD3H AM3 AMD 790GX HDMI ATX Memory: G.SKILL 4GB (2 x 2GB) 240-Pin DDR3 SDRAM DDR3 1333 (PC3 1066 Video: DIAMOND 5870PE51G Radeon HD 5870 (Cypress XT) 1GB 256-bit GD Power Supply: XCLIO GREATPOWER 1000W ATX12V SLI Ready CrossFire Ready HD:Intel X25-M Mainstream SSDSA2MH080G2C1 2.5" 80GB SATA II MLC Changes: Power Supply: CORSAIR CMPSU-750TX 750W ATX12V / EPS12V HD: 2x Western Digital Caviar Blue WD6400AAKS 640GB CPU: AMD Phenom II X4 965 Black Edition Deneb 3.4GHz

    Read the article

< Previous Page | 394 395 396 397 398 399 400 401 402 403 404 405  | Next Page >