Search Results

Search found 19555 results on 783 pages for 'job performance'.

  • Product Catalog Schema design

    - by FlySwat
    I'm building a proof-of-concept schema for a product catalog that may replace the aging, crufty one we use today. In our business we sell both physical materials and services (one-time and recurring charges).

    The current catalog schema breaks each distinct category out into its own table. This is nicely normalized and performs well, but it is fairly difficult to extend: adding a new attribute to a particular product means changing the table schema and backpopulating old data.

    The idea I've been toying with is:

    - A base set of entity tables in third normal form, containing the facts that are common to ALL products.
    - An Attribute-Entity-Value schema that lets each entity type be extended flexibly using just data, with no schema changes.
    - Materialized views that denormalize this data model, one per entity type. These views are what the application would access.

    We also have many tables that contain business rules and compatibility rules. These would join against the base entity tables instead of the views.

    My big concerns are:

    - Performance: Attribute-Entity-Value schemas are flexible but typically perform poorly. Should I be concerned?
    - More performance: denormalizing with materialized views may carry some risks; I'm not sure about this yet.
    - Complexity: while this schema is flexible and maintainable using just data, I worry that the complexity of the design might make future schema changes difficult.

    For those who have designed product catalogs for large-scale enterprises: am I going down the totally wrong path? Is there any good best-practice reading on schema design for product catalogs?
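
    To make the proposed layering concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is an illustration under assumptions, not the asker's schema: the table names (product, product_attribute), the kind column, and the sample rows are all hypothetical, and because SQLite has no materialized views, a plain pivot query stands in for the per-entity-type view.

```python
import sqlite3

def build_catalog():
    """Create an in-memory catalog: a base entity table plus an EAV extension table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        -- Facts common to all products (hypothetical columns).
        CREATE TABLE product (
            id   INTEGER PRIMARY KEY,
            kind TEXT NOT NULL,
            name TEXT NOT NULL
        );
        -- Extension data: one row per (product, attribute) pair.
        CREATE TABLE product_attribute (
            product_id INTEGER NOT NULL REFERENCES product(id),
            attribute  TEXT NOT NULL,
            value      TEXT,
            PRIMARY KEY (product_id, attribute)
        );
        INSERT INTO product VALUES
            (1, 'material', 'Copper pipe'),
            (2, 'service',  'Support plan');
        INSERT INTO product_attribute VALUES
            (1, 'length_m', '3'),
            (1, 'material', 'copper'),
            (2, 'billing_period', 'monthly');
    """)
    return db

def material_view(db):
    """Pivot the EAV rows of one entity type into ordinary columns."""
    return db.execute("""
        SELECT p.id, p.name,
               MAX(CASE WHEN a.attribute = 'length_m' THEN a.value END) AS length_m,
               MAX(CASE WHEN a.attribute = 'material' THEN a.value END) AS material
        FROM product p
        LEFT JOIN product_attribute a ON a.product_id = p.id
        WHERE p.kind = 'material'
        GROUP BY p.id, p.name
        ORDER BY p.id
    """).fetchall()

rows = material_view(build_catalog())
```

    The pivot needs one CASE expression per attribute, which is the kind of cost the performance concern refers to; precomputing that result is what a materialized view would buy in a database that supports them.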

  • Can I write this regex in one step?

    - by Marin Doric
    This is the input string: "23x +y-34 x + y+21x - 3y2-3x-y+2". I want to surround every '+' and '-' character with whitespace, but only where it is not already surrounded on the left or right side. So my output string would look like this: "23x + y - 34 x + y + 21x - 3y2 - 3x - y + 2".

    I wrote this code that does the job:

        Regex reg1 = new Regex(@"\+(?! )|\-(?! )");
        input = reg1.Replace(input, delegate(Match m) { return m.Value + " "; });
        Regex reg2 = new Regex(@"(?<! )\+|(?<! )\-");
        input = reg2.Replace(input, delegate(Match m) { return " " + m.Value; });

    Explanation:

    - reg1 matches a '+' or '-' that is not followed by ' ' (a space).
    - reg2 does the same, except that it matches a '+' or '-' that is not preceded by ' '.
    - Delegates 1 and 2 just insert " " after and before m.Value (the match value), respectively.

    My question: is there a way to do this with just one regex and just one delegate, i.e. in one step? I am new to regex and I want to learn the efficient way.
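
    One possible single-pass approach, sketched here in Python rather than C#, is to let the pattern swallow any spaces already next to the operator and then write back exactly one on each side. The function name space_operators is made up for this illustration. Note that this differs slightly from the two-step code in the question: it also collapses runs of several spaces around an operator, and a sign at the very start of the string would gain a leading space.

```python
import re

def space_operators(expr: str) -> str:
    """Put exactly one space on each side of every '+' and '-'."""
    # ' *' consumes spaces that already surround the operator, so the
    # replacement never doubles them up.
    return re.sub(r" *([+-]) *", r" \1 ", expr)

result = space_operators("23x +y-34 x + y+21x - 3y2-3x-y+2")
```

    The same pattern should carry over to .NET's Regex.Replace with " $1 " as the replacement string, so no delegate would be needed.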

  • C++ .NET DLL vs C# Managed Code ? (File Encrypting AES-128+XTS)

    - by Ranhiru
    I need to create a Windows Mobile application (WinMo 6.x, C#) that encrypts and decrypts files. It is my duty to write the encryption algorithm myself: AES-128 with XTS as the mode of operation. RijndaelManaged just doesn't cut it :( It is very much slower than the DES and 3DES CryptoServiceProviders :O

    I know it all depends on how efficiently I write the algorithm. (And yes, I have to write it from scratch myself, but I can take a look at other implementations.)

    Nevertheless, does writing a C++ .NET DLL that contains the encryption/decryption algorithm plus all the file handling, and using it from C#, have a significant performance advantage OVER writing the encryption algorithm and file handling in completely managed C# code?

    If I use C++ .NET for the encryption algorithm, should I use an MFC Smart Device DLL or ATL? What is the difference, and does the choice have any impact? And can I just add a reference to the C++ DLL from C#, or should I use P/Invoke?

    I am more competent with C# than with C++, but performance plays a major role, as I have convinced my lecturers that AES is a very efficient cryptographic algorithm for resource-constrained devices.

    Thanks a bunch :)

  • WinForms - How do I access/call methods in UI thread from a separate thread without passing a delega

    - by Greg
    Hi,

    QUESTION: In .NET 3.5 WinForms apps, how do I access/call methods in the UI thread from a separate thread, without passing a delegate?

    EXAMPLE: Say I have some code I want to run both (a) manually when the user clicks a button, and (b) periodically, called by a process running in a separate, non-main-UI thread, but without passing a delegate. [Simplistically, I'm thinking that the class that has this method has already been constructed and the main UI thread has a handle to it, so if the process running in the separate thread could just get a handle to it from the main UI thread, it could call it. Hopefully this is not a flawed concept.]

    BACKGROUND: I'm actually after a way to do the above for the case where my separate thread is a job I schedule using quartz.net. The way the scheduler works, I can't seem to pass in a delegate. There is a way to pass JobDetails, but it only seems to cater for things like string, int, etc. Hence what I'm after is a way to access the MainForm class, for example, and call a method on it from within the quartz.net job, which runs in a separate thread.

    Thanks

  • shell scripting: search/replace & check file exist

    - by johndashen
    I have a Perl script (or any executable) E which takes a file foo.xml and writes a file foo.txt. I use a Beowulf cluster to run E for a large number of XML files, but I'd like to write a simple job server script in shell (bash) which doesn't overwrite existing txt files. I'm currently doing something like

        #!/bin/sh
        PATTERN="[A-Z]*0[1-2][a-j]"; # this matches foo in all cases
        todo=`ls *.xml | grep $PATTERN -o`;
        isdone=`ls *.txt | grep $PATTERN -o`;
        whatsleft=todo - isdone; # what's the unix magic?
        #tack on the .xml suffix with sed or something
        #and then call the job server;
        jobserve E "$whatsleft";

    and I don't know how to get the difference between $todo and $isdone. I'd prefer using sort/uniq to something like a for loop with grep inside, but I'm not sure how to do it (pipes? temporary files?).

    As a bonus question, is there a way to do a lookahead search in bash grep?

    To clarify: the simplest way to do what I'm asking is (in pseudocode)

        for i in `/bin/ls *.xml`
        do
            replace xml suffix with txt
            if [that file does not exist]
                add to whatsleft list
            end
        done
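
    The "todo minus isdone" step is a set difference on file stems. The sketch below shows that logic in Python rather than shell, purely as an illustration; the function name pending_inputs and the sample file names are hypothetical, and the function works on lists of names so it can be tried without touching a real directory.

```python
from pathlib import PurePath

def pending_inputs(xml_names, txt_names):
    """Return the .xml names whose matching .txt output does not exist yet."""
    # Stems of the outputs that are already done, e.g. "A01b" for "A01b.txt".
    done = {PurePath(name).stem for name in txt_names if name.endswith(".txt")}
    return sorted(
        name
        for name in xml_names
        if name.endswith(".xml") and PurePath(name).stem not in done
    )

remaining = pending_inputs(["A01a.xml", "A01b.xml", "B02c.xml"], ["A01b.txt"])
```

    Against a real directory, the two lists could come from something like [p.name for p in Path(".").glob("*.xml")] and the same expression with "*.txt".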

  • Data in linux FIFO seems lost

    - by Utoah
    Hi,

    I have a bash script that needs to do some work in parallel. I do this by putting each job in a subshell that runs in the background. The number of jobs running simultaneously should stay under some limit. I achieve this by first putting some lines in a FIFO; then, just before forking a subshell, the parent script must read a line from this FIFO. Only after it gets a line can it fork the subshell. Up to this point, everything works fine.

    But when I try to read a line from the FIFO inside the subshell, it seems that only one subshell can get a line, even though there are apparently more lines in the FIFO. So I wonder why the other subshell(s) cannot read a line even when there are more lines in the FIFO.

    My testing code looks something like this:

        #!/bin/sh
        fifo_path="/tmp/fy_u_test2.fifo"
        mkfifo $fifo_path
        #open fifo for r/w at fd 6
        exec 6<> $fifo_path
        process_num=5
        #put $process_num lines in the FIFO
        for ((i=0; i<${process_num}; i++)); do
            echo "$i"
        done >&6

        delay_some(){
            local index="$1"
            echo "This is what u can see. $index \n"
            sleep 20;
        }

        #In each iteration, try to read 2 lines from FIFO, one from this shell,
        #the other from the subshell
        for i in 1 2
        do
            date >> /tmp/fy_date
            #If a line can be read from FIFO, run a subshell in bk, otherwise, block.
            read -u6
            echo " $$ Read --- $REPLY --- from 6 \n" >> /tmp/fy_date
            {
                delay_some $i
                #Try to read a line from FIFO
                read -u6
                echo " $$ This is in child # $i, read --- $REPLY --- from 6 \n" >> /tmp/fy_date
            } &
        done

    And the output file /tmp/fy_date has this content:

        Mon Apr 26 16:02:18 CST 2010
         32561 Read --- 0 --- from 6 \n
        Mon Apr 26 16:02:18 CST 2010
         32561 Read --- 1 --- from 6 \n
         32561 This is in child # 1, read --- 2 --- from 6 \n

  • Google Code Jam 2010 Large DataSets Take Too Long to Submit

    - by Travis
    Hey guys,

    I'm participating in the 2010 Code Jam. I solved two of the problems for the small data sets, but I'm not even close to solving the large data sets within the 8-minute time frame. If anyone out there has solved the large data sets, I'm wondering:

    - What hardware were you running on?
    - What language were you using?
    - What performance tuning did you do to make your code run as fast as possible?

    I'm writing the solutions in Ruby, which is not my day-to-day language, and executing them on my MacBook Pro. My solutions for problem A and problem C are on GitHub at http://github.com/tjboudreaux/codejam2010. I'd appreciate any suggestions you may have.

    FWIW, I have a lot of experience in C++ from college, my primary language is PHP, and my "sandbox" language is Ruby. Was I just a bit ambitious in taking a shot at this in Ruby without knowing where the language struggles for performance, or does anyone see anything that's a red flag as to why I can't complete the large data set in time to submit?

  • "Replay" the steps needed to recreate an error

    - by David
    I am going to create a typical business application that will be used by a few hundred consultants. Normally, the consultants would be presented with an error message containing a standard text. As the application will be complicated, with lots of changes being made to it constantly, I would like the following:

    - When an error message is presented, the user has the option to "send" the error to the developers.
    - The developers should be able to open the incoming file in, e.g., Eclipse and debug the last 10 minutes of work step by step (one line at a time if they want to).
    - Everything should be transparent, meaning that they should, for example, be able to see the return values of calls to the database.

    Are there any solutions that offer such functionality today? My preferred language is Python, or alternatively Java.

    I know that such functionality will cause a huge performance hit, but that is acceptable, as this kind of software is not performance sensitive.

    It would be VERY nice if the database also kept a chronology, so that one could query it for the values that existed at the exact time a specific line of code ran in the application, leading up to the bug.

  • Attributes of attributevalue element in SAML 2 Attribute Statement

    - by AJ
    I am building a web service that receives a SAML attribute query and responds with an attribute statement. I know I can return one or multiple values of a SAML attribute. Some of my values depend on other attribute values, and I need to show that relationship.

    Let us say the query is for the Subject Dave and the return values are his company and job title. Dave can work at multiple companies, with a job title at each company. I have two options for sending this data back:

    Option 1: send it as a complex type, by defining an attribute organization and returning XML within that attribute.

        <saml:Attribute name="company">
          <saml:AttributeValue>
            <company name="company1" jobtitle="CIO"/>
            <company name="company2" jobtitle="VP"/>
          </saml:AttributeValue>
        </saml:Attribute>

    Option 2: send multiple attribute values, with a reference in the attributevalue element that links them.

        <saml:Attribute name="company">
          <attributeValue>company1</attributeValue>
          <attributeValue>company2</attributeValue>
        </saml:Attribute>
        <saml:Attribute name="jobTitle">
          <attributeValue company="company1">CIO</attributeValue>
          <attributeValue company="company2">VP</attributeValue>
        </saml:Attribute>

    Which approach would you prefer? Why? I am biased towards the second approach, as it does not require the client to know about any schema. It does require them to know about the non-standard attribute company in the attribute value.

  • Does anyone know of a good Commercial WPF Web Browser Control?

    - by VoidDweller
    I have an MDI WPF app that I need to add web content to. At first glance it looks like I have two options built into the framework: the Frame control and the WebBrowser control. Given that this is an MDI app, it doesn't take long to discover that neither of these will work. The WebBrowser control wraps the IE WebBrowser ActiveX control, which uses the Win32 graphics pipeline. The "Airspace" issue pretty much sums this up as "Sorry, the layouts will not play nice together".

    Yes, I have thought about taking snapshots of the web content, rendering these, and mapping the mouse and keyboard events back to the browser control, but I can't afford the performance penalty and I really don't have time to write and thoroughly test it.

    I have looked for third-party controls, but so far I have only found Chris Cavanagh's WPF Chromium Web Browser control, which wraps Awesomium 1.5. Together these are very cool, and they play nice with the WPF layouts. But they do not meet my performance requirements: they are VERY HEAVY on memory consumption and not too friendly with CPU usage either, not to mention still quite buggy. I'll elaborate if you are interested.

    So, do any of you know of a stable, performant WPF web browser control? Thanks.

  • sql server 2005 stored procedure unexpected behaviour

    - by user283405
    I have written a simple stored procedure (run as a job) that checks users' subscribed keyword alerts. When an article is posted, the stored procedure sends an email to those users whose subscribed keyword matches the article title. One section of my stored procedure is:

        OPEN @getInputBuffer
        FETCH NEXT FROM @getInputBuffer INTO @String
        WHILE @@FETCH_STATUS = 0
        BEGIN
            --PRINT @String
            INSERT INTO #Temp(ArticleID,UserID)
            SELECT A.ID,@UserID
            FROM CONTAINSTABLE(Question,(Text),@String) QQ
            JOIN Article A WITH (NOLOCK) ON A.ID = QQ.[Key]
            WHERE A.ID > @ArticleID
            FETCH NEXT FROM @getInputBuffer INTO @String
        END
        CLOSE @getInputBuffer
        DEALLOCATE @getInputBuffer

    This job runs every 5 minutes and checks the last 50 articles. It had been working fine for 3 months, but a week ago it behaved unexpectedly: it sent irrelevant results. @String contains the user's alert keyword, which is matched against the latest articles using full-text search. The normal execution time is 3 minutes, but during the problem the execution time was 3 days.

    It is currently working fine again, but we are unable to find any reason why it sent irrelevant results.

    Note: I am already removing noise words from the user alert keyword. I am using SQL Server 2005 Enterprise Edition.

  • asynchronous pages

    - by lockedscope
    I have just read the articles on multi-threading and custom threading in ASP.NET:

    http://www.williablog.net/williablog/post/2008/12/16/Custom-Threading-in-ASPNET.aspx
    http://www.williablog.net/williablog/post/2008/12/16/Multi-Threading-in-ASPNET.aspx

    I have a couple of questions.

    What does he mean by returning a thread to the pool? Is the thread completely removed from memory, or is it put into a state in which it is not scheduled on the CPU (a sleep state or similar)?

    If the thread is removed from memory, how can it survive past the async point? How does this mechanism work? Is every object (page class, request, response, etc.) copied somewhere else before it is disposed? (Or does the thread just wait in a sleep state until it is woken when the async call ends?)

    He says, "Having said that, making pages asynchronous is not really about improving performance, it is about improving scalability", and then he says, "I'm sorry to say that it will do nothing for scalability or performance." So which one is true? Or in which case(s) is each of them true?

  • SQL Server: Is it possible to prevent SQL Agent from failing a step on error?

    - by Kenneth
    I have a stored procedure that runs custom backups for around 60 SQL Servers (a mix of 2000 through 2008 R2). Occasionally, due to issues outside of my control (backup device inaccessible, network error, etc.), an individual backup of one or two databases will fail. This causes the entire step to fail, which means that any subsequent backup commands are not executed and half of the databases on a given server may not be backed up.

    On the 2005+ boxes I am using TRY/CATCH blocks to manage these problems and continue backing up the remaining databases. On a 2000 server, however, I have no way to prevent an error like this one from failing the entire step:

        Msg 3201, Level 16, State 1, Line 1
        Cannot open backup device 'db-diff(\PATH\DB-DIFF-03-16-2010.DIF)'. Operating system error 5(Access is denied.).
        Msg 3013, Level 16, State 1, Line 1
        BACKUP DATABASE is terminating abnormally.

    I am simply asking whether anything like TRY/CATCH is possible in SQL 2000. I realize there are no built-in methods for this, so I guess I am looking for some creativity. Even when each backup (or any failing statement) is wrapped in sp_executesql, the job fails instantly. Example:

        DECLARE @x INT, @iReturn INT
        PRINT 'Executing statement that will fail with 208.'
        EXEC @iReturn = Sp_executesql N'SELECT * from TABLETHATDOESNTEXIST;'
        PRINT Cast(@iReturn AS NVARCHAR)
        --In SSMS this return code prints. Executed as a job it fails and aborts before this statement.

  • Why is my logic not working correctly for SPOJ TOPOSORT?

    - by Kavish Dwivedi
    The problem is http://www.spoj.com/problems/TOPOSORT/. The output format is particularly important:

        Print "Sandro fails." if Sandro cannot complete all his duties on the list. If there is a solution print the correct ordering, the jobs to be done separated by a whitespace. If there are multiple solutions print the one, whose first number is smallest, if there are still multiple solutions, print the one whose second number is smallest, and so on.

    What I am doing is simply a DFS on the reversed edges, i.e. if job A finishes before job B, there is a directed edge from B to A. I maintain the order by sorting the adjacency list I created, and I store the nodes that don't have any constraints separately so that I can print them later in the correct order. Two flag arrays are used: one marks discovered nodes and the other marks nodes whose neighbors have all been explored.

    My solution is at http://www.ideone.com/QCUmKY (the important function is the visit function). It gives WA after running correctly for 10 cases, so it is really hard to figure out where I am going wrong, since it passes all of the test cases I have worked through by hand.
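
    A DFS-based ordering is a valid topological order, but in general it is not guaranteed to be the lexicographically smallest one, which may be why some cases fail; the linked code is not shown here, so that is only a guess. A commonly used alternative for "smallest first number, then smallest second number" is Kahn's algorithm with a min-heap of the jobs that are currently ready. The sketch below is in Python, and the function name and the (a, b) edge convention are assumptions made for the example.

```python
import heapq

def smallest_topological_order(n, edges):
    """Return the lexicographically smallest order of jobs 1..n, or None on a cycle.

    Each (a, b) in edges means job a must be done before job b.
    """
    successors = [[] for _ in range(n + 1)]
    indegree = [0] * (n + 1)
    for a, b in edges:
        successors[a].append(b)
        indegree[b] += 1

    # Min-heap of every job whose prerequisites are all done.
    ready = [job for job in range(1, n + 1) if indegree[job] == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        job = heapq.heappop(ready)  # always take the smallest available job
        order.append(job)
        for nxt in successors[job]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, nxt)

    # If some job was never released, the constraints contain a cycle.
    return order if len(order) == n else None

example = smallest_topological_order(4, [(1, 2), (3, 2)])
cyclic = smallest_topological_order(2, [(1, 2), (2, 1)])
```

    Here example is [1, 3, 2, 4], and cyclic is None, which corresponds to the "Sandro fails." case.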

  • PostgreSQL: BYTEA vs OID+Large Object?

    - by mlaverd
    I started an application with Hibernate 3.2 and PostgreSQL 8.4. I have some byte[] fields that were mapped as @Basic (= PG bytea) and others that were mapped as @Lob (= PG Large Object). Why the inconsistency? Because I was a Hibernate noob.

    Those fields are at most 4 KB (the average is 2-3 KB). The PostgreSQL documentation mentions that LOs are good when the fields are big, but I didn't see what "big" means.

    I have upgraded to PostgreSQL 9.0 with Hibernate 3.6, and I was stuck having to change the annotation to @Type(type="org.hibernate.type.PrimitiveByteArrayBlobType"). This bug has brought forward a potential compatibility issue, and I eventually found out that Large Objects are a pain to deal with compared to a normal field.

    So I am thinking of changing all of them to bytea. But I am concerned that bytea fields are encoded in hex, so there is some overhead in encoding and decoding that would hurt performance.

    Are there good benchmarks comparing the performance of the two? Has anybody made the switch and seen a difference?
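
    To put a number on the hex concern: PostgreSQL's "hex" bytea format is the textual form \x followed by two hex digits per byte, and it applies to values exchanged as text, not to how the bytes are stored on disk. The Python sketch below only measures the size of that textual form for a 4 KB payload; the helper name is made up, and whether a given JDBC driver actually transfers bytea as text or in binary is not something this snippet can tell you.

```python
def bytea_hex_literal(payload: bytes) -> str:
    """Render bytes in the textual form used by PostgreSQL's 'hex' bytea format."""
    return "\\x" + payload.hex()

payload = bytes(range(256)) * 16          # 4096 bytes, the stated maximum field size
text_form = bytea_hex_literal(payload)
overhead = len(text_form) / len(payload)  # characters of text per payload byte
```

    For this payload the text form is 8194 characters, so roughly twice the raw size. At 2-4 KB per field that is a small absolute cost, but only a benchmark on the real workload would settle the question.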

  • Multi-tenant Access Control: Repository or Service layer?

    - by FreshCode
    In a multi-tenant ASP.NET MVC application based on Rob Conery's MVC Storefront, should I be filtering the tenant's data in the repository or in the service layer?

    1. Filter the tenant's data in the repository:

        public interface IJobRepository {
            IQueryable<Job> GetJobs(short tenantId);
        }

    2. Let the service filter the repository data by tenant:

        public interface IJobService {
            IList<Job> GetJobs(short tenantId);
        }

    My gut feeling says to do it in the service layer (option 2), but it could be argued that each tenant should in essence have its own "virtual repository" (option 1), in which case the responsibility lies with the repository.

    Which is the most elegant approach: option 1, option 2, or is there a better way?

    Update: I tried the proposed idea of filtering at the repository, but the problem is that my application provides the tenant context (via sub-domain) and only interacts with the service layer. Passing the context all the way down to the repository layer is a mission, so instead I have opted to filter my data at the service layer. I feel that the repository should represent all data physically available in it, with appropriate filters for retrieving tenant-specific data, to be used by the service layer.

    Final Update: I ended up abandoning this approach due to the unnecessary complexities. See my answer below.

  • Cassandra random read speed

    - by Jody Powlette
    We're still evaluating Cassandra for our data store. As a very simple test, I inserted a value for 4 columns into the Keyspace1/Standard1 column family on my local machine, amounting to about 100 bytes of data. Then I read it back as fast as I could by row key. I can read it back at 160,000/second. Great.

    Then I put in a million similar records, all with keys of the form X.Y where X in (1..10) and Y in (1..100,000), and I queried for a random record. Performance fell to 26,000 queries per second. This is still well above the number of queries we need to support (about 1,500/sec).

    Finally I put in ten million records, from 1.1 up through 10.1000000, and randomly queried for one of the 10 million records. Performance is abysmal at 60 queries per second, and my disk is thrashing around like crazy.

    I also verified that if I ask for a subset of the data, say the 1,000 records between 3,000,000 and 3,001,000, it returns slowly at first and then, as the records are cached, speeds right up to 20,000 queries per second, and my disk stops going crazy.

    I've read all over that people are storing billions of records in Cassandra and fetching them at 5-6k per second, but I can't get anywhere near that with only 10 million records. Any idea what I'm doing wrong? Is there some setting I need to change from the defaults? I'm on an overclocked Core i7 box with 6 GB of RAM, so I don't think it's the machine.

    Here's my code to fetch records, which I'm running in 8 threads, each asking for one value from one column via row key:

        ColumnPath cp = new ColumnPath();
        cp.Column_family = "Standard1";
        cp.Column = utf8Encoding.GetBytes("site");
        string key = (1+sRand.Next(9)) + "." + (1+sRand.Next(1000000));
        ColumnOrSuperColumn logline = client.get("Keyspace1", key, cp, ConsistencyLevel.ONE);

    Thanks for any insights

  • yet another question about migrating to Java

    - by aloneguid
    Hi,

    There are plenty of similar questions, but maybe the responses to this one will save a developer's life :)

    I want to migrate to Java. The reasons are very clear: all the .NET vacancies are client and Windows oriented (Silverlight developer, ASP.NET developer, WPF developer, etc.) and none of them are of any interest to me. I have worked with .NET since its beginning, when our company decided to invest in .NET on top of a C++ stack with all the natural problems that brings. I was just blindly following along, and I actually enjoyed it, as the products were mostly server oriented with mixed C++/C# code.

    Today I have the aforementioned problem: I can't find an inspiring job. I'd rather kill myself than start working on a Silverlight or WPF project. Searching Java vacancies shows promising results, but they all require a huge Java-related technology stack and experience.

    The question is: is there any chance of finding a job quickly and without a dramatic salary drop (I know that Java guys are usually better paid, so there must be a kind of credit)? And if not, how much time and effort does it take to migrate? My .NET knowledge mostly covers server-oriented technologies like NHibernate, WCF, threading, sockets, ASP.NET web services, Enterprise Library, NInject, etc., and (still) some C++ leftovers.

    Thanks!

  • MongoDB architectural question

    - by pex
    I have to store 4 models. Let's say a Post has many, and belongs to many, Categories. A Category, in turn, has many Qualities. At the moment I'm of the opinion that Post and Category are Documents, and that Quality becomes an EmbeddedDocument of Category.

    Now we come to the root problem: there are a lot of Votes on the Qualities that belong to a Post. I thought about embedding Votes in Post and giving each a quality_id. I am really expecting a lot of Votes, and it has to be possible to filter them (e.g. by username, user group, or date voted).

    I have worked with MongoMapper, and I think the lack of find methods for EmbeddedDocuments could become a killer. On the other hand, I'm wondering about performance issues. What if I want to serve a Post with only a few of its Votes rather than all of them? Or what if I define a separate Document for Votes and end up with tons of Vote documents? Wouldn't that become a performance killer?

  • Blocking on DBCP connection pool (open and close connnection). Is database connection pooling in OpenEJB pluggable?

    - by topchef
    We use OpenEJB on Tomcat (we used to run on JBoss, WebLogic, etc.). While running load tests we experience significant performance problems with handling JMS messages (queues). The problem was localized to blocking in the database connection pool when getting a connection from, or releasing one to, the pool. The blocking prevented concurrent MDB instances (threads) from running, so performance suffered 10-fold or worse. The same code used to run on application servers (with their respective connection pool implementations) with no blocking at all.

    Example of a blocked thread:

        Name: JMS Resource Adapter-worker-23
        State: BLOCKED on org.apache.commons.pool.impl.GenericObjectPool@1ea6b4a owned by: JMS Resource Adapter-worker-19
        Total blocked: 18,426  Total waited: 0

        Stack trace:
        org.apache.commons.pool.impl.GenericObjectPool.returnObject(GenericObjectPool.java:916)
        org.apache.commons.dbcp.PoolableConnection.close(PoolableConnection.java:91)
           - locked org.apache.commons.dbcp.PoolableConnection@1bcba8
        org.apache.commons.dbcp.managed.ManagedConnection.close(ManagedConnection.java:147)
        com.xxxxx.persistence.DbHelper.closeConnection(DbHelper.java:290)
        ....

    A couple of questions:

    - I am almost certain that some transactional attributes and properties contribute to this blocking, but the MDBs are defined as non-transactional (we use both annotations and ejb-jar.xml). Some EJBs do use container-managed transactions, though (and we can observe blocking there as well). Are there any DBCP configurations that may fix the blocking?
    - Is the DBCP connection pool implementation replaceable in OpenEJB? How easy (or difficult) is it to replace it with another library?

    Just in case, this is how we define the data source in OpenEJB (openejb.xml):

        <Resource id="MyDataSource" type="DataSource">
            JdbcDriver oracle.jdbc.driver.OracleDriver
            JdbcUrl ${oracle.jdbc}
            UserName ${oracle.user}
            Password ${oracle.password}
            JtaManaged true
            InitialSize 5
            MaxActive 30
            ValidationQuery SELECT 1 FROM DUAL
            TestOnBorrow true
        </Resource>

  • Best architecture for a social media app

    - by Sky
    Hey guys,

    I'm working on a promising project that is developing a new social media app for web and mobile. We are just beginning to define functionality. Nevertheless, I'm thinking ahead about architecture, so I'm asking:

    1. What's the best platform to develop the core of this application, which will have a REST API interface?
    2. What's the best database that will scale and grow with my application?

    From what I have researched so far, these were the answers I found most interesting:

    For the database: Cassandra, a NoSQL DB with amazing scalability, amazing write performance, and good read performance (to be improved in 0.6). I think I will choose that one, with ZooKeeper for transactions on Cassandra. I think those two technologies are really good for that purpose. What do you guys think?

    For the front end that will serve the REST API, I don't have a final candidate. Here my questions are about performance vs. scalability vs. fast development/maintenance:

    - Java or .NET: as far as I have researched, these offer the best balance of those requirements.
    - Python, Perl and Rails: the best for fast development/maintenance, but poor on everything else.
    - C or C++: I'm not even considering these, because they are poor for fast development/maintenance.

    So what do you guys think about it?

  • What's the fastest way to bulk insert a lot of data in SQL Server (C# client)

    - by Andrew
    I am hitting some performance bottlenecks with my C# client inserting bulk data into a SQL Server 2005 database, and I'm looking for ways to speed up the process. I am already using SqlClient.SqlBulkCopy (which is based on TDS) to speed up the data transfer across the wire, which helped a lot, but I'm still looking for more.

    I have a simple table that looks like this:

        CREATE TABLE [BulkData](
            [ContainerId] [int] NOT NULL,
            [BinId] [smallint] NOT NULL,
            [Sequence] [smallint] NOT NULL,
            [ItemId] [int] NOT NULL,
            [Left] [smallint] NOT NULL,
            [Top] [smallint] NOT NULL,
            [Right] [smallint] NOT NULL,
            [Bottom] [smallint] NOT NULL,
            CONSTRAINT [PKBulkData] PRIMARY KEY CLUSTERED
            (
                [ContainerId] ASC,
                [BinId] ASC,
                [Sequence] ASC
            )
        )

    I'm inserting data in chunks that average about 300 rows, where ContainerId and BinId are constant within each chunk, the Sequence value runs 0-n, and the rows are pre-sorted on the primary key.

    The %Disk time performance counter spends a lot of time at 100%, so it is clear that disk IO is the main issue, but the speeds I'm getting are several orders of magnitude below a raw file copy.

    Does it help any if I:

    - Drop the primary key while I am doing the inserts and recreate it later?
    - Insert into a temporary table with the same schema and periodically transfer the rows into the main table, to keep the table where insertions are happening small?
    - Do anything else?

    -- Based on the responses I have gotten, let me clarify a little bit:

    Portman: I'm using a clustered index because, once the data is all imported, I will need to access it sequentially in that order. I don't particularly need the index to be there while importing the data. Is there any advantage to having a nonclustered PK index during the inserts, as opposed to dropping the constraint entirely for the import?

    Chopeen: The data is being generated remotely on many other machines (my SQL Server can only handle about 10 currently, but I would love to be able to add more). It's not practical to run the entire process on the local machine, because it would then have to process 50 times as much input data to generate the output.

    Jason: I am not doing any concurrent queries against the table during the import process. I will try dropping the primary key and see if that helps.

    ~ Andrew

  • Help with Neuroph neural network

    - by user359708
    For my graduate research I am creating a neural network that is trained to recognize images. My approach is much more complex than just taking a grid of RGB values, downsampling, and sending them to the input of the network, as many examples do. I actually use over 100 independently trained neural networks that detect features such as lines, shading patterns, etc. It is much more like the human eye, and it works really well so far!

    The problem is that I have quite a bit of training data. I show it over 100 examples of what a car looks like, then 100 examples of what a person looks like, then over 100 of what a dog looks like, etc. Currently it takes about one week to train the network. This is kind of killing my progress, as I need to adjust and retrain.

    I am using Neuroph as the low-level neural network API. I am running a dual quad-core machine (16 cores with hyperthreading), so this should be fast, yet my processor usage is at only 5%. Are there any tricks for Neuroph performance, or for Java performance in general? Suggestions? I am a cognitive psych doctoral student, and I am a decent programmer, but I do not know a great deal about performance programming.

  • SQL GUID Vs Integer

    - by Dal
    Hi, I have recently started a new job and noticed that all the SQL tables use the GUID data type for the primary key. In my previous job we used auto-incrementing integers for the primary key, and in my opinion that was a lot easier to work with. For example, say you had two related tables, Product and ProductType: I could easily cross-check the 'ProductTypeID' column of both tables for a particular row and quickly map the data in my head, because it's easy to remember a number (2, 4, 45, etc.) as opposed to a value like E75B92A3-3299-4407-A913-C5CA196B3CAB. The extra frustration comes from wanting to understand how the tables are related; sadly there is no database diagram :( A lot of people say that GUIDs are better because you can generate the unique identifier in your C# code (for example with Guid.NewGuid()) without requiring SQL Server to do it, which also lets you know the ID before the row is inserted. But I've seen that it is still possible to retrieve the 'next auto-incremented integer' too. A DBA contractor reported that our queries could be up to 30% faster if we used the integer type instead of GUIDs. Why does the GUID data type exist, and what advantages does it really provide? Even if it's a choice made by some professional, there must be good reasons why it was implemented this way?

  • Design suggestion for expression tree evaluation with time-series data

    - by Lirik
    I have a (C#) genetic program that uses financial time-series data. It currently works, but I want to redesign the architecture to be more robust. My main goals are:

    1. Present the time-series data to the expression trees sequentially.
    2. Allow expression trees to access previous data rows when needed.
    3. Optimize the performance of data access while evaluating the expression trees.
    4. Keep a common interface so various types of data can be used.

    Here are the possible approaches I've thought about:

    1. Evaluate the expression tree by passing a data row into the root node and letting each child node use the same data row.
    2. Evaluate the expression tree by passing in the data row index and letting each node get the data row from a shared DataSet (currently I'm passing the row index and going to multiple synchronized arrays to get the data).
    3. Hybrid: an immutable data set is accessible to all of the expression trees, and each expression tree is evaluated by passing in a data row.

    The benefit of the first approach is that the data row is passed into the expression tree and no further query is made against the data set (which should increase performance in a multithreaded environment). The drawback is that the expression tree does not have access to the rest of the data (in case some of the functions need to do calculations using previous data rows).

    The benefit of the second approach is that the expression trees can access any data up to the latest data row, but unless I specify what that row is, I'll have to iterate through the rows and figure out which one is the last one.

    The benefit of the hybrid is that it should generally perform better and still provide access to the earlier data. It supports two basic "views" of the data: the latest row and the previous rows.

    Do you guys know of any design patterns, or do you have any tips that can help me build this type of system? Should I use a DataSet to hold and present the data, or are there more efficient ways to present rows of data while maintaining a simple interface? FYI: All of my code is written in C#.
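    One possible shape for the hybrid approach, as a minimal sketch rather than a recommendation: the shared rows sit behind a read-only list, and each evaluation receives a small cursor that exposes the current row plus earlier rows but nothing after the cursor. The sketch is written in Java, not the C# of the question, and the interface and lambdas map onto C# interfaces and delegates. `RowView`, `Node`, `column`, `lagged`, `minus`, and `evaluate` are all hypothetical names, and the rows are assumed to be plain arrays of doubles.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class HybridEvaluation {

    // Read-only cursor over the shared table: exposes the current row and
    // earlier rows, never rows after the cursor position.
    static final class RowView {
        private final List<double[]> rows;
        private final int index;

        RowView(List<double[]> rows, int index) {
            this.rows = rows;
            this.index = index;
        }

        // Value of a column in the current (latest visible) row.
        double current(int column) {
            return rows.get(index)[column];
        }

        // Value of a column `lag` rows back; clamps to the first row.
        double previous(int column, int lag) {
            return rows.get(Math.max(0, index - lag))[column];
        }
    }

    // Common interface for every node in an expression tree.
    interface Node {
        double eval(RowView row);
    }

    // Leaf: reads one column from the current row.
    static Node column(int column) {
        return row -> row.current(column);
    }

    // Leaf: reads one column from an earlier row.
    static Node lagged(int column, int lag) {
        return row -> row.previous(column, lag);
    }

    // Inner node: difference of two subtrees.
    static Node minus(Node left, Node right) {
        return row -> left.eval(row) - right.eval(row);
    }

    // Presents rows to the tree one at a time, in order. The list is wrapped
    // as unmodifiable; the row arrays themselves are not deep-copied.
    static List<Double> evaluate(Node tree, List<double[]> data) {
        List<double[]> shared = Collections.unmodifiableList(new ArrayList<>(data));
        List<Double> out = new ArrayList<>();
        for (int i = 0; i < shared.size(); i++) {
            out.add(tree.eval(new RowView(shared, i)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<double[]> prices = new ArrayList<>();
        prices.add(new double[] {10.0});
        prices.add(new double[] {12.0});
        prices.add(new double[] {11.0});
        // One-row change in column 0: current value minus previous value.
        Node change = minus(column(0), lagged(0, 1));
        System.out.println(evaluate(change, prices)); // prints [0.0, 2.0, -1.0]
    }
}
```

    Because the cursor carries its own index, a node never has to search for the latest row, and because the shared list is only read, many trees can be evaluated against it from different threads without locking.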
