Search Results

Search found 13403 results on 537 pages for 'epm performance tuning'.

Page 282/537 | < Previous Page | 278 279 280 281 282 283 284 285 286 287 288 289  | Next Page >

  • Opinions on Unladen Swallow?

    - by vartec
    What are your opinions and expectations on Google's Unladen Swallow? From their project plan: We want to make Python faster, but we also want to make it easy for large, well-established applications to switch to Unladen Swallow. Produce a version of Python at least 5x faster than CPython. Python application performance should be stable. Maintain source-level compatibility with CPython applications. Maintain source-level compatibility with CPython extension modules. We do not want to maintain a Python implementation forever; we view our work as a branch, not a fork. And even sweeter: In addition, we intend to remove the GIL and fix the state of multithreading in Python. We believe this is possible through the implementation of a more sophisticated GC It almost looks too good to be true, like the best of PyPy and Stackless combined. More info: Jesse Noller: "Pycon: Unladen-Swallow" ArsTechnica: "Google searches for holy grail of Python performance" Update: as DNS pointed out, there was related question: http://stackoverflow.com/questions/695370/what-is-llvm-and-how-is-replacing-python-vm-with-llvm-increasing-speeds-5x

    Read the article

  • Database structure - is mySQL the right choice?

    - by Industrial
    Hi everyone, We are currently planning the database structure of a quite complex e-commerce web app that has flexibility as it's main cornerstone. Our app features a large amount of data (products) and we have run into a slight headache trying to keep performance high without compromizing normalization rules in the database, or leaving our highly beloved flexibility concept behind when integrating product options (also widely known as product attributes or parameters). Based on various references and sources available, we have made up lists on pros and cons of all major and well known database patterns to solve this. After comparing these, we have come up with two final alternatives: EAV (Entity-attribute-value model) : Pros: Database is used for all sorting. Cons: All related queries will include a number of joins between multiple tables in order to complete the collection of data. SLOB (Serialized LOB, also known as Facade?) : Pros: Very flexible. Keeping the number of necessary joins low compared to a EAV design pattern. Easy to update/add/remove data from each product. Cons: All sorting will be done by the application instead of the database. Will use lots of performance (memory?) when big datasets is processed by a large number of users. Our main questions: Which pattern/structure would you use, or maybe even a different solution? Is there better databases besides mySQL available nowadays to accomplish what we want? Thanks a lot! Reference: http://stackoverflow.com/questions/695752/product-table-many-kinds-of-product-each-product-has-many-parameters

    Read the article

  • Saturated addition of two signed Java 'long' values

    - by finnw
    How can one add two long values (call them x and y) in Java so that if the result overflows then it is clamped to the range Long.MIN_VALUE..Long.MAX_VALUE? For adding ints one can perform the arithmetic in long precision and cast the result back to an int, e.g.: int saturatedAdd(int x, int y) { long sum = (long) x + (long) y; long clampedSum = Math.max((long) Integer.MIN_VALUE, Math.min(sum, (long) Integer.MAX_VALUE)); return (int) clampedSum; } or import com.google.common.primitives.Ints; int saturatedAdd(int x, int y) { long sum = (long) x + (long) y; return Ints.saturatedCast(sum); } but in the case of long there is no larger primitive type that can hold the intermediate (unclamped) sum. Since this is Java, I cannot use inline assembly (in particular SSE's saturated add instructions.) It can be implemented using BigInteger, e.g. static final BigInteger bigMin = BigInteger.valueOf(Long.MIN_VALUE); static final BigInteger bigMax = BigInteger.valueOf(Long.MAX_VALUE); long saturatedAdd(long x, long y) { BigInteger sum = BigInteger.valueOf(x).add(BigInteger.valueOf(y)); return bigMin.max(sum).min(bigMax).longValue(); } however performance is important so this method is not ideal (though useful for testing.) I don't know whether avoiding branching can significantly affect performance in Java. I assume it can, but I would like to benchmark methods both with and without branching. Related: http://stackoverflow.com/questions/121240/saturating-addition-in-c

    Read the article

  • What is the best Binary Decision Diagram library for Java?

    - by reprogrammer
    A Binary Decision Diagram (BDD) is a data structure to represent boolean functions. I'd like use this data structure in a Java program. My search for Java based BDD libraries resulted into the following packages. Java Decision Diagram Libraries JavaBDD JDD JBDD bddbddb If you know of any other BDD libraries available for Java programs, please let me know so that I add it to the list above. If you have used any of these libraries, please tell me about your experience with the library. In particular, I'd like you to compare the available libraries along the following dimensions. Quality. Is the library mature and reasonably bug free? Performance. How do you evaluate the performance of the library? Support. Could you easily get support whenever you encountered a problem with the library? Was the library well documented? Ease of use. Was the API well designed? Could you install and use the library quickly and easily? Please mention the version of the library that you are evaluating.

    Read the article

  • Best Practice, objects design ASP.NET MVC

    - by DoomStone
    Hello Stackoverflow I have a code design question that have been torbeling me for a while, you see I’m doing a refactoring of my website Cosplay Denmark, a site where cospalyers can upload images of them self in their costumes. The original site was done in php, Zend MVC, but my refactoring is being done in ASP.NET MVC 2. If you take the site http://www.cosplaydanmark.dk/Costumes/ (You can switch to English in the left column (Sprog)) Here you see a list of all the anime’s we have on the site with images, we show the name, how many different characters and how many images there are under this anime. http://www.cosplaydanmark.dk/Costumes/Bleach If you click on an anime will you get a list of characters within the given anime which we have images in, here do we show the character name, how many galleries and how many images. http://www.cosplaydanmark.dk/Costumes/Bleach/Ichigo_Kurosaki/ If you click on the character name, will you get a list of the galleries under the given character in the given anime. Here we have some information about the gallery, such as image count. http://www.cosplaydanmark.dk/Costumes/Bleach/Ichigo_Kurosaki/Admi/ Should you click the gallery do you get a list of the images in the gallery. My database look like this at the moment. As you can might imagine there are a lot of different query’s to create the site, on the first site I need to do a select on the on the “animes” table and for each result, I need to do a count select on characters and galleries. My plan to create this will be one of the following Where the IList, would be a lazy load list. But I can’t decide what would be the best solution for this would be, also if there is a better way of doing this. My priority is to have good performance with a minimum lose of features and code upkeep. I’m using a service pattern with a linq to sql repository. My design is not absolute, I’m willing to change it if it could increase performance :D I hope that I have describe my question good enough for you to understand what I mean, but ask away if there are anything I have missed.

    Read the article

  • Software development metrics and reporting

    - by David M
    I've had some interesting conversations recently about software development metrics, in particular how they can be used in a reasonably large organisation to help development teams work better. I know there have been Stack Overflow questions about which metrics are good to use - like this one, but my question is more about which metrics are useful to which stakeholders, and at what level of aggregation. As an example, my view is that code coverage is a useful metric in the following ways (and maybe others): For a team's own internal use when combined with other measurements. For facilitating/enabling/mentoring teams, where it might be instructive when considered on a team-by-team basis as a trend (e.g. if team A and B have coverage this month of 75 and 50, I'd be more concerned with team A than B if the previous month they'd had 80 and 40). For senior management when presented as an aggregated statistic across a number of teams or a whole department. But I don't think it's useful for senior management to see this on a team-by-team basis, as this encourages artifical attempts to bolster coverage with tests that merely exercise, rather than test, code. I'm in an organisation with a couple of levels in its management hierarchy, but where the vast majority of managers are technically minded and able (with many still getting their hands dirty). Some of the development teams are leading the way in driving towards agile development practices, but others lag, and there is now a serious mandate from the top for this to be the way the organisation works. A couple of us are starting a programme to encourage this. In this sort of an organisation, what sort of metrics do you think are useful, to whom, why, and at what level of aggregation? I don't want people to feel their performance is being assessed based on a metric that they can artificially influence; at the same time, the senior management are going to want some sort of evidence that progress is being made. What advice or caveats can you provide based on experience in your own organisations? EDIT We are definitely wanting to use metrics as a tool for organisational improvement not as a tool for individual performance measurement.

    Read the article

  • Scalable / Parallel Large Graph Analysis Library?

    - by Joel Hoff
    I am looking for good recommendations for scalable and/or parallel large graph analysis libraries in various languages. The problems I am working on involve significant computational analysis of graphs/networks with 1-100 million nodes and 10 million to 1+ billion edges. The largest SMP computer I am using has 256 GB memory, but I also have access to an HPC cluster with 1000 cores, 2 TB aggregate memory, and MPI for communication. I am primarily looking for scalable, high-performance graph libraries that could be used in either single or multi-threaded scenarios, but parallel analysis libraries based on MPI or a similar protocol for communication and/or distributed memory are also of interest for high-end problems. Target programming languages include C++, C, Java, and Python. My research to-date has come up with the following possible solutions for these languages: C++ -- The most viable solutions appear to be the Boost Graph Library and Parallel Boost Graph Library. I have looked briefly at MTGL, but it is currently slanted more toward massively multithreaded hardware architectures like the Cray XMT. C - igraph and SNAP (Small-world Network Analysis and Partitioning); latter uses OpenMP for parallelism on SMP systems. Java - I have found no parallel libraries here yet, but JGraphT and perhaps JUNG are leading contenders in the non-parallel space. Python - igraph and NetworkX look like the most solid options, though neither is parallel. There used to be Python bindings for BGL, but these are now unsupported; last release in 2005 looks stale now. Other topics here on SO that I've looked at have discussed graph libraries in C++, Java, Python, and other languages. However, none of these topics focused significantly on scalability. Does anyone have recommendations they can offer based on experience with any of the above or other library packages when applied to large graph analysis problems? Performance, scalability, and code stability/maturity are my primary concerns. Most of the specialized algorithms will be developed by my team with the exception of any graph-oriented parallel communication or distributed memory frameworks (where the graph state is distributed across a cluster).

    Read the article

  • Changing App.config at Runtime

    - by born to hula
    I'm writing a test winforms / C# / .NET 3.5 application for the system we're developing and we fell in the need to switch between .config files at runtime, but this is turning out to be a nightmare. Here's the scene: the Winforms application is aimed at testing a WebApp, divided into 5 subsystems. The test proccess works with messages being sent between the subsystems, and for this proccess to be sucessful each subsystem got to have its own .config file. For my Test Application I wrote 5 separate configuration files. I wish I was able to switch between these 5 files during runtime, but the problem is: I can programatically edit the application .config file inumerous times, but these changes will only take effect once. I've been searching a long time for a form to address this problem but I still wasn't sucessful. I know the problem definition may be a bit confusing but I would really appreciate it if someone helped me. Thanks in advance! --- UPDATE 01-06-10 --- There's something I didn't mention before. Originally, our system is a Web Application with WCF calls between each subsystem. For performance testing reasons (we're using ANTS 4), we had to create a local copy of the assemblies and reference them from the test project. It may sound a bit wrong, but we couldn't find a satisfying way to measure performance of a remote application. --- End Update --- Here's what I'm doing: public void UpdateAppSettings(string key, string value) { XmlDocument xmlDoc = new XmlDocument(); xmlDoc.Load(AppDomain.CurrentDomain.SetupInformation.ConfigurationFile); foreach (XmlElement item in xmlDoc.DocumentElement) { foreach (XmlNode node in item.ChildNodes) { if (node.Name == key) { node.Attributes[0].Value = value; break; } } } xmlDoc.Save(AppDomain.CurrentDomain.SetupInformation.ConfigurationFile); System.Configuration.ConfigurationManager.RefreshSection("section/subSection"); }

    Read the article

  • Efficient Context-Free Grammar parser, preferably Python-friendly

    - by Max Shawabkeh
    I am in need of parsing a small subset of English for one of my project, described as a context-free grammar with (1-level) feature structures (example) and I need to do it efficiently . Right now I'm using NLTK's parser which produces the right output but is very slow. For my grammar of ~450 fairly ambiguous non-lexicon rules and half a million lexical entries, parsing simple sentences can take anywhere from 2 to 30 seconds, depending it seems on the number of resulting trees. Lexical entries have little to no effect on performance. Another problem is that loading the (25MB) grammar+lexicon at the beginning can take up to a minute. From what I can find in literature, the running time of the algorithm used to parse such a grammar (Earley or CKY) should be linear to the size of the grammar and cubic to the size of the input token list. My experience with NLTK indicates that ambiguity is what hurts the performance most, not the absolute size of the grammar. So now I'm looking for a CFG parser to replace NLTK. I've been considering PLY but I can't tell whether it supports feature structures in CFGs, which are required in my case, and the examples I've seen seem to be doing a lot of procedural parsing rather than just specifying a grammar. Can anybody show me an example of PLY both supporting feature structs and using a declarative grammar? I'm also fine with any other parser that can do what I need efficiently. A Python interface is preferable but not absolutely necessary.

    Read the article

  • What are alternatives to Win32 PulseEvent() function?

    - by Bill
    The documentation for the Win32 API PulseEvent() function (kernel32.dll) states that this function is “… unreliable and should not be used by new applications. Instead, use condition variables”. However, condition variables cannot be used across process boundaries like (named) events can. I have a scenario that is cross-process, cross-runtime (native and managed code) in which a single producer occasionally has something interesting to make known to zero or more consumers. Right now, a well-known named event is used (and set to signaled state) by the producer using this PulseEvent function when it needs to make something known. Zero or more consumers wait on that event (WaitForSingleObject()) and perform an action in response. There is no need for two-way communication in my scenario, and the producer does not need to know if the event has any listeners, nor does it need to know if the event was successfully acted upon. On the other hand, I do not want any consumers to ever miss any events. In other words, the system needs to be perfectly reliable – but the producer does not need to know if that is the case or not. The scenario can be thought of as a “clock ticker” – i.e., the producer provides a semi-regular signal for zero or more consumers to count. And all consumers must have the correct count over any given period of time. No polling by consumers is allowed (performance reasons). The ticker is just a few milliseconds (20 or so, but not perfectly regular). Raymen Chen (The Old New Thing) has a blog post pointing out the “fundamentally flawed” nature of the PulseEvent() function, but I do not see an alternative for my scenario from Chen or the posted comments. Can anyone please suggest one? Please keep in mind that the IPC signal must cross process boundries on the machine, not simply threads. And the solution needs to have high performance in that consumers must be able to act within 10ms of each event.

    Read the article

  • Better why of looping to detect change.

    - by Dremation
    As of now I'm using a while(true) method to detect changes in memory. The problem with this is it's kill the applications performance. I have a list of 30 pointers that need checked as rapidly as possible for changes, without sacrificing a huge performance loss. Anyone have ideas on this? memScan = new Thread(ScanMem); public static void ScanMem() { int i = addy.Length; while (true) { Thread.Sleep(30000); //I do this to cut down on cpu usage for (int j = 0; j < i; j++) { string[] values = addy[j].Split(new char[] { Convert.ToChar(",") }); //MessageBox.Show(values[2]); try { if (Memory.Scanner.getIntFromMem(hwnd, (IntPtr)Convert.ToInt32(values[0], 16), 32).ToString() != values[1].ToString()) { //Ok, it changed lets do our work //work if (Globals.Working) return; SomeFunction("Results: " + values[2].ToString(), "Memory"); Globals.Working = true; }//end if }//end try catch { } }//end for }//end while }//end void

    Read the article

  • Product Catalog Schema design

    - by FlySwat
    I'm building a proof of concept schema for a product catalog to possibly replace a very aging and crufty one we use. In our business, we sell both physical materials and services (one time and reoccurring charges). The current catalog schema has each distinct category broken out into individual tables, while this is nicely normalized and performs well, it is fairly difficult to extend. Adding a new attribute to a particular product involves changing the table schema and backpopulating old data. An idea I've been toying with has been something along the line of a base set of entity tables in 3rd normal form, these will contain the facts that are common among ALL products. Then, I'd like to build an Attribute-Entity-Value schema that allows each entity type to be extended in a flexible way using just data and no schema changes. Finally, I'd like to denormalize this data model into materialized views for each individual entity type. This views are what the application would access. We also have many tables that contain business rules and compatibility rules. These would join against the base entity tables instead of the views. My big concerns here are: Performance - Attribute-Entity-Value schemas are flexible, but typically perform poorly, should I be concerned? More Performance - Denormalizing using materialized views may have some risks, I'm not positive on this yet. Complexity - While this schema is flexible and maintainable using just data, I worry that the complexity of the design might make future schema changes difficult. For those who have designed product catalogs for large scale enterprises, am I going down the totally wrong path? Is there any good best practice schema design reading available for product catalogs?

    Read the article

  • C++ .NET DLL vs C# Managed Code ? (File Encrypting AES-128+XTS)

    - by Ranhiru
    I need to create a Windows Mobile Application (WinMo 6.x - C#) which is used to encrypt/decrypt files. However it is my duty to write the encryption algorithm which is AES-128 along with XTS as the mode of operation. RijndaelManaged just doesn't cut it :( Very much slower than DES and 3DES CryptoServiceProviders :O I know it all depends on how good I am at writing the algorithm in the most efficient way. (And yes I my self have to write it from scratch but i can take a look @ other implementations) Nevertheless, does writing a C++ .NET DLL to create the encryption/decryption algorithm + all the file handling and using it from C# have a significant performance advantage OVER writing the encryption algorithm + file handling in completely managed C# code? If I use C++ .NET to create the encryption algorithm, should I use MFC Smart Device DLL or ATL? What is the difference and is there any impact on which one I choose? And can i just add a reference to the C++ DLL from C# or should I use P/Invoke? I am fairly competent with C# than C++ but performance plays a major role as I have convinced my lecturers that AES is a very efficient cryptographic algorithm for resource constrained devices. Thanx a bunch :)

    Read the article

  • asynchronous pages

    - by lockedscope
    I have just read the multi-threading and custom threading in asp.net articles. http://www.williablog.net/williablog/post/2008/12/16/Custom-Threading-in-ASPNET.aspx http://www.williablog.net/williablog/post/2008/12/16/Multi-Threading-in-ASPNET.aspx I have couple of questions. What does he mean by returning a thread to the pool? Is that thread completely removed from memory or put in to a state that it does not scheduled to CPU(is it in sleep state or whatever)? If that thread is removed from memory how could it survive after async point? How this mechanism works? Are every objects(pages class, request,response etc.) are copied to somewhere else before they are disposed? (Or, is it just waiting in a sleep state and then its waked when async call ends?) He is saying that; "Having said that, making pages asynchronous is not really about improving performance, it is about improving scalability" then he is saying; "I'm sorry to say that it will do nothing for scalability or performance." So which one is true? or for which case(s) are they true?

    Read the article

  • Google Code Jam 2010 Large DataSets Take Too Long to Submit

    - by Travis
    Hey Guys, I'm participating in the 2010 code jam and I solved two of the problems for the small data sets, but I'm not even close to solving the large data sets in the 8 minute time frame. I'm wondering if anyone out there has solved the large data set: What hardware were you running on? What language were you running on? What performance tuning techniques did you do on your code to run as fast as possible? I'm writing the solutions in Ruby, which is not my day to day language, and executing them on my Macbook Pro. My solutions for problem A and problem C are on github at http://github.com/tjboudreaux/codejam2010. I'd appreciate any suggestions that you may have. FWIW, I have alot of experience in C++ from college, my primary language is PHP, and my "sandbox" language is Ruby. Was I just a bit ambitious by taking a shot at this in Ruby, not knowing where the language struggles for performance, or does anyone see anything that's a redflag as to why I can't complete the large dataset in time to submit.

    Read the article

  • Cassandra random read speed

    - by Jody Powlette
    We're still evaluating Cassandra for our data store. As a very simple test, I inserted a value for 4 columns into the Keyspace1/Standard1 column family on my local machine amounting to about 100 bytes of data. Then I read it back as fast as I could by row key. I can read it back at 160,000/second. Great. Then I put in a million similar records all with keys in the form of X.Y where X in (1..10) and Y in (1..100,000) and I queried for a random record. Performance fell to 26,000 queries per second. This is still well above the number of queries we need to support (about 1,500/sec) Finally I put ten million records in from 1.1 up through 10.1000000 and randomly queried for one of the 10 million records. Performance is abysmal at 60 queries per second and my disk is thrashing around like crazy. I also verified that if I ask for a subset of the data, say the 1,000 records between 3,000,000 and 3,001,000, it returns slowly at first and then as they cache, it speeds right up to 20,000 queries per second and my disk stops going crazy. I've read all over that people are storing billions of records in Cassandra and fetching them at 5-6k per second, but I can't get anywhere near that with only 10mil records. Any idea what I'm doing wrong? Is there some setting I need to change from the defaults? I'm on an overclocked Core i7 box with 6gigs of ram so I don't think it's the machine. Here's my code to fetch records which I'm spawning into 8 threads to ask for one value from one column via row key: ColumnPath cp = new ColumnPath(); cp.Column_family = "Standard1"; cp.Column = utf8Encoding.GetBytes("site"); string key = (1+sRand.Next(9)) + "." + (1+sRand.Next(1000000)); ColumnOrSuperColumn logline = client.get("Keyspace1", key, cp, ConsistencyLevel.ONE); Thanks for any insights

    Read the article

  • PostgreSQL: BYTEA vs OID+Large Object?

    - by mlaverd
    I started an application with Hibernate 3.2 and PostgreSQL 8.4. I have some byte[] fields that were mapped as @Basic (= PG bytea) and others that got mapped as @Lob (=PG Large Object). Why the inconsistency? Because I was a Hibernate noob. Now, those fields are max 4 Kb (but average is 2-3 kb). The PostgreSQL documentation mentioned that the LOs are good when the fields are big, but I didn't see what 'big' meant. I have upgraded to PostgreSQL 9.0 with Hibernate 3.6 and I was stuck to change the annotation to @Type(type="org.hibernate.type.PrimitiveByteArrayBlobType"). This bug has brought forward a potential compatibility issue, and I eventually found out that Large Objects are a pain to deal with, compared to a normal field. So I am thinking of changing all of it to bytea. But I am concerned that bytea fields are encoded in Hex, so there is some overhead in encoding and decoding, and this would hurt the performance. Are there good benchmarks about the performance of both of these? Anybody has made the switch and saw a difference?

    Read the article

  • Does anyone know of a good Commercial WPF Web Browser Control?

    - by VoidDweller
    I have an MDI WPF app that I need to add web content too. At first, great it looks like I have 2 options built into the framework the Frame control and the WebBrowser control. Given that this is an MDI app it doesn't take long to discover that neither of these will work. The WebBrowser control wraps up the IE WebBrowser ActiveX Control which uses the Win32 graphics pipeline. The "Airspace" issue pretty much sums this up as "Sorry, the layouts will not play nice together". Yes, I have thought about taking snapshots of the web content rendering these and mapping the mouse and keyboard events back to the browser control, but I can't afford the performance penalty and I really don't have time to write and thoroughly test it. I have looked for third party controls, but so far I have only found Chris Cavanagh's WPF Chromium Web Browser control. Which wraps up Awesomium 1.5. Together these are very cool, they play nice with the WPF layouts. But they do not meet my performance requirements. They are VERY HEAVY on memory consumption and not to friendly with CPU usage either. Not to mention still quite buggy. I'll elaborate if you are interested. So, do any of you know of a stable performant WPF web browser control? Thanks.

    Read the article

  • MongoDB architectural question

    - by pex
    I have to store 4 Models. Let's say a Post that has many and belongs to many Categories. Category on the other hand has many Qualities. At the moment I'm of the opinion, that Post and Categories are Documents. Qualities becomes an EmbeddedDocument of Categories. We're coming to the root problem: There are a lot of Votes on Qualities that belong to a Post. I thought about embed Votes in Post and give it a quality_id. I am really expecting a lot of Votes and there has to be a possibility to filter them (e.g by Username / Usergroup / Date voted). I worked with MongoMapper and I think the missing existence of find methods for EmbeddedDocuments could become a killer. On the other hand I'm wondering about performance issues. What if I want to provide a Post without all the Votes, but only a few. Or, what if I define an own Document for Votes and have tons of Vote-Documents? Wouldn't that become a performance killer?

    Read the article

  • "Replay" the steps needed to recreate an error

    - by David
    I am going to create a typical business application that will be used by a few hundred consultants. Normally, the consultants would be presented with an error message with a standard text. As the application will be a complicated one with lots of changes being made to it constantly I would like the following: When an error message is presented, the user has the option to "send" the error message to the developers. The developers should be able to open the incoming file in i.e. Eclipse and debug the steps of the last 10 minutes of work step by step (one line at a time if they want to). Everything should be transparent, meaning that they for example should be able to see the return values of calls to the database. Are there any solutions that offer such functionality today, my preferred language is Python or also Java. I know that there will be a huge performance hit because of such functionality, but that is acceptable as this kind of software is not performance sensitive. It would be VERY nice if the database also had a cronology so that one could query the database for values that existed at the exact time that a specific line of code was run in the application, leading up to the bug.

    Read the article

  • What's the fastest way to bulk insert a lot of data in SQL Server (C# client)

    - by Andrew
    I am hitting some performance bottlenecks with my C# client inserting bulk data into a SQL Server 2005 database and I'm looking for ways in which to speed up the process. I am already using the SqlClient.SqlBulkCopy (which is based on TDS) to speed up the data transfer across the wire which helped a lot, but I'm still looking for more. I have a simple table that looks like this: CREATE TABLE [BulkData]( [ContainerId] [int] NOT NULL, [BinId] [smallint] NOT NULL, [Sequence] [smallint] NOT NULL, [ItemId] [int] NOT NULL, [Left] [smallint] NOT NULL, [Top] [smallint] NOT NULL, [Right] [smallint] NOT NULL, [Bottom] [smallint] NOT NULL, CONSTRAINT [PKBulkData] PRIMARY KEY CLUSTERED ( [ContainerIdId] ASC, [BinId] ASC, [Sequence] ASC )) I'm inserting data in chunks that average about 300 rows where ContainerId and BinId are constant in each chunk and the Sequence value is 0-n and the values are pre-sorted based on the primary key. The %Disk time performance counter spends a lot of time at 100% so it is clear that disk IO is the main issue but the speeds I'm getting are several orders of magnitude below a raw file copy. Does it help any if I: Drop the Primary key while I am doing the inserting and recreate it later Do inserts into a temporary table with the same schema and periodically transfer them into the main table to keep the size of the table where insertions are happening small Anything else? -- Based on the responses I have gotten, let me clarify a little bit: Portman: I'm using a clustered index because when the data is all imported I will need to access data sequentially in that order. I don't particularly need the index to be there while importing the data. Is there any advantage to having a nonclustered PK index while doing the inserts as opposed to dropping the constraint entirely for import? Chopeen: The data is being generated remotely on many other machines (my SQL server can only handle about 10 currently, but I would love to be able to add more). It's not practical to run the entire process on the local machine because it would then have to process 50 times as much input data to generate the output. Jason: I am not doing any concurrent queries against the table during the import process, I will try dropping the primary key and see if that helps. ~ Andrew

    Read the article

  • Blocking on DBCP connection pool (open and close connnection). Is database connection pooling in OpenEJB pluggable?

    - by topchef
    We use OpenEJB on Tomcat (used to run on JBoss, Weblogic, etc.). While running load tests we experience significant performance problems with handling JMS messages (queues). Problem was localized to blocking on database connection pool getting or releasing connection to the pool. Blocking prevented concurrent MDB instances (threads) from running hence performance suffered 10-fold and worse. The same code used to run on application servers (with their respective connection pool implementations) with no blocking at all. Example of thread blocked: Name: JMS Resource Adapter-worker-23 State: BLOCKED on org.apache.commons.pool.impl.GenericObjectPool@1ea6b4a owned by: JMS Resource Adapter-worker-19 Total blocked: 18,426 Total waited: 0 Stack trace: org.apache.commons.pool.impl.GenericObjectPool.returnObject(GenericObjectPool.java:916) org.apache.commons.dbcp.PoolableConnection.close(PoolableConnection.java:91) - locked org.apache.commons.dbcp.PoolableConnection@1bcba8 org.apache.commons.dbcp.managed.ManagedConnection.close(ManagedConnection.java:147) com.xxxxx.persistence.DbHelper.closeConnection(DbHelper.java:290) .... Couple of questions. I am almost certain that some transactional attributes and properties contribute to this blocking, but MDBs are defined as non-transactional (we use both annotations and ejb-jar.xml). Some EJBs do use container-managed transactions though (and we can observe blocking there as well). Are there any DBCP configurations that may fix blocking? Is DBCP connection pool implementation replaceable in OpenEJB? How easy (difficult) to replace it with another library? Just in case this is how we define data source in OpenEJB (openejb.xml): <Resource id="MyDataSource" type="DataSource"> JdbcDriver oracle.jdbc.driver.OracleDriver JdbcUrl ${oracle.jdbc} UserName ${oracle.user} Password ${oracle.password} JtaManaged true InitialSize 5 MaxActive 30 ValidationQuery SELECT 1 FROM DUAL TestOnBorrow true </Resource>

    Read the article

  • Help with Neuroph neural network

    - by user359708
    For my graduate research I am creating a neural network that trains to recognize images. I am going much more complex than just taking a grid of RGB values, downsampling, and and sending them to the input of the network, like many examples do. I actually use over 100 independently trained neural networks that detect features, such as lines, shading patterns, etc. Much more like the human eye, and it works really well so far! The problem is I have quite a bit of training data. I show it over 100 examples of what a car looks like. Then 100 examples of what a person looks like. Then over 100 of what a dog looks like, etc. This is quite a bit of training data! Currently I am running at about one week to train the network. This is kind of killing my progress, as I need to adjust and retrain. I am using Neuroph, as the low-level neural network API. I am running a dual-quadcore machine(16 cores with hyperthreading), so this should be fast. My processor percent is at only 5%. Are there any tricks on Neuroph performance? Or Java peroformance in general? Suggestions? I am a cognitive psych doctoral student, and I am decent as a programmer, but do not know a great deal about performance programming.

    Read the article

  • Best architecture for a social media app

    - by Sky
    Hey guys, Im working on promising project that develops a new social media app for web and mobile. We are at begin defining functionalities. Nevertheless, I'm thinking ahead on architecture. So I'm asking: 1 - Whats the best plataform to develop the core of this aplication that will have a Rest API interface. 2 - Whats the best database that will scale and grow with my application. As far as I researched, these were the answers I found most interesting: For database: Cassandra NoSQL DB, amazing scalabilty, amazing write performance, good read performance (will be improved on 0.6). I think i will choose that one. Zookeer for transactions on Cassandra. I think that 2 technologies rly good for that propose. What do you think guys? On the front end that will serve the REST API, i dont have a final candidate. For this one i have questions based on Perfomance X Scalabilty X Fast Development/Maintenance. Java or .Net As far as I researched, brings the best balance of this requisits. Python, pearl and Rail, has the best (Fast Development/Maintenance), but sux on all other. C or C++ I dont even consider, because its (Fast Development/Maintenance) sux... So what do you guy think about it?

    Read the article

  • Design suggestion for expression tree evaluation with time-series data

    - by Lirik
    I have a (C#) genetic program that uses financial time-series data and it's currently working but I want to re-design the architecture to be more robust. My main goals are: sequentially present the time-series data to the expression trees. allow expression trees to access previous data rows when needed. to optimize performance of the data access while evaluating the expression trees. keep a common interface so various types of data can be used. Here are the possible approaches I've thought about: I can evaluate the expression tree by passing in a data row into the root node and let each child node use the same data row. I can evaluate the expression tree by passing in the data row index and letting each node get the data row from a shared DataSet (currently I'm passing the row index and going to multiple synchronized arrays to get the data). Hybrid: an immutable data set is accessible by all of the expression trees and each expression tree is evaluated by passing in a data row. The benefit of the first approach is that the data row is being passed into the expression tree and there is no further query done on the data set (which should increase performance in a multithreaded environment). The drawback is that the expression tree does not have access to the rest of the data (in case some of the functions need to do calculations using previous data rows). The benefit of the second approach is that the expression trees can access any data up to the latest data row, but unless I specify what that row is, I'll have to iterate through the rows and figure out which one is the last one. The benefit of the hybrid is that it should generally perform better and still provide access to the earlier data. It supports two basic "views" of data: the latest row and the previous rows. Do you guys know of any design patterns or do you have any tips that can help me build this type of system? Should I use a DataSet to hold and present the data, or are there more efficient ways to present rows of data while maintaining a simple interface? FYI: All of my code is written in C#.

    Read the article

< Previous Page | 278 279 280 281 282 283 284 285 286 287 288 289  | Next Page >