Search Results

Search found 13928 results on 558 pages for 'large scale nat'.


What is the best way to work with large databases in Java depending on context?

- by Singletony

Hi guys. We are trying to figure out the best practice for working with very large DBs in Java. What we do is a kind of BI, i.e. analyzing very large DBs and using them to create intermediate DBs that represent intelligent knowledge of the DBs. We are currently using JDBC and just performing queries using a ResultSet. As more and more data is being created, we are wondering whether more appropriate ways exist for parsing and manipulating these large DBs:

- We need to support 'chunk' manipulation rather than an entire DB at once (e.g. LIMIT in JDBC, which gives very poor performance).
- We do not need to be constantly connected, since we are just pulling results and creating new tables of our own.
- We want to understand JDBC alternatives, with respect to advantages and disadvantages.

Whether you think JDBC is the way to go or not, what are the best practices to go by depending on context (e.g. for large DBs queried in chunks)? If my question is not clear, I will gladly elaborate! THANK YOU SO MUCH!
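One common way to read a large table in chunks without the cost of ever-growing offsets is keyset pagination: remember the last key seen and ask for the next rows after it. The sketch below only illustrates that idea. It uses Python's built-in sqlite3 as a stand-in for the real database, and the table name `events` and its columns are hypothetical. With JDBC the same query shape would go through a PreparedStatement, and `Statement.setFetchSize` is a hint worth trying as well.

```python
import sqlite3


def iter_chunks(conn, chunk_size):
    """Yield lists of (id, payload) rows, at most chunk_size rows each."""
    last_id = 0  # assumption: ids are positive integers
    while True:
        rows = conn.execute(
            "SELECT id, payload FROM events WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # resume after the last key of this chunk


# Hypothetical demo data: ten rows in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO events (payload) VALUES (?)",
    [(f"row {i}",) for i in range(10)],
)
chunk_sizes = [len(chunk) for chunk in iter_chunks(conn, 4)]
```

Because each query seeks directly to `id > last_id` through the primary key, the cost of a chunk should not depend on how far into the table it is, which is usually where plain offset-based LIMIT gets slow.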

Method for finding memory leak in large Java heap dumps

- by Rickard von Essen

I have to find a memory leak in a Java application. I have some experience with this but would like advice on a methodology/strategy for this. Any reference and advice is welcome. About our situation:

- Heap dumps are larger than 1 GB.
- We have heap dumps from 5 occasions.
- We don't have any test case to provoke this. It only happens in the (massive) system test environment after at least a week's usage.
- The system is built on an internally developed legacy framework with so many design flaws that they are impossible to count.
- Nobody understands the framework in depth. It has been transferred to one guy in India who barely keeps up with answering e-mails.
- We have done snapshot heap dumps over time and concluded that there is not a single component increasing over time. It is everything that grows slowly.
- The above points us in the direction that it is the framework's homegrown ORM system that increases its usage without limits. (This system maps objects to files?! So not really an ORM.)

Question: What is the methodology that helped you succeed with hunting down leaks in an enterprise-scale application?

Starting out NLP - Python + large data set

- by pencilNero

Hi, I've been wanting to learn Python and do some NLP, so have finally gotten round to starting. Downloaded the English Wikipedia mirror for a nice chunky dataset to start on, and have been playing around a bit, at this stage just getting some of it into a SQLite DB (haven't worked with DBs in the past, unfortunately). But I'm guessing SQLite is not the way to go for a full-blown NLP project (/experiment :) - what would be the sort of things I should look at? HBase (and Hadoop) seem interesting; I guess I could run them in Java, prototype in Python and maybe migrate the really slow bits to Java... Alternatively just run MySQL, but the dataset is 12 GB; I wonder if that will be a problem? Also looked at Lucene, but not sure how (other than breaking the wiki articles into chunks) I'd get that to work. What comes to mind for a really flexible NLP platform (I don't really know at this stage WHAT I want to do, just want to learn large-scale language analysis tbh)? Many thanks.

Getting the Item Count of a large SharePoint list in the fastest way

- by sooraj

I am trying to get the count of the items in a SharePoint document library programmatically. The scale I am working with is 30-70000 items. We have a user control in a SmartPart to display the count. Ours is a TEAM site. This is the code to get the total count:

    SPList VoulnterrList = web.Lists[ListTitle];
    SPQuery query = new SPQuery();
    query.ViewAttributes = "Scope=\"Recursive\"";
    string queries = "<Where><Eq><FieldRef Name='ApprovalStatus' /><Value Type='Choice'>Pending</Value></Eq></Where>";
    query.Query = queries;
    SPListItemCollection lstitemcollAssoID = VoulnterrList.GetItems(query);
    lblCount.Text = "Total Proofs: " + VoulnterrList.Items.Count.ToString() + " Pending Proofs: " + lstitemcollAssoID.Count.ToString();

The problem is that this has a serious performance issue: it takes 75 to 80 sec to load the page. If we comment this out, the page load decreases to 4 sec. Is there any better approach for this problem? Ours is SharePoint 2007.

Fastest way to write large STL vector to file using STL

- by ljubak

I have a large vector (10^9 elements) of chars, and I was wondering what is the fastest way to write such a vector to a file. So far I've been using the following code:

    vector<char> vs;
    // ... Fill vector with data
    ofstream outfile("nanocube.txt", ios::out | ios::binary);
    ostream_iterator<char> oi(outfile, '\0');
    copy(vs.begin(), vs.end(), oi);

With this code it takes approximately two minutes to write all the data to the file. The actual question is: "Can I make it faster using STL, and how?"

nServiceBus with large XML messages

- by Sean

Hello, I have read about true messaging, where instead of sending the payload on the bus, the publisher sends an identifier. In our case, we have a lot of legacy apps/services that were designed to receive message payloads (XML) of close to 4 MB (close to the MSMQ limit). Is there a way for NServiceBus to handle large payloads and persist messages automatically, or another work-around, so that the publisher/subscriber services have to worry neither about the payload size nor about how to de/re-hydrate the payload? Thank you in advance.

How to geocode a large number of addresses?

- by user308569

I need to geocode, i.e. translate street addresses to latitude/longitude, for ~8,000 street addresses. I am using both the Yahoo and Google geocoding engines at http://www.gpsvisualizer.com/geocoder/, and found out that for a large number of addresses those engines (one of them or both) either could not perform geocoding (i.e. they return latitude=0, longitude=0) or return the wrong coordinates (incl. cases when Yahoo and Google give different results). What is the best way to handle this problem? Which engine is (usually) more accurate? I would appreciate any thoughts, suggestions, or ideas from people who have previous experience with this kind of task.

CUDA: accumulate data into a large histogram of floats

- by shoosh

I'm trying to think of a way to implement the following algorithm using CUDA. Working on a large volume of voxels, for each voxel I calculate an index i and a value c. After the calculation I need to perform

    histogram[i] += c

c is a float value and the histogram can have up to 15,000 bins. I'm looking for a way to implement this efficiently using CUDA. The first obvious problem is that with compute capability 1.3, which is what I'm using, I can't even do an atomicAdd() of floats, so how can I accumulate anything reliably? This example by nVidia does something somewhat simpler: the histograms are saved in shared memory (which I can't do due to the size of my histogram) and it only accumulates integers. Can this approach be generalized to my case?

Large svn external

- by MPelletier

I have a project which uses a large library residing in its own repository. Using: TortoiseSVN; the server is running an enterprise edition of VisualSVN. The project itself has the "standard" structure:

    trunk
    tags
    branches

In trunk and in each branch and tag is the library, set as an external (svn:externals property). If I get the entire tree, I get the library several times, which is getting ridiculously repetitive. Is there a recommended structure for this? Or perhaps a way not to get all externals (because the other externals are much smaller and easier to manipulate)?

How do I export a large table into 50 smaller csv files of 100,000 records each

- by Eddie

I am trying to export one field from a very large table (containing 5,000,000 records, for example) into a csv list, but not all together; rather, 100,000 records into each .csv file created, without duplication. How can I do this, please? I tried

    SELECT field_name FROM table_name
    WHERE certain_conditions_are_met
    INTO OUTFILE /tmp/name_of_export_file_for_first_100000_records.csv
    LINES TERMINATED BY '\n'
    LIMIT 0 , 100000

That gives the first 100000 records, but nothing I do has the other 4,900,000 records exported into 49 other files. And how do I specify the other 49 filenames? For example, I tried the following, but the SQL syntax is wrong:

    SELECT field_name FROM table_name
    WHERE certain_conditions_are_met
    INTO OUTFILE /home/user/Eddie/name_of_export_file_for_first_100000_records.csv
    LINES TERMINATED BY '\n'
    LIMIT 0 , 100000
    INTO OUTFILE /home/user/Eddie/name_of_export_file_for_second_100000_records.csv
    LINES TERMINATED BY '\n'
    LIMIT 100001 , 200000

and that did not create the second file. What am I doing wrong, please, and is there a better way to do this? Should the LIMIT 0 , 100000 be put before the first INTO OUTFILE clause, and then the entire command repeated from SELECT for the second 100,000 records, etc.? Thanks for any help. Eddie
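For context on the attempt above: MySQL's two-argument form is `LIMIT offset, row_count`, so the second block of 100,000 rows would be `LIMIT 100000, 100000`, and each output file needs its own complete SELECT ... INTO OUTFILE statement. An alternative is to do the splitting on the client. The following is a minimal sketch of that idea, using Python's built-in sqlite3 as a stand-in for MySQL. The file naming scheme `export_NNN.csv` is hypothetical, while the table and column names are the placeholders from the question.

```python
import csv
import os
import sqlite3
import tempfile


def export_in_files(conn, out_dir, rows_per_file):
    """Write field_name values into export_001.csv, export_002.csv, ..."""
    # One query, read back in fixed-size batches, so no row is exported twice.
    cursor = conn.execute("SELECT field_name FROM table_name ORDER BY rowid")
    paths = []
    while True:
        rows = cursor.fetchmany(rows_per_file)
        if not rows:
            break
        path = os.path.join(out_dir, f"export_{len(paths) + 1:03d}.csv")
        with open(path, "w", newline="") as handle:
            csv.writer(handle).writerows(rows)
        paths.append(path)
    return paths


# Hypothetical demo: 250 rows split into files of 100 rows each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (field_name TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?)", [(f"v{i}",) for i in range(250)]
)
paths = export_in_files(conn, tempfile.mkdtemp(), 100)
```

Reading one result set in batches avoids re-running the query with a growing offset for every file, and the explicit ORDER BY keeps the split deterministic.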

Large XML files in dataset (outofmemory)

- by dklein

Hi folks, I am currently trying to load a rather large XML file into a dataset. The XML file is about 700 MB, and every time I try to read it, it takes a long time and after a while throws an "out of memory" exception.

    DataSet ds = new DataSet();
    ds.ReadXml(pathtofile);

The main problem is that I have to use those datasets (I use them to import the data from the XML file into a Sybase database: foreach table, foreach row, foreach column) and that I have no schema file. I have already googled for a while, but I only found solutions that won't be usable for me.
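A usual way around this kind of out-of-memory failure is to stream the document and handle one row at a time instead of materialising everything first. The sketch below shows the idea with Python's standard `xml.etree.ElementTree.iterparse`. The element name `row` and the flat row layout are assumptions for illustration, since the question does not show the XML. In .NET the analogous forward-only reader would be `XmlReader`.

```python
import io
import xml.etree.ElementTree as ET


def iter_rows(source, row_tag):
    """Yield one dict per row_tag element without building the full tree."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == row_tag:
            # Assumption: each row is a flat list of <column>value</column> children.
            yield {child.tag: child.text for child in elem}
            elem.clear()  # drop the row's children once it has been handled


# Hypothetical two-row document standing in for the 700 MB file.
sample = b"<root><row><a>1</a><b>x</b></row><row><a>2</a><b>y</b></row></root>"
rows = list(iter_rows(io.BytesIO(sample), "row"))
```

Each yielded dict could be turned into one INSERT against the target database, so memory use stays tied to the size of a row rather than the size of the file.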

*Client* scalability for large numbers of remote web service calls

- by Yuriy

Hey Guys, I was wondering if you could share best practices and common mistakes when it comes to making large numbers of time-sensitive web service calls. In my case, I have a SOAP and an XML-RPC based web service to which I'm constantly making calls. I predict that this will soon become an issue as the number of calls per second will grow. On a higher level, I was thinking of batching those calls and submitting those to the web services every 100 ms. Could you share what else works? On a lower level side of the things, I use Apache Xml-Rpc client and standard javax.xml.soap.* packages for my client implementations. Are you aware of any client scalability related tricks/tips/warnings with these packages? Thanks in advance Yuriy

System.Overflow Exception - int32 is too large or small

- by LonnieBest

I need a little advice. I've got a Windows service that runs at night. In my development environment it runs without exception, but when I run it installed on other machines, I'm welcomed in the morning with a System.OverflowException that says that I've set an Int32 to a value that is too large or too small. I've carefully combed the service's C# code, and I have try/catch statements around everything, which should catch any error and write it to a log without completely stopping my service with this overflow exception. But still, it occurs and stops the service. I'd appreciate any conceptual advice on how to pinpoint what's causing an error such as this.

"code too large" compilation error in java

- by trinity

Hello all, is there any maximum size for code in Java? I wrote a function with more than 10,000 lines. Actually, each line assigns a value to an array variable:

    arts_bag[10792]="newyorkartworld";
    arts_bag[10793]="leningradschool";
    arts_bag[10794]="mailart";
    arts_bag[10795]="artspan";
    arts_bag[10796]="watercolor";
    arts_bag[10797]="sculptures";
    arts_bag[10798]="stonesculpture";

And while compiling, I get this error: code too large. How do I overcome this?

finding a string of random characters (with possible errors) within a large string of random charact

- by mike

I am trying to search a large string without spaces for a smaller string of characters. Using regex I can easily find perfect matches, but I can't figure out how to find partial matches. By partial matches I mean one or two extra characters in the string, or one or two characters that have been changed, or one of each. The first and last characters will always match, though. This would be similar to a spell checker, but there are no spaces and the strings don't contain actual words, just random hex digits. I figured out a way to find the string if there are no extra characters, using indexOf(string.charAt(0)) and indexOf(string.charAt(string.length()-1)) and looping through the characters between the two indexes. But this can be problematic when dealing with randomized characters, because of the possibility of finding the first and last characters at the correct spacing but none of the middle characters matching. I've been scratching my head for hours on this issue. Any ideas?
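The constraints described above (fixed first and last character, at most two extra or changed characters) can be expressed as an edit-distance check over candidate windows. What follows is a minimal brute-force sketch of that idea, not a tuned implementation. The function names `edit_distance` and `fuzzy_find` are made up for illustration, and missing characters are tolerated as well because the Levenshtein distance counts them too.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def fuzzy_find(haystack, needle, max_errors=2):
    """Return (start, end, distance) for windows that approximately match needle."""
    hits = []
    for start, char in enumerate(haystack):
        if char != needle[0]:  # the first character must match exactly
            continue
        shortest = len(needle) - max_errors
        longest = len(needle) + max_errors
        for length in range(shortest, longest + 1):
            end = start + length
            if length < 2 or end > len(haystack):
                continue
            if haystack[end - 1] != needle[-1]:  # the last character must match too
                continue
            distance = edit_distance(haystack[start:end], needle)
            if distance <= max_errors:
                hits.append((start, end, distance))
    return hits


print(fuzzy_find("00a1b2c3d4ff", "a1b2c3d4"))   # [(2, 10, 0)]
print(fuzzy_find("ffa1bXc3d94ee", "a1b2c3d4"))  # [(2, 11, 2)]
```

Anchoring on the first and last characters keeps the number of windows small, and the distance check rejects the case mentioned in the question where both ends line up but the middle does not.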

How to organize a large number of objects

- by shane

We have a large number of documents and metadata (XML files) associated with these documents. What is the best way to organize them? Currently we put them into a series of nested folders:

    /repository/category/date (when they were loaded into our db)/document_number.pdf and .xml

We use the path as a unique identifier for the document in our system. This is more versatile than putting them all in a single flat folder. It is also independent from our database/application, so we can reload them in case of failure. Yet it introduces some limitations: for example, we can't move the files once they've been placed in this structure, and it takes work to put them this way. What is the best practice? How do websites such as Scribd deal with this problem?

Free Large datasets to experiment with Hadoop

- by Sundar

Do you know of any large datasets to experiment with Hadoop that are free or low cost? Any related pointers/links are appreciated. Preference:

- At least one GB of data.
- Production log data of a web server.

A few that I have found so far:

- http://dumps.wikimedia.org/enwiki/20100130/
- http://wiki.freebase.com/wiki/Data_dumps
- http://aws.amazon.com/publicdatasets/

Also, can we run our own crawler to gather data from sites, e.g. Wikipedia? Any pointers on how to do this are appreciated as well.

serving large file using select, epoll or kqueue

- by xask

Nginx uses epoll or other multiplexing techniques (select) to handle multiple clients, i.e. unlike Apache it does not spawn a new thread for every request. I tried to replicate the same in my own test program using select. I could accept connections from multiple clients by creating a non-blocking socket and using select to decide which client to serve. My program would simply echo their data back to them. It works fine for small data transfers (some bytes per client). The problem occurs when I need to send a large file over a connection to the client. Since I have only one thread to serve all clients, I cannot resume serving other clients until I have finished reading the file and writing it to the socket. Is there a known solution to this problem, or is it best to create a thread for every such request?
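One widely used answer to this is to keep per-client state (the open file and any unsent bytes) and write only one chunk each time the socket reports it is writable, so that no single transfer monopolises the loop. The sketch below illustrates that pattern with Python's standard `selectors` module, which wraps select/epoll/kqueue. The function name `send_files` and the chunk size are assumptions, and a real server would handle accepts and reads in the same loop.

```python
import io
import selectors
import socket


def send_files(jobs, chunk_size=64 * 1024):
    """Send each file to its socket, one chunk per readiness event.

    jobs is a list of (connected socket, binary file object) pairs.
    """
    selector = selectors.DefaultSelector()
    for sock, source in jobs:
        sock.setblocking(False)
        state = {"source": source, "pending": b""}
        selector.register(sock, selectors.EVENT_WRITE, state)
    remaining = len(jobs)
    while remaining:
        for key, _mask in selector.select():
            state = key.data
            if not state["pending"]:
                state["pending"] = state["source"].read(chunk_size)
            if not state["pending"]:   # end of file: this client is done
                selector.unregister(key.fileobj)
                remaining -= 1
                continue
            try:
                sent = key.fileobj.send(state["pending"])
            except BlockingIOError:    # buffer filled up; retry on the next event
                continue
            # Keep whatever the kernel did not accept for the next round.
            state["pending"] = state["pending"][sent:]
    selector.close()


# Hypothetical demo: two "clients" served in an interleaved fashion.
left_a, right_a = socket.socketpair()
left_b, right_b = socket.socketpair()
data_a = bytes(range(256)) * 8
data_b = b"x" * 1500
send_files([(left_a, io.BytesIO(data_a)), (left_b, io.BytesIO(data_b))], chunk_size=512)
left_a.close()
left_b.close()
```

The file reads here are still blocking, but each one is bounded by the chunk size, so other clients are delayed by at most one short read rather than by a whole file.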

Doing a large number of upserts as fast as possible

- by Jason Swett

My app (which uses MySQL) is doing a large number of consecutive upserts. Right now my SQL looks like this:

    INSERT IGNORE INTO customer (name,customer_number,social_security_number,phone) VALUES ('VICTOR H KINDELL','123','123','123')
    INSERT IGNORE INTO customer (name,customer_number,social_security_number,phone) VALUES ('VICTOR H KINDELL','123','123','123')
    INSERT IGNORE INTO customer (name,customer_number,social_security_number,phone) VALUES ('VICTOR H KINDELL OR','123','123','123')
    INSERT IGNORE INTO customer (name,customer_number,social_security_number,phone) VALUES ('TRACY L WALTER PERSONAL REP FOR','123','123','123')
    INSERT IGNORE INTO customer (name,customer_number,social_security_number,phone) VALUES ('TRACY L WALTER PERSONAL REP FOR','123','123','123')

So far I've found INSERT IGNORE to be the fastest way to achieve upserts. Selecting a record to see if it exists and then either updating it or inserting a new one is too slow. Even this is not as fast as I'd like, because I need a separate statement for each record. Sometimes I'll have around 50,000 of these statements in a row. Is there a way to take care of all of these in just one statement, without deleting any existing records?
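MySQL accepts several row tuples in one statement, as in `INSERT IGNORE INTO customer (...) VALUES (...), (...), (...)`, so the rows can be sent in batches rather than one statement each. Note that INSERT IGNORE skips rows that already exist rather than updating them; `INSERT ... ON DUPLICATE KEY UPDATE` is the MySQL form that updates. The sketch below shows the batching idea with Python's built-in sqlite3 as a stand-in, where the equivalent keyword is `INSERT OR IGNORE`. The unique key on (name, customer_number) is an assumption, since the question does not say which key detects duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "name TEXT, customer_number TEXT, social_security_number TEXT, phone TEXT, "
    "UNIQUE (name, customer_number))"  # assumed duplicate-detection key
)

rows = [
    ("VICTOR H KINDELL", "123", "123", "123"),
    ("VICTOR H KINDELL", "123", "123", "123"),
    ("VICTOR H KINDELL OR", "123", "123", "123"),
    ("TRACY L WALTER PERSONAL REP FOR", "123", "123", "123"),
    ("TRACY L WALTER PERSONAL REP FOR", "123", "123", "123"),
]

with conn:  # one transaction for the whole batch instead of one per row
    conn.executemany(
        "INSERT OR IGNORE INTO customer "
        "(name, customer_number, social_security_number, phone) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )

inserted = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

With 50,000 rows it is probably worth splitting the batch into several statements of a few thousand rows each, since MySQL limits the size of a single statement through `max_allowed_packet`.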

latex large division sign in a math formula

- by Anna

Hi, I have been looking for an answer for some time now and hope you can give me a quick tip. I have an equation with many divisions inside, e.g. $\frac{\frac{a_1}{a_2}} {\frac{b_1}{b_2}}$. To make it more readable, I decided to change the large fraction into a "/" sign, i.e. $\frac{a_1}{a_2} / \frac{b_1}{b_2}$. The problem is that the "/" sign remains small, and it is quite ugly. How do I change the "/" sign to a bigger size? How do I make it more readable? Thanks.
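Two standard ways to enlarge the slash are sketched here. The first sets a fixed size with `\Big/` (or `\bigg/` for a larger one). The second lets the slash scale with the neighbouring fractions through `\middle`, which needs an e-TeX based engine such as a current pdfLaTeX. Which one looks better depends on the surrounding formula, so treat these as options to try rather than the one correct answer.

```latex
% Fixed-size slash: \big/, \Big/, \bigg/ and \Bigg/ grow in that order.
$\frac{a_1}{a_2} \Big/ \frac{b_1}{b_2}$

% Automatically scaled slash: the invisible \left. and \right. delimiters
% give \middle/ a group whose height it can adapt to.
$\left. \frac{a_1}{a_2} \middle/ \frac{b_1}{b_2} \right.$
```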

SQL Server 2000 tables

- by klork

We currently have an SQL Server 2000 database with one table containing data for multiple users. The data is keyed by memberid, which is an integer field. The table has a clustered index on memberid. The table is now about 200 million rows. Indexing and maintenance are becoming issues. We are debating splitting the table into a one-table-per-user model. This would imply that we would end up with a very large number of tables, potentially up to 2,147,483,647, considering just positive values. My questions:

- Does anyone have any experience with a SQL Server (2000/2005) installation with millions of tables?
- What are the implications of this architecture with regard to maintenance and access using Query Analyzer, Enterprise Manager, etc.?
- What are the implications of having such a large number of indexes in a database instance?

All comments are appreciated. Thanks

OutOfMemoryException Processing Large File

- by Krip

We are loading a large flat file (about 125 MB) into BizTalk Server 2006 (original release, not R2). We run a map against it and then take each row and make a call out to a stored procedure. We receive the OutOfMemoryException during orchestration processing; the Windows service restarts, uses the full 2 GB of memory, and crashes again. The server is 32-bit and set to use the /3GB switch. Also, I've separated the flow into 3 hosts: one for receive, another for orchestration, and a third for sends. Does anyone have any suggestions for getting this file to process without error? Thanks, Krip

Converting a large SQL Server Database to Azure Storage

- by Laith

Hi guys, I have a very large database structure (the data is not important at this point; I can migrate the info in the DB pretty easily once the structure is done). It all resides in SQL Server, and I even published it to SQL Azure, but thinking about the size limitation of SQL Azure made me decide to switch most of the tables that do not need all the bells and whistles of SQL Azure to Azure Table and Blob storage. I was thinking of creating a T4 (.tt) template that does that, but was wondering if there is a tool that already does it. Any ideas or thoughts? The only tables that I would keep in SQL Azure would be anything related to transactions, like payments. I appreciate your thoughts and advice.

Using sed for introducing newline after each > in a +1 gigabyte large one-line text file

- by wasatz

I have a giant text file (about 1.5 gigabytes) with XML data in it. All text in the file is on a single line, and attempting to open it in any text editor (even the ones mentioned in this thread: http://stackoverflow.com/questions/159521/text-editor-to-open-big-giant-huge-large-text-files ) either fails horribly or is totally unusable due to the text editor hanging when attempting to scroll. I was hoping to introduce newlines into the file by using the following sed command:

    sed 's/>/>\n/g' data.xml > data_with_newlines.xml

Sadly, this caused sed to give me a segmentation fault. From what I understand, sed reads the file line by line, which in this case would mean that it attempts to read the entire 1.5 GB file as one line, which would most certainly explain the segfault. However, the problem remains. How do I introduce newlines after each > in the XML file? Do I have to resort to writing a small program to do this for me by reading the file character by character?
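A small program for this does not need to go character by character: because the pattern is a single byte, the file can be read in fixed-size chunks and each chunk rewritten independently, with no risk of a match straddling a chunk boundary. The following is a minimal sketch of that approach. The function name `add_newlines` and the 1 MiB chunk size are arbitrary choices.

```python
import io


def add_newlines(source, target, chunk_size=1024 * 1024):
    """Copy source to target, writing a newline after every '>' byte."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        # A one-byte pattern cannot be split across two chunks,
        # so each chunk can be processed on its own.
        target.write(chunk.replace(b">", b">\n"))


# With the files from the question this would be called as:
#   with open("data.xml", "rb") as src, open("data_with_newlines.xml", "wb") as dst:
#       add_newlines(src, dst)

# Small in-memory demo with a deliberately tiny chunk size.
demo_source = io.BytesIO(b"<a><b>text</b></a>")
demo_target = io.BytesIO()
add_newlines(demo_source, demo_target, chunk_size=4)
```

Memory use is bounded by the chunk size, so the same code should work for files far larger than 1.5 GB.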

Optimizing a large iteration of PHP objects (EAV-based)

- by Aron Rotteveel

I am currently working on a project that utilizes the EAV model. This turns out to work quite well, but like many others I am now running into some performance issues. The data set in this particular case consists of approximately 2500 entities, each with approx. 150 attributes. Each entity and each attribute is represented by a PHP object. Since most parts of the application only iterate through a filtered set of entities, we have not had very large issues yet. Now, however, I am working on an algorithm that requires iteration over the entire dataset, which has a major impact on performance. This is perhaps not very much information to work with, but since this is an architectural problem, I am hoping for an architectural pattern to help me on the way as well. Each entity, including its attributes, takes up approx. 500 KB of memory.
