Search Results

Search found 65999 results on 2640 pages for 'large data volumes'.

Page 114/2640 | < Previous Page | 110 111 112 113 114 115 116 117 118 119 120 121 | Next Page >

Discover periodic patterns in a large data-set

- by Miner

I have a large sequence of tuples on disk in the form (t1, k1) (t2, k2) ... (tn, kn), where ti is a monotonically increasing timestamp and ki is a key (assume a fixed-length string if needed). Neither ti nor ki is guaranteed to be unique. However, the number of unique tis and kis is huge (millions). n itself is very large (100 million+), and the size of k (approx. 500 bytes) makes it impossible to store everything in memory. I would like to find periodic occurrences of keys in this sequence. For example, given the sequence (1, a) (2, b) (3, c) (4, b) (5, a) (6, b) (7, d) (8, b) (9, a) (10, b), the algorithm should emit (a, 4) and (b, 2); that is, a occurs with a period of 4 and b occurs with a period of 2. If I build a hash of all keys and store, for each key, the mean and standard deviation of the differences between its consecutive timestamps, I might be able to do this in one pass and report only the keys with an acceptable standard deviation (ideally 0). However, this requires one bucket per unique key, whereas in practice I might have very few truly periodic patterns. Any better ways?
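A minimal Python sketch of the single-pass approach described above (Python is used for all illustrative snippets on this page; `find_periodic`, `max_std` and `min_gaps` are hypothetical names). Per key it keeps only the last timestamp plus a running mean and variance of the gaps (Welford's online algorithm), so each key costs a few numbers rather than its timestamp history. It still needs one bucket per distinct key, which is exactly the memory cost the question raises.

```python
import math
from typing import Dict, Hashable, Iterable, Tuple

def find_periodic(stream: Iterable[Tuple[int, Hashable]],
                  max_std: float = 0.0,
                  min_gaps: int = 2) -> Dict[Hashable, float]:
    """Return {key: mean gap} for keys whose gaps have a std dev <= max_std."""
    # key -> [last_timestamp, gap_count, running_mean, running_m2]
    state: Dict[Hashable, list] = {}
    for ts, key in stream:                  # single pass, e.g. over a file reader
        s = state.get(key)
        if s is None:
            state[key] = [ts, 0, 0.0, 0.0]
            continue
        gap = ts - s[0]
        s[0] = ts
        s[1] += 1
        delta = gap - s[2]                  # Welford's online mean/variance update
        s[2] += delta / s[1]
        s[3] += delta * (gap - s[2])
    periodic = {}
    for key, (_, n, mean, m2) in state.items():
        # n > 0 guards the division; the std dev here is the population one
        if n > 0 and n >= min_gaps and math.sqrt(m2 / n) <= max_std:
            periodic[key] = mean
    return periodic

example = [(1, "a"), (2, "b"), (3, "c"), (4, "b"), (5, "a"),
           (6, "b"), (7, "d"), (8, "b"), (9, "a"), (10, "b")]
print(find_periodic(example))   # {'a': 4.0, 'b': 2.0}
```

On the example sequence, c and d are dropped because they occur only once, so they have no gaps to measure.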

Can a large transaction log cause CPU spikes to occur?

- by Simon Rigby

Hello all, I have a client with a very large database on SQL Server 2005. The total space allocated to the database is 15 GB, with roughly 5 GB for the data and 10 GB for the transaction log. Just recently, a web application that connects to that database has started timing out. I have traced the actions on the web page and examined the queries that execute while these web operations are performed. There is nothing untoward in the execution plan; the query itself uses multiple joins but completes very quickly. However, the database server's CPU spikes to 100% for a few seconds. The issue occurs when several users are working on the system simultaneously (by several I mean about 5), and under this load timeouts start to occur. So my question is: can a large transaction log cause issues with CPU performance? There is about 12 GB of free space on the disk currently. The configuration is a little out of my hands, but the database and log are both on the same physical disk. I appreciate that the log file is massive and needs attending to, but I'm just looking for a heads-up as to whether it may cause CPU spikes (i.e. trying to find the correlation). The timeouts are a recent thing; this app has been responsive for a few years. Many thanks,

About Data Objects and DAO Design when using Hibernate

- by X. Ma

I'm hesitating between two designs for a database project using Hibernate. Design #1: (1) Create a general data provider interface, including a set of DAO interfaces and general data container classes. It hides the underlying implementation: a data provider implementation could access data in a database, an XML file, a service, or something else, and the user of a data provider does not need to know about it. (2) Create a database library with Hibernate that implements the data provider interface in (1). The drawback of Design #1 is that, in order to hide the implementation details, I need to create two sets of data container classes: one in the general data provider interface (let's call them DPI-Objects), and another used in the database library exclusively for entity/attribute mapping in Hibernate (let's call them H-Objects). In the DAO implementation, I need to read data from the database to create H-Objects (via Hibernate) and then convert the H-Objects into DPI-Objects. Design #2: Do not create a general data provider interface; expose the H-Objects directly to the components that use the database library, so the user of the database library needs to be aware of Hibernate. I prefer Design #1, but I don't want to create two sets of data container classes. Is that the right way to hide H-Objects and other Hibernate implementation details from the user of the database-based data provider? Are there any drawbacks to Design #2? I will not implement another data provider in the near future, so should I just forget about the data provider interface and use Design #2? What do you think? Thanks for your time!
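Design #1 can be sketched in a few lines, here in Python purely for illustration (Hibernate itself is Java; `Customer`, `CustomerDao`, `CustomerEntity` and `OrmCustomerDao` are hypothetical names, and a dict stands in for the Hibernate session). What it shows is the extra conversion step inside the DAO implementation: callers only ever see the DPI-Object.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class Customer:
    """DPI-Object: the only type callers of the data provider see."""
    id: int
    name: str

class CustomerDao(ABC):
    """General data provider interface; knows nothing about the persistence layer."""
    @abstractmethod
    def find_by_id(self, customer_id: int) -> Optional[Customer]: ...

class CustomerEntity:
    """H-Object stand-in: in the real design this would be the Hibernate-mapped class."""
    def __init__(self, id: int, name: str) -> None:
        self.id = id
        self.name = name

class OrmCustomerDao(CustomerDao):
    """Database-backed provider; a dict stands in for the Hibernate session."""
    def __init__(self, session: Dict[int, CustomerEntity]) -> None:
        self._session = session

    def find_by_id(self, customer_id: int) -> Optional[Customer]:
        entity = self._session.get(customer_id)
        # The extra step Design #1 requires: convert H-Object -> DPI-Object.
        return None if entity is None else Customer(entity.id, entity.name)
```

Design #2 would amount to returning `CustomerEntity` directly and dropping `Customer`, which removes the conversion but ties every caller to the persistence classes.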

Importing a large .txt file into MySQL?

- by Taz

Hi, I am loading text data into MySQL using the following command:

    mysql> Load Data local Infile 'C:\\Documents and Settings\\Scan\\My Documents\\Downloads\\instance_types_en.nt\\Copy of instance_types_en.txt' into table dbpediaentities.resources fields terminated by ' ' lines terminated by 'rn';

The data looks like this (there is actually a newline after each '.'):

    <a> <b> <c> .
    <a> <b> <c> .
    <a> <b> <c> .
    <a> <b> <c> .
    <a> <b> <c> .
    <a> <b> <c> .

The table has an auto-increment ID field and then text fields for all three values. The file size is about 750 MB. The problems are: 1. appears to be in the first text field, and 2. only 2 MB of data is imported.
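One way to sidestep delimiter and line-ending problems is to pre-process the file into an unambiguous tab-separated form before loading it. A minimal Python sketch, assuming each line has the shape `<s> <p> <o> .` and that the subject and predicate contain no spaces (`parse_triples` and `to_tsv` are hypothetical helpers):

```python
from typing import Iterable, Iterator, TextIO, Tuple

def parse_triples(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject, predicate, object) from lines shaped like '<s> <p> <o> .'."""
    for line in lines:
        body = line.strip()          # also drops '\r' left by Windows line endings
        if not body:
            continue                 # skip blank lines
        if body.endswith("."):
            body = body[:-1].rstrip()
        parts = body.split(" ", 2)   # assumption: only the object may contain spaces
        if len(parts) != 3:
            raise ValueError(f"unexpected line: {line!r}")
        yield parts[0], parts[1], parts[2]

def to_tsv(lines: Iterable[str], out: TextIO) -> int:
    """Write one tab-separated row per triple; returns the number of rows written."""
    count = 0
    for s, p, o in parse_triples(lines):
        out.write(f"{s}\t{p}\t{o}\n")
        count += 1
    return count
```

The resulting file could then be loaded with `FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'`.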

How do I bind an iTunes style source list to an NSTableView using Core Data?

- by Austin

I have an iTunes-style interface in my application: a source list (NSOutlineView) on the left that contains different libraries and playlists, with an NSTableView on the right side of the interface displaying information for "Presentations". Similar to iTunes, I show the same type of information in the table view whether a library or a playlist is selected (title, author, date created, etc.). I currently have an NSArrayController connected to my NSTableView and was setting the fetch predicate based on what was selected in the source list. This works fine when selecting a library, because I can just set the fetch predicate to filter by the "type" field in my Presentation Core Data entity. When I try to adjust the fetch predicate for a playlist, however, it doesn't look like there is any way to set it, because I've got a table in between Playlists and Presentations to keep track of the order within the playlist. According to the Apple docs, these types of predicates are not doable with Core Data (it basically doesn't support multiple inner joins). Below is the relevant portion of my Data Model. Is my data model set up incorrectly? Should I drop the NSArrayController and connect the NSTableView by hand? I'm trying to figure out whether there is a simple fix, or whether this is really a design flaw.

Qt/C++, Problems with large QImage

- by David Günzel

I'm pretty new to C++/Qt and I'm trying to create an application with Visual Studio C++ and Qt (4.8.3). The application displays images using a QGraphicsView, and I need to change the images at the pixel level. The basic code is (simplified):

    QImage* img = new QImage(img_width, img_height, QImage::Format_RGB32);
    while (do_some_stuff) {
        img->setPixel(x, y, color);
    }
    QGraphicsPixmapItem* pm = new QGraphicsPixmapItem(QPixmap::fromImage(*img));
    QGraphicsScene* sc = new QGraphicsScene;
    sc->setSceneRect(0, 0, img->width(), img->height());
    sc->addItem(pm);
    ui.graphicsView->setScene(sc);

This works well for images up to around 12000x6000 pixels. The weird thing happens beyond this size. When I set img_width=16000 and img_height=8000, for example, the line img = new QImage(...) returns a null image. The image data should be around 512,000,000 bytes, so it shouldn't be too large, even on a 32-bit system. Also, my machine (Win 7 64-bit, 8 GB RAM) should be capable of holding the data. I've also tried this version:

    uchar* imgbuf = (uchar*) malloc(img_width * img_height * 4);
    QImage* img = new QImage(imgbuf, img_width, img_height, QImage::Format_RGB32);

At first, this works. The img pointer is valid, and calling img->width(), for example, returns the correct image width (instead of 0, as it would for a null image). But as soon as I call img->setPixel(), the pointer becomes null and img->width() returns 0. So what am I doing wrong? Or is there a better way of modifying large images at the pixel level? Regards, David
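The size arithmetic in the question is easy to check. A small Python sketch (`rgb32_bytes` is a hypothetical helper) computes the buffer size for both image sizes, assuming Format_RGB32's 4 bytes per pixel:

```python
def rgb32_bytes(width: int, height: int) -> int:
    # Format_RGB32 stores every pixel in 4 bytes.
    return width * height * 4

for w, h in [(12000, 6000), (16000, 8000)]:
    size = rgb32_bytes(w, h)
    print(f"{w}x{h}: {size:,} bytes = {size / 2**20:.0f} MiB")
```

Both sizes are well below 2 GiB. If the application is built as a 32-bit process, though, the whole buffer has to be one contiguous block inside that process's address space, and `QPixmap::fromImage` may need a second buffer of similar size; whether that explains the null image here is an assumption the question does not confirm.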

Handling missing/incomplete data in R--is there a function to mask but not remove NAs?

- by doug

As you would expect from a DSL aimed at data analysis, R handles missing/incomplete data very well. For instance, many R functions have an 'na.rm' flag that you can set to 'T' to remove the NAs:

    mean(c(5,6,12,87,9,NA,43,67), na.rm=T)

But if you want to deal with NAs before the function call, you need to do something like this:

    # remove each 'NA' from a vector
    vx = vx[!is.na(vx)]
    # replace each 'NA' in a vector with a '0'
    ifelse(is.na(vx), 0, vx)
    # remove each entire row that contains an 'NA' from a data frame
    dfx = dfx[complete.cases(dfx),]

All of these functions permanently remove 'NA' or rows with an 'NA' in them. Sometimes this isn't quite what you want, though: making an 'NA'-excised copy of the data frame might be necessary for the next step in the workflow, but in subsequent steps you often want those rows back (e.g., to calculate a column-wise statistic for a column that has missing rows caused by a prior call to 'complete.cases', yet has no 'NA' values in it). To be as clear as possible about what I'm looking for: Python/NumPy has a class, 'masked array', with a 'mask' attribute, which lets you conceal, but not remove, NAs during a function call. Is there an analogous function in R?
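For reference, this is the NumPy behaviour the question describes, as a short Python example: `numpy.ma.masked_invalid` hides the NaN from the computation while the element, and the original data, stay in place.

```python
import numpy as np
import numpy.ma as ma

values = np.array([5, 6, 12, 87, 9, np.nan, 43, 67])
masked = ma.masked_invalid(values)   # NaN is hidden, not deleted

print(masked.mean())   # mean of the 7 unmasked values, i.e. 229 / 7
print(masked.mask)     # True only at position 5
print(len(masked))     # still 8: the masked element is kept
print(masked.data)     # underlying values, NaN included
```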

High runtime for Dictionary.Add with a large number of items

- by aaginor

Hi folks, I have a C# application that stores data from a text file in a Dictionary object. The amount of data to be stored can be rather large, so inserting the entries takes a lot of time. With many items in the Dictionary it gets even worse, because of the resizing of the internal array that stores the data for the Dictionary. So I initialized the Dictionary with the number of items that will be added, but this has no impact on speed. Here is my function:

    private Dictionary<IdPair, Edge> AddEdgesToExistingNodes(HashSet<NodeConnection> connections)
    {
        Dictionary<IdPair, Edge> resultSet = new Dictionary<IdPair, Edge>(connections.Count);
        foreach (NodeConnection con in connections)
        {
            ...
            resultSet.Add(nodeIdPair, newEdge);
        }
        return resultSet;
    }

In my tests, I insert ~300k items. I checked the running time with ANTS Performance Profiler and found that the average time for resultSet.Add(...) doesn't change when I initialize the Dictionary with the needed size. It is the same as when I initialize the Dictionary with new Dictionary() (about 0.256 ms on average for each Add). This is definitely caused by the amount of data in the Dictionary (although I initialized it with the desired size): for the first 20k items, the average time for Add is 0.03 ms per item. Any idea how to make the add operation faster? Thanks in advance, Frank
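The question does not show how `IdPair` implements `GetHashCode`, but one common cause of `Add` slowing down as a dictionary fills up is a weak hash that maps many keys to the same code: each `Add` then has to compare against every earlier key with that code. The Python sketch below only counts distinct hash codes for 300,000 hypothetical id pairs under two combiners (`xor_hash` and `mixed_hash` are illustrative, not the question's code):

```python
from itertools import product
from typing import Callable, Iterable

def xor_hash(a: int, b: int) -> int:
    # Weak combiner: (a, b) and (b, a) collide, and ids below 1024
    # can only ever produce 1024 different codes.
    return a ^ b

def mixed_hash(a: int, b: int) -> int:
    # Multiply-and-add combiner truncated to 32 bits; unique here because
    # every b is smaller than the multiplier and the result fits in 32 bits.
    return (a * 100_003 + b) & 0xFFFFFFFF

def distinct_hashes(fn: Callable[[int, int], int],
                    xs: Iterable[int], ys: Iterable[int]) -> int:
    return len({fn(a, b) for a, b in product(xs, ys)})

first_ids, second_ids = range(1000), range(300)   # 300,000 hypothetical pairs
print(distinct_hashes(xor_hash, first_ids, second_ids))
print(distinct_hashes(mixed_hash, first_ids, second_ids))
```

With the XOR combiner the 300,000 pairs share only 1,024 codes (about 293 keys per code), while the multiply-and-add combiner keeps them all distinct. If `IdPair` uses something like XOR, a better-mixed `GetHashCode` would be a reasonable first thing to try.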

How do you refactor a large messy codebase?

- by Ricket

I have a big mess of code. Admittedly, I wrote it myself, a year ago. It's not well commented, but it's not very complicated either, so I can understand it, just not well enough to know where to start refactoring it. I violated every rule that I have read about over the past year. There are classes with multiple responsibilities, there are indirect accesses (I forget the technical term; something like foo.bar.doSomething()), and, as I said, it is not well commented. On top of that, it's the beginnings of a game, so the graphics are coupled with the data, or, in the places where I tried to decouple graphics and data, I made the data public so that the graphics could access the data they need... It's a huge mess! Where do I start? How would you start on something like this? My current approach is to take variables, switch them to private, and then refactor the pieces that break, but that doesn't seem to be enough. Please suggest other strategies for wading through this mess and turning it into something clean so that I can continue where I left off!

Windows.Forms RichTextBox Control - Avoid inserting large data.

- by SchlaWiener

I have a Windows Form with a RichTextBox on it. The content of the RichTextBox is written to a database field that is limited to 64k of data. For my purpose that is way more than enough text to store. I have set the MaxLength property to avoid inserting more data than allowed: rtcControl.MaxLength = 65536. However, that only restricts the number of characters that can be put into the text. With the formatting overhead from the RTF, I can type more text than I should be allowed to. It gets even worse if I insert a large image, which doesn't increase the TextLength at all, but the Rtf length grows quite a lot. At the moment I check the length of the RichTextBox's Rtf property in the FormClosing event and display a message to the user if it's too large. However, that is just a workaround, because I want to disallow putting more data than allowed into the control (like in a TextBox: if you exceed the MaxLength property, nothing is inserted into the control and you hear the default beep()). Any ideas how to achieve this? I already tried using a custom control which extends the RichTextBox and shadows the Rtf property to intercept the insertion, but it seems it isn't executed if I add text. Even the TextChanged event does not fire if I type something in the control.

Git repository gets corrupted when I do a large commit: "Possible repository corruption on the remot

- by mindthief

Hi All, a friend of mine and I have been trying to use git for a project. It is hosted on his server, and I clone it with:

    git clone [email protected]:/path/to/git/repos.git

Pretty standard stuff, and it works great for a while. But every time one of us has added a large commit (which git supposedly handles very well), on the order of 100MB or so, the git repository gets kind of broken. Basically, at this point I am still able to push new changes and pull other changes (I think), but when I try to clone the repository into a fresh location using the command above, I get this error message:

    $ git clone [email protected]:/path/to/git/repos.git
    Initialized empty Git repository in /local/path/to/repos/.git/
    remote: Counting objects: 1455, done.
    remote: Compressing objects: 100% (1235/1235), done.
    error: git upload-pack: git-pack-objects died with error.
    fatal: git upload-pack: aborting due to possible repository corruption on the remote side.
    remote: aborting due to possible repository corruption on the remote side.
    fatal: early EOF
    fatal: index-pack failed

This has happened 3 or 4 times now, and it's always when I add a large commit. Any idea why this is happening? How can we fix it? We're both using Mac OS X Snow Leopard. Thanks! -M

Memorystream and Large Object Heap

- by Flo

I have to transfer large files between computers via unreliable connections using WCF. Because I want to be able to resume the transfer and I don't want WCF to limit my file size, I am chunking the files into 1MB pieces. These chunks are transported as streams, which works quite nicely so far. My steps are:

1. open a filestream
2. read a chunk from the file into a byte[] and create a memorystream
3. transfer the chunk
4. go back to 2. until the whole file is sent

My problem is in step 2. I assume that when I create a memory stream from a byte array, it will end up on the LOH and ultimately cause an OutOfMemoryException. I could not actually reproduce this error, so maybe my assumption is wrong. Now, I don't want to send the byte[] in the message, as WCF will tell me the array size is too big. I can change the maximum allowed array size and/or the size of my chunk, but I hope there is another solution. My actual questions: Will my current solution create objects on the LOH, and will that cause me problems? Is there a better way to solve this? Btw.: on the receiving side I simply read smaller chunks from the arriving stream and write them directly into the file, so no large byte arrays are involved.
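A 1 MB `byte[]` is above .NET's 85,000-byte threshold, so it does land on the Large Object Heap; allocating a fresh one per chunk is what tends to cause trouble. A sketch of the alternative, in Python for illustration: allocate one buffer up front and refill it for every chunk, so there is a single large allocation no matter how big the file is. `send_file` and its `send_chunk` callback are hypothetical stand-ins for the WCF transfer, and `start_offset` shows where resume support would plug in.

```python
import io
from typing import BinaryIO, Callable

CHUNK_SIZE = 1024 * 1024  # 1 MB, as in the question

def send_file(src: BinaryIO, send_chunk: Callable[[int, memoryview], None],
              start_offset: int = 0, chunk_size: int = CHUNK_SIZE) -> int:
    """Read src in fixed-size chunks through ONE reusable buffer.

    send_chunk(offset, view) must consume or copy the view before returning,
    because the same buffer is overwritten by the next read. Returns bytes sent.
    """
    buffer = bytearray(chunk_size)   # allocated once, reused for every chunk
    view = memoryview(buffer)
    src.seek(start_offset)           # resume point for an interrupted transfer
    offset = start_offset
    while True:
        n = src.readinto(buffer)
        if not n:                    # 0 (or None) means end of stream
            break
        send_chunk(offset, view[:n])
        offset += n
    return offset - start_offset

if __name__ == "__main__":
    received = bytearray()
    total = send_file(io.BytesIO(b"x" * 2_500_000), lambda off, v: received.extend(v))
    print(total, len(received))  # 2500000 2500000
```

In .NET the analogous move would be to allocate the `byte[]` once and wrap each filled region with `new MemoryStream(buffer, 0, count)`, which uses the existing array instead of copying it; whether that fits how WCF consumes the stream is an assumption to verify.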

Including/Organizing HTML in a large JavaScript project

- by Bill Zimmerman

Hi, I've got a fairly large web app with several mini applets on each page. These applets are almost always identical jQuery apps. I am looking for advice on how I should organize/include smaller parts of these jQuery apps within my larger project. For example, each app has several independent tabs. If possible, I would like to store each of the tabs as a separate .html file, because this makes development easier. My requirements are: 1) All of the HTML 'tabs' are loaded on the client's end when the page loads; I would like to avoid any delays caused by dynamically requesting the tab HTML. 2) If possible, I would like to minimize the raw data sent. For example, it would be preferable to send each tab once, instead of sending each tab 10 times if there are ten applets on that page. Questions: 1) What are my options for 'including' the HTML files/JavaScript code? 2) Any tips for keeping my development simple in this situation? Surely there has to be a better way than just editing one massive HTML file when working with large pages.

Finding ALL positions of a substring in a large string in C#

- by Tommy

Alright, so what I have is a large string I need to parse, and what I need to do is find all the instances of extract"(me,i-have lots. of]punctuation and store them in a list. So say this piece of string was at the beginning and in the middle of the larger string: both of them would be found, and their indexes would be added to the List, so the List would contain 0 and the other index, whatever it would be. I've been playing around, and string.IndexOf does almost what I'm looking for, and I've written some code, but I can't seem to get it to work:

    List<int> inst = new List<int>();
    int index = 0;
    while (index < source.LastIndexOf("extract\"(me,i-have lots. of]punctuation", 0) + 39)
    {
        int src = source.IndexOf("extract\"(me,i-have lots. of]punctuation", index);
        inst.Add(src);
        index = src + 40;
    }

Here inst is the list and source is the large string. Any better ideas?
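The usual pattern is to keep calling the search from just past the previous match and to stop as soon as it reports "not found" (-1), rather than bounding the loop with LastIndexOf. Sketched in Python, where `str.find` plays the role of `IndexOf` and also returns -1 when nothing is found; `find_all` and its `overlapping` flag are hypothetical:

```python
from typing import List

def find_all(source: str, needle: str, overlapping: bool = False) -> List[int]:
    """Return every start index of needle in source (empty list if none)."""
    if not needle:
        raise ValueError("needle must be non-empty")
    positions = []
    step = 1 if overlapping else len(needle)
    i = source.find(needle)
    while i != -1:                    # find() returns -1 once there are no more matches
        positions.append(i)
        i = source.find(needle, i + step)
    return positions

needle = 'extract"(me,i-have lots. of]punctuation'
print(find_all(needle + " middle " + needle, needle))   # [0, 47]
```

In C# the same loop would call `source.IndexOf(needle, start, StringComparison.Ordinal)`, using an ordinal comparison for an exact character match.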

Loading/Displaying a large amount of data on a webpage.

- by jb

I have a webpage which contains a table for displaying a large amount of data (on average from 2,000 to 10,000 rows). This page takes a long time to load/render, which is understandable. The problem is that while the page is loading, the PC's memory usage skyrockets (500 MB on my test system is in use by iexplore) and the whole PC grinds to a halt until it has finished, which can take a minute or two. IE hangs until it is complete, and switching to another running program is the same. I need to fix this, and ideally I want to accomplish two things: 1) Load individual parts of the page separately, so the page can render initially without the large data table; a loading div will be placed there until it is ready. 2) Don't use up so much memory or local resources while rendering, so that users can at least use a different tab/application at the same time. How would I go about doing both or either of these? I'm an applications programmer by trade, so I am still a little fuzzy on the things I can do in a web environment. Cheers all.

PHP: problem rendering large images (error 321)

- by JP19

Hi ... it's me again with a PHP problem :) Following is part of my PHP script, which renders JPEG images:

    ...
    $tf=$requested_file;
    $image_type="jpeg";
    header("Content-type: image/${image_type}");
    $CMD="\$image=imagecreatefrom${image_type}('$tf'); image${image_type}(\$image);";
    eval($CMD);
    exit;
    ...

There is no syntax error, because the above code works fine for small images, but for large images it gives "Error 321 (net::ERR_INVALID_CHUNKED_ENCODING): Unknown error." in the browser. To be sure, I created two images with ImageMagick from the same source image, one resized to 10% of the original and the other to 90%. http://mostpopularsports.net/images/misc/ttt10.jpg works; http://mostpopularsports.net/images/misc/ttt90.jpg gives Error 321 in the browser. There is a related question with a solution posted by the OP here: Error writing content through Apache, but I cannot understand how to make the fix. Can someone help me with it? I have looked at the headers in Chrome. For the first request, everything is fine. For the second request, the request headers are all garbled. Both images are JPEG (they were created with ImageMagick, but to be sure I checked):

    misc/ttt10.jpg: JPEG image data, JFIF standard 1.01
    misc/ttt90.jpg: JPEG image data, JFIF standard 1.01

Finally, the way I fixed it was to remove the Transfer-Encoding: chunked header from the response. [This header was sent by Apache only when the data was large enough.] (I had an internal proxy, so I did it in the proxy script; otherwise one may need to do it in the Apache settings.) There were some good answers, and I have selected the one that helped me solve the problem best. Thanks, JP

Out-of-memory algorithms for addressing large arrays

- by reve_etrange

I am trying to deal with a very large dataset. I have k = ~4200 matrices (of varying sizes) which must be compared combinatorially, skipping non-unique and self comparisons. Each of the k(k-1)/2 comparisons produces a matrix, which must be indexed against its parents (i.e. it must be possible to find out where it came from). The convenient way to do this is to (triangularly) fill a k-by-k cell array with the result of each comparison. These are ~100 x ~100 matrices, on average. Using single-precision floats, it works out to 400 GB overall. I need to 1) generate the cell array, or pieces of it, without trying to place the whole thing in memory, and 2) access its elements (and their elements) in like fashion. My attempts have been inefficient due to reliance on MATLAB's eval() as well as save and clear occurring in loops:

    for i=1:k
        [~,m] = size(data{i});
        cur_var = ['H' int2str(i)];
        %# if i == 1; save('FileName'); end;  %# If using a single MAT file and need to create it.
        eval([cur_var ' = cell(1,k-i);']);
        for j=i+1:k
            [~,n] = size(data{j});
            eval([cur_var '{i,j} = zeros(m,n,''single'');']);
            eval([cur_var '{i,j} = compare(data{i},data{j});']);
        end
        save(cur_var,cur_var);  %# Add '-append' when using a single MAT file.
        clear(cur_var);
    end

The other thing I have done is to perform the split when mod((i+j-1)/2,max(factor(k(k-1)/2))) == 0. This divides the result into the largest number of same-size pieces, which seems logical. The indexing is a little more complicated, but not too bad, because a linear index could be used. Does anyone know/see a better way?
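The linear-index idea from the last paragraph can be made concrete. For 0 <= i < j < k, the pairs of the upper triangle can be numbered row by row with index = i*k - i*(i+1)/2 + (j - i - 1), which runs from 0 to k(k-1)/2 - 1; with k = 4200 that is 8,817,900 pairs. A Python sketch with 0-based indices (MATLAB's are 1-based; `pair_to_index` and `index_to_pair` are hypothetical helpers):

```python
def pair_to_index(i: int, j: int, k: int) -> int:
    """Number the pairs 0 <= i < j < k row by row: (0,1), (0,2), ..., (1,2), ..."""
    if not 0 <= i < j < k:
        raise ValueError("need 0 <= i < j < k")
    return i * k - i * (i + 1) // 2 + (j - i - 1)

def index_to_pair(idx: int, k: int) -> tuple:
    """Inverse of pair_to_index, using a binary search over row start offsets."""
    total = k * (k - 1) // 2
    if not 0 <= idx < total:
        raise ValueError("index out of range")
    lo, hi = 0, k - 2
    while lo < hi:                     # find the last row whose first index <= idx
        mid = (lo + hi + 1) // 2
        if pair_to_index(mid, mid + 1, k) <= idx:
            lo = mid
        else:
            hi = mid - 1
    i = lo
    return i, idx - pair_to_index(i, i + 1, k) + i + 1

k = 4200
print(k * (k - 1) // 2)                  # 8817900 comparisons
print(pair_to_index(k - 2, k - 1, k))    # 8817899, the last slot
print(index_to_pair(8817899, k))         # (4198, 4199)
```

With a fixed number of results per file, `idx // per_file` would pick the file and `idx % per_file` the slot within it, so results can be written and read back in blocks without eval-generated variable names.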

How to write these two queries for a simple data warehouse, using ANSI SQL?

- by morpheous

I am writing a simple data warehouse that will allow me to query the table to observe periodic (say weekly) changes in data, as well as changes in the change of the data (e.g. the week-to-week change in the weekly sales amount). For the sake of simplicity, I will present very simplified (almost trivialized) versions of the tables I am using here. The sales data table is a view and has the following structure:

    CREATE TABLE sales_data (
        sales_time date NOT NULL,
        sales_amt double NOT NULL
    )

For the purpose of this question, I have left out other fields you would expect to see, like product_id, sales_person_id etc., as they have no direct relevance to this question. AFAICT, the only fields that will be used in the query are sales_time and sales_amt (unless I am mistaken). I also have a date dimension table with the following structure:

    CREATE TABLE date_dimension (
        id integer NOT NULL,
        datestamp date NOT NULL,
        day_part integer NOT NULL,
        week_part integer NOT NULL,
        month_part integer NOT NULL,
        qtr_part integer NOT NULL,
        year_part integer NOT NULL
    );

which partitions dates into reporting ranges. I need to write queries that will allow me to do the following:

1. Return the change in week-on-week sales_amt for a specified period, for example the change between sales today and sales N days ago, where N is a positive integer (N == 7 in this case).
2. Return the change in the change of sales_amt for a specified period. In (1) we calculated the week-on-week change; now we want to know how that change differs from the (week-on-week) change calculated last week.

I am stuck at this point, however, as SQL is my weakest skill. I would be grateful if an SQL master could explain how I can write these queries in a DB-agnostic way (i.e. using ANSI SQL).
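A sketch of one portable approach, run against an in-memory SQLite database from Python so it can be executed as-is. It assumes `date_dimension.id` increases by exactly 1 per calendar day, so "N days ago" becomes `id - N`; the other `*_part` columns are omitted, the sales figures are toy data, and the `:n` bind parameter is driver syntax rather than ANSI SQL. The query itself uses only CTEs and joins: `daily` totals sales per day, `wow` subtracts the total N days earlier, and the final SELECT subtracts the previous change to get the change in change.

```python
import sqlite3
from datetime import date, timedelta

QUERY = """
WITH daily AS (
    SELECT d.id AS day_id, d.datestamp AS sale_day, SUM(s.sales_amt) AS amt
    FROM date_dimension d
    JOIN sales_data s ON s.sales_time = d.datestamp
    GROUP BY d.id, d.datestamp
),
wow AS (
    SELECT cur.day_id, cur.sale_day, cur.amt - prev.amt AS wow_change
    FROM daily cur
    JOIN daily prev ON prev.day_id = cur.day_id - :n
)
SELECT c.sale_day, c.wow_change, c.wow_change - p.wow_change AS change_in_change
FROM wow c
LEFT JOIN wow p ON p.day_id = c.day_id - :n
ORDER BY c.sale_day
"""

def run_demo(n: int = 7, days: int = 21):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales_data (sales_time date NOT NULL, sales_amt double NOT NULL)")
    # Only the columns the query needs; the real table also has the *_part columns.
    con.execute("CREATE TABLE date_dimension (id integer NOT NULL, datestamp date NOT NULL)")
    start = date(2024, 1, 1)
    for day_id in range(1, days + 1):
        day = (start + timedelta(days=day_id - 1)).isoformat()
        con.execute("INSERT INTO date_dimension VALUES (?, ?)", (day_id, day))
        # Toy data: sales grow quadratically, so the change in change is constant.
        con.execute("INSERT INTO sales_data VALUES (?, ?)", (day, float(day_id ** 2)))
    rows = con.execute(QUERY, {"n": n}).fetchall()
    con.close()
    return rows

for row in run_demo():
    print(row)
```

The first N rows have no earlier change to compare with, so the LEFT JOIN leaves `change_in_change` NULL there. Days without any sales drop out of `daily`; an outer join from `date_dimension` with COALESCE would keep them.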

[C#] How to receive uncrackable data or so? ;P

- by Prix

Hi, I am working on a C# application that communicates with my website and retrieves some information from it using SSL, which is working just fine. Now what I want/need is a way to receive encrypted, encoded or obfuscated data such that, if someone cracks my application, they will not be able to decrypt the data because it needs something from the server (API, website), yet the application needs to decrypt it in order to use it... Initially I was thinking of an embedded RSA key pair to send and receive the encrypted data, but let's consider that someone has cracked the application: they could just replace those keys with keys they have made. So I was looking into some methods, but haven't found or been able to think of any way to harden this... I was learning about RSA, encryption and such, started developing this as a self-learning exercise, got involved with it, and now I am trying to figure out a way to receive data like that... I have considered obfuscating and compiling my code with packers etc., but this is not about packing; I am more interested in knowing a better way to secure what I described. I know it may be impossible, but I am still looking forward to some approach. I would appreciate advice, suggestions and C# code samples; if you need more information or anything, please let me know.

How do I authenticate an ADO.NET Data Service?

- by lsb

Hi! I've created an ADO.NET Data Service hosted in an Azure worker role. I want to pass credentials from a simple console client to the service and then validate them using a QueryInterceptor. Unfortunately, the credentials don't seem to be making it over the wire. The following is a simplified version of the code I'm using, starting with the DataService on the server:

    using System;
    using System.Data.Services;
    using System.Linq.Expressions;
    using System.ServiceModel;
    using System.Web;

    namespace Oslo.Worker
    {
        [ServiceBehavior(AddressFilterMode = AddressFilterMode.Any)]
        public class AdminService : DataService<OsloEntities>
        {
            public static void InitializeService(IDataServiceConfiguration config)
            {
                config.SetEntitySetAccessRule("*", EntitySetRights.All);
                config.SetServiceOperationAccessRule("*", ServiceOperationRights.All);
            }

            [QueryInterceptor("Pairs")]
            public Expression<Func<Pair, bool>> OnQueryPairs()
            {
                // This doesn't work!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                if (HttpContext.Current.User.Identity.Name != "ADMIN")
                    throw new Exception("Ooops!");
                return p => true;
            }
        }
    }

Here's the AdminHost I'm using to instantiate the AdminService in my Azure worker role:

    using System;
    using System.Data.Services;

    namespace Oslo.Worker
    {
        public class AdminHost : DataServiceHost
        {
            public AdminHost(Uri baseAddress)
                : base(typeof(AdminService), new Uri[] { baseAddress })
            {
            }
        }
    }

And finally, here's the client code:

    using System;
    using System.Data.Services.Client;
    using System.Net;
    using Oslo.Shared;

    namespace Oslo.ClientTest
    {
        public class AdminContext : DataServiceContext
        {
            public AdminContext(Uri serviceRoot, string userName, string password)
                : base(serviceRoot)
            {
                Credentials = new NetworkCredential(userName, password);
            }

            public DataServiceQuery<Order> Orders
            {
                get { return base.CreateQuery<Pair>("Orders"); }
            }
        }
    }

I should mention that the code works great, with the single exception that the credentials are not being passed over the wire.
Any help in this regard would be greatly appreciated! Thanks....

Spreadsheet_Excel_Writer large data output is damaged

- by dr3w

I use Spreadsheet_Excel_Writer to generate an .xls file, and it works fine until I have to deal with a large amount of data. At a certain stage it just writes some nonsense characters and stops filling certain columns. However, some columns are filled up to the end (generally numeric data). I'm not quite sure how the xls document is formed: row by row, or column by column... Also, it is obviously not an error in a string, because when I cut out some data, the error appears a little further on. I think there is no need for all of my code; here are the essentials:

    $filename = 'file.xls';
    $workbook = & new Spreadsheet_Excel_Writer();
    $workbook->setVersion(8);
    $contents =& $workbook->addWorksheet('Logistics');
    $contents->setInputEncoding('UTF-8');
    $workbook->send($filename);

    // here is the part where I write data down
    $contents->write(0, 0, 'Field A');
    $contents->write(0, 1, 'Field B');
    $contents->write(0, 2, 'Field C');
    $ROW=1;
    foreach($ordersArr as $key=>$val){
        $contents->write($ROW, 0, $val['a']);
        $contents->write($ROW, 1, $val['b']);
        $contents->write($ROW, 2, $val['c']);
        $ROW++;
    }
    $workbook->close();

Handling large datasets with PHP/Drupal

- by jo

Hi all, I have a report page that deals with ~700k records from a database table. I can display this on a webpage using paging to break up the results. However, my export-to-PDF/CSV functions rely on processing the entire data set at once, and I'm hitting my 256MB memory limit at around 250k rows. I don't feel comfortable increasing the memory limit, and I don't have the ability to use MySQL's SELECT ... INTO OUTFILE to just serve a pre-generated CSV. However, I can't really see a way of serving up large data sets with Drupal using something like:

    $form = array();
    $table_headers = array();
    $table_rows = array();
    $data = db_query("a query to get the whole dataset");
    while ($row = db_fetch_object($data)) {
        $table_rows[] = $row->some attribute;
    }
    $form['report'] = array('#value' => theme('table', $table_headers, $table_rows));
    return $form;

Is there a way of getting around what is essentially appending to a giant array of arrays? At the moment I don't see how I can offer any meaningful report pages with Drupal because of this. Thanks
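A common way around the giant array is to write each row to the output as it is fetched instead of collecting everything first. A Python sketch of that pattern, with sqlite3 standing in for the real database and `export_csv` a hypothetical helper; memory use is bounded by `batch_size` rows rather than by the size of the result:

```python
import csv
import io
import sqlite3
from typing import TextIO

def export_csv(cursor: sqlite3.Cursor, query: str, out: TextIO, batch_size: int = 1000) -> int:
    """Stream query results to `out` as CSV, holding at most batch_size rows at a time."""
    cursor.execute(query)
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cursor.description])   # header row
    written = 0
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        writer.writerows(rows)        # write this batch, then let it go
        written += len(rows)
    return written

# Usage with an in-memory table standing in for the 700k-row report:
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE report (id INTEGER, name TEXT)")
con.executemany("INSERT INTO report VALUES (?, ?)", ((i, f"row{i}") for i in range(2500)))
buffer = io.StringIO()                # in a web handler this would be the response stream
print(export_csv(con.cursor(), "SELECT id, name FROM report ORDER BY id", buffer))  # 2500
```

In PHP the equivalent would be to call `fputcsv()` on a handle opened with `fopen('php://output', 'w')` inside the `while` loop, rather than appending to `$table_rows`.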

Large free block of English non-pronoun text

- by Tom

As part of teaching myself Python, I've written a script that lets a user play hangman. At the moment, the hangman word to be guessed is simply entered manually at the start of the script's code. I want the script to choose randomly from a large list of English words instead. This I know how to do; my problem is finding the list of words to work from in the first place. Does anyone know of a source on the net for, say, 1000 common English words that can be downloaded as a block of text or something similar that I can work with? (My initial thought was grabbing a chunk of a novel from Project Gutenberg [this project is only for my own amusement and won't be available anywhere else, so copyright etc. doesn't matter hugely to me, btw], but anything like that is likely to contain too many names or non-standard words that wouldn't be suitable for hangman. I need text that only has words legal for use in Scrabble, basically.) It's a slightly odd question for here, I suppose, but I thought the answer might be of use not just to me but to anyone else working on a word game or similar project that needs a large seed list of words to work from. Many thanks for any links or suggestions :)
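If a Gutenberg text is used anyway, a rough filter can drop most names. A Python sketch of one heuristic (an assumption, and no guarantee of Scrabble-legal words): keep alphabetic tokens that appear in all-lowercase form at least `min_count` times, since names are almost always capitalized. `candidate_words` and `pick_word` are hypothetical helpers.

```python
import random
import re
from collections import Counter
from typing import List, Optional

def candidate_words(text: str, min_len: int = 4, min_count: int = 2) -> List[str]:
    """Heuristic word list: alphabetic tokens seen in all-lowercase form often enough."""
    tokens = re.findall(r"[A-Za-z]+", text)
    lowercase_counts = Counter(t for t in tokens if t.islower())
    return sorted(w for w, n in lowercase_counts.items()
                  if len(w) >= min_len and n >= min_count)

def pick_word(words: List[str], rng: Optional[random.Random] = None) -> str:
    """Choose the hangman word; pass a seeded random.Random for repeatable games."""
    return (rng or random).choice(words)

sample = "Alice saw the cat. The cat and the dog ran. Bob ran home with the dog."
print(candidate_words(sample, min_len=3))   # ['cat', 'dog', 'ran', 'the']
```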

Dealing with large number of text strings

- by Fadrian

When it is running, my project will collect a large number of text blocks (about 20K, and the largest I have seen is about 200K of them) in a short span of time and store them in a relational database. Each text string is relatively small; the average is about 15 short lines (about 300 characters). The current implementation is in C# (VS2008), .NET 3.5, and the backend DBMS is MS SQL Server 2005. Performance and storage are both important concerns of the project, but performance has priority, then storage. I am looking for answers to these: Should I compress the text before storing it in the DB, or let SQL Server worry about compacting the storage? Do you know the best compression algorithm/library to use in this context for the best performance? Currently I just use the standard GZip in the .NET framework. Do you know any best practices for dealing with this? I welcome outside-the-box suggestions as long as they are implementable in the .NET framework (it is a big project and this requirement is only a small part of it). EDITED: I will keep adding to this to clarify points raised. I don't need text indexing or searching on these texts; I just need to be able to retrieve them at a later stage, using the primary key, for display as a text block. I have a working solution implemented as above, and SQL Server has no issue at all handling it. This program will run quite often and needs to work with a large data context, so you can imagine the size will grow very rapidly; hence every optimization I can do will help.
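Because each text is only about 300 characters, the fixed overhead of the container format matters: gzip adds at least 18 bytes per value (a 10-byte header and an 8-byte trailer), a zlib wrapper adds 6, and a raw DEFLATE stream adds none, while all three use the same DEFLATE compression. A Python sketch for measuring this on your own data (the sample text is made up; `deflate_raw`, `inflate_raw` and `compressed_sizes` are hypothetical helpers):

```python
import gzip
import zlib

def deflate_raw(data: bytes, level: int = 6) -> bytes:
    # wbits=-15 asks zlib for a raw DEFLATE stream: no header, no checksum.
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()

def inflate_raw(blob: bytes) -> bytes:
    return zlib.decompress(blob, -15)

def compressed_sizes(text: str, level: int = 6) -> dict:
    data = text.encode("utf-8")
    return {
        "plain": len(data),
        "gzip": len(gzip.compress(data, compresslevel=level)),
        "zlib": len(zlib.compress(data, level)),
        "raw_deflate": len(deflate_raw(data, level)),
    }

# Made-up sample of roughly the size described (15 short lines).
SAMPLE = "\n".join(f"line {i}: status OK, value={i * 7}" for i in range(15))
print(compressed_sizes(SAMPLE))
```

In .NET, `DeflateStream` produces a raw DEFLATE stream, so switching from `GZipStream` to `DeflateStream` would be the corresponding change; measuring on real data would show whether the saving matters.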

faster way to compare rows in a data frame

- by aguiar

Consider the data frame below. I want to compare each row with the rows below it and then take the rows that are equal in more than 3 values. I wrote the code below, but it is very slow if you have a large data frame. How could I do that faster?

    data <- as.data.frame(matrix(c(10,11,10,13,9,10,11,10,14,9,10,10,8,12,9,10,11,10,13,9,13,13,10,13,9), nrow=5, byrow=T))
    rownames(data) <- c("sample_1","sample_2","sample_3","sample_4","sample_5")

    > data
             V1 V2 V3 V4 V5
    sample_1 10 11 10 13  9
    sample_2 10 11 10 14  9
    sample_3 10 10  8 12  9
    sample_4 10 11 10 13  9
    sample_5 13 13 10 13  9

    tab <- data.frame(sample = NA, duplicate = NA, matches = NA)
    dfrow <- 1
    for(i in 1:nrow(data)) {
        sample <- data[i, ]
        for(j in (i+1):nrow(data)) if(i+1 <= nrow(data)) {
            matches <- 0
            for(V in 1:ncol(data)) {
                if(data[j,V] == sample[,V]) {
                    matches <- matches + 1
                }
            }
            if(matches > 3) {
                duplicate <- data[j, ]
                pair <- cbind(rownames(sample), rownames(duplicate), matches)
                tab[dfrow, ] <- pair
                dfrow <- dfrow + 1
            }
        }
    }

    > tab
        sample duplicate matches
    1 sample_1  sample_2       4
    2 sample_1  sample_4       5
    3 sample_2  sample_4       4
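One way to avoid comparing every pair cell by cell is to group rows by value within each column and count only the pairs that actually share a value. A Python sketch of that inverted-index idea (`matching_pairs` is a hypothetical helper; `min_matches=4` corresponds to "more than 3"):

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

def matching_pairs(rows: Dict[str, Sequence], min_matches: int = 4) -> List[Tuple[str, str, int]]:
    """Return (row_a, row_b, matches) for pairs that agree in >= min_matches columns."""
    if not rows:
        return []
    names = list(rows)
    order = {name: pos for pos, name in enumerate(names)}
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    ncols = len(rows[names[0]])
    for col in range(ncols):
        groups = defaultdict(list)          # value -> rows holding it in this column
        for name in names:
            groups[rows[name][col]].append(name)
        for members in groups.values():     # members keep the original row order
            for a, b in combinations(members, 2):
                counts[(a, b)] += 1
    result = [(a, b, n) for (a, b), n in counts.items() if n >= min_matches]
    result.sort(key=lambda t: (order[t[0]], order[t[1]]))
    return result

data = {
    "sample_1": (10, 11, 10, 13, 9),
    "sample_2": (10, 11, 10, 14, 9),
    "sample_3": (10, 10, 8, 12, 9),
    "sample_4": (10, 11, 10, 13, 9),
    "sample_5": (13, 13, 10, 13, 9),
}
print(matching_pairs(data))
```

The work grows with the number of equal-valued pairs rather than with rows × rows × columns; a column where almost every row has the same value (like V5 here) still contributes nearly all pairs, so the worst case remains quadratic.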
