Search Results

Search found 36925 results on 1477 pages for 'large xml document'.

Page 217/1477 | < Previous Page | 213 214 215 216 217 218 219 220 221 222 223 224  | Next Page >

  • Non standard interaction among two tables to avoid very large merge

    - by riko
    Suppose I have two tables A and B. Table A has a multi-level index (a, b) and one column (ts). b uniquely determines ts. A = pd.DataFrame( [('a', 'x', 4), ('a', 'y', 6), ('a', 'z', 5), ('b', 'x', 4), ('b', 'z', 5), ('c', 'y', 6)], columns=['a', 'b', 'ts']).set_index(['a', 'b']) AA = A.reset_index() Table B is another one-column (ts) table with a non-unique index (a). The ts values are sorted "inside" each group, i.e., B.ix[x] is sorted for each x. Moreover, there is always a value in B.ix[x] that is greater than or equal to the values in A. B = pd.DataFrame( dict(a=list('aaaaabbcccccc'), ts=[1, 2, 4, 5, 7, 7, 8, 1, 2, 4, 5, 8, 9])).set_index('a') The semantics are that B contains observations of occurrences of an event whose type is indicated by the index. I would like to find from B the timestamp of the first occurrence of each event type at or after the timestamp specified in A for each value of b. In other words, I would like to get a table with the same shape as A that, instead of ts, contains the "minimum value occurring after ts" as specified by table B. So, my goal would be: C: ('a', 'x') 4 ('a', 'y') 7 ('a', 'z') 5 ('b', 'x') 7 ('b', 'z') 7 ('c', 'y') 8 I have some working code, but it is terribly slow. C = AA.apply(lambda row: ( row[0], row[1], B.ix[row[0]].irow(np.searchsorted(B.ts[row[0]], row[2]))), axis=1).set_index(['a', 'b']) Profiling shows the culprit is obviously B.ix[row[0]].irow(np.searchsorted(B.ts[row[0]], row[2])). However, standard solutions using merge/join would take too much RAM in the long run. Consider that I now have 1000 a's, assume the average number of b's per a stays constant (probably 100-200), and consider that the number of observations per a is probably on the order of 300. In production I will have 1000 times more a's. 1,000,000 x 200 x 300 = 60,000,000,000 rows may be a bit too much to keep in RAM, especially considering that the data I need is perfectly described by a C like the one I discussed above. How would I improve the performance?
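
One way to avoid the merge is to binary-search each group of B separately, so that memory stays proportional to the sizes of A and B. The sketch below only illustrates that idea: it uses the standard-library `bisect` module on plain dictionaries rather than pandas, the helper name `first_at_or_after` and the dictionary layout are assumptions made for illustration, and it relies on the guarantee stated in the question that every group of B is sorted and contains a value greater than or equal to each queried ts.

```python
from bisect import bisect_left

# Table A as {(a, b): ts} and table B as {a: sorted list of ts} (assumed layout).
A = {('a', 'x'): 4, ('a', 'y'): 6, ('a', 'z'): 5,
     ('b', 'x'): 4, ('b', 'z'): 5, ('c', 'y'): 6}
B = {'a': [1, 2, 4, 5, 7], 'b': [7, 8], 'c': [1, 2, 4, 5, 8, 9]}


def first_at_or_after(a_table, b_table):
    """For each (a, b) key, find the first ts in b_table[a] that is >= A's ts."""
    result = {}
    for (a, b), ts in a_table.items():
        observations = b_table[a]
        # bisect_left returns the index of the first element >= ts.
        position = bisect_left(observations, ts)
        result[(a, b)] = observations[position]
    return result


C = first_at_or_after(A, B)
```

With the question's sample data this reproduces the target table C. In pandas, the same lookup could probably be done with one `np.searchsorted` call per value of a over the whole group, instead of one call per row.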

  • Load large images into Bitmap?

    - by GuyNoir
    I'm trying to make a basic application that displays an image from the camera, but when I try to load the .jpg from the sdcard with BitmapFactory.decodeFile, it returns null. It doesn't give an out-of-memory error, which I find strange, but the exact same code works fine on smaller images. How does the generic gallery display huge pictures from the camera with so little memory?

  • removing phone number from a document.

    - by Grant Collins
    Hi, I've got a challenge that I am hoping the SO community is able to help me with. I am trying to parse a lot of HTML documents in my PHP application to remove personal details, such as names, addresses and phone numbers. I can remove most of these details without too much trouble; however, the phone number is a real problem for me. My idea is to take the text from these documents and then use a regex to identify the phone numbers and replace them with another value such as 'xxxx'. I've got two regexes, one for UK landline numbers and one for UK cell/mobile numbers. However, when I try to run them against the text it just returns an empty string. I am using the following preg_replace code: $pattens = array( '/^(((\+44\s?\d{4}|\(?0\d{4}\)?)\s?\d{3}\s?\d{3})|((\+44\s?\d{3}|\(?0\d{3}\)?)\s?\d{3}\s?\d{4})|((\+44\s?\d{2}|\(?0\d{2}\)?)\s?\d{4}\s?\d{4}))(\s?\#(\d{4}|\d{3}))?$/', '/^(\+44\s?7\d{3}|\(?07\d{3}\)?)\s?\d{3}\s?\d{3}$/' ); $replace = array('xxxxx', 'xxxxx'); //do the search for the numbers. $updatedContents = preg_replace($pattens, $replace, $htmlContents); This is causing me a lot of head scratching, as I thought that I had this nailed, but at the moment I can't see what's wrong. I am sure that it is something really simple. Thanks, Grant

  • Hang during databinding of large amount of data to WPF DataGrid

    - by nihi_l_ist
    I'm using the WPFToolkit DataGrid control and do the binding this way: <WpfToolkit:DataGrid x:Name="dgGeneral" SelectionMode="Single" SelectionUnit="FullRow" AutoGenerateColumns="False" CanUserAddRows="False" CanUserDeleteRows="False" Grid.Row="1" ItemsSource="{Binding Path=Conversations}" > public List<CONVERSATION> Conversations { get { return conversations; } set { if (conversations != value) { conversations = value; NotifyPropertyChanged("Conversations"); } } } public event PropertyChangedEventHandler PropertyChanged; public void NotifyPropertyChanged(string propertyName) { if (PropertyChanged != null) { PropertyChanged(this, new PropertyChangedEventArgs(propertyName)); } } public void GenerateData() { BackgroundWorker bw = new BackgroundWorker(); bw.WorkerSupportsCancellation = bw.WorkerReportsProgress = true; List<CONVERSATION> list = new List<CONVERSATION>(); bw.DoWork += delegate { list = RefreshGeneralData(); }; bw.RunWorkerCompleted += delegate { try { Conversations = list; } catch (Exception ex) { CustomException.ExceptionLogCustomMessage(ex); } }; bw.RunWorkerAsync(); } Then, in the main window, I call GenerateData() after setting the DataContext of the window to an instance of the class containing GenerateData(). RefreshGeneralData() returns the list of data I want, and it returns it fast. Overall there are about 2000 records and 6 columns (I'm not posting the code I used during the grid's initialization, because I don't think it can be the reason), and the grid hangs for almost 10 seconds!

  • Problem processing large data using Applet-Servlet communication

    - by Marquinio
    Hi everyone. I have an Applet that makes a request to a Servlet. The Servlet uses a PrintWriter to write the response back to the Applet: out.println("Field1|Field2|Field3|Field4|Field5......|Field10"); There are about 15000 records, so out.println() gets executed about 15000 times. The problem is that when the Applet gets the response from the Servlet, it takes about 15 minutes to process the records. I placed System.out.println calls and found that processing pauses at around record 5000, then after 15 minutes it continues and finishes. Has anyone faced a similar problem? The Servlet takes about 2 seconds to execute, so it seems that the browser/Applet is too slow to process the records. Any ideas appreciated. Thanks.

  • Efficient file buffering & scanning methods for large files in python

    - by eblume
    The description of the problem I am having is a bit complicated, and I will err on the side of providing more complete information. For the impatient, here is the briefest way I can summarize it: What is the fastest (least execution time) way to split a text file into ALL (overlapping) substrings of size N (bound N, e.g. 36) while throwing out newline characters? I am writing a module which parses files in the FASTA ASCII-based genome format. These files comprise what is known as the 'hg18' human reference genome, which you can download from the UCSC genome browser (go slugs!) if you like. As you will notice, the genome files are composed of chr[1..22].fa and chr[XY].fa, as well as a set of other small files which are not used in this module. Several modules already exist for parsing FASTA files, such as BioPython's SeqIO. (Sorry, I'd post a link, but I don't have the points to do so yet.) Unfortunately, no module I've been able to find does the specific operation I am trying to do. My module needs to split the genome data ('CAGTACGTCAGACTATACGGAGCTA' could be a line, for instance) into every single overlapping N-length substring. Let me give an example using a very small file (the actual chromosome files are between 355 and 20 million characters long) and N=8 import cStringIO example_file = cStringIO.StringIO("""\ header CAGTcag TFgcACF """) for read in parse(example_file): ... print read ... CAGTCAGTF AGTCAGTFG GTCAGTFGC TCAGTFGCA CAGTFGCAC AGTFGCACF The function that had the absolute best performance of the methods I could think of is this: def parse(file): size = 8 # of course in my code this is a function argument file.readline() # skip past the header buffer = '' for line in file: buffer += line.rstrip().upper() while len(buffer) >= size: yield buffer[:size] buffer = buffer[1:] This works, but unfortunately it still takes about 1.5 hours (see note below) to parse the human genome this way.
Perhaps this is the very best I am going to see with this method (a complete code refactor might be in order, but I'd like to avoid it as this approach has some very specific advantages in other areas of the code), but I thought I would turn this over to the community. Thanks! Note, this time includes a lot of extra calculation, such as computing the opposing strand read and doing hashtable lookups on a hash of approximately 5G in size. Post-answer conclusion: It turns out that using fileobj.read() and then manipulating the resulting string (string.replace(), etc.) took relatively little time and memory compared to the remainder of the program, and so I used that approach. Thanks everyone!
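
The approach named in the post-answer conclusion (read the whole file, strip the newlines, then slice) can be sketched as follows. This is a Python 3 illustration under stated assumptions, not the author's final code: the `size` parameter is passed in explicitly, and a window of 9 is used in the example because the reads listed in the question are nine characters long.

```python
import io


def parse(fileobj, size):
    """Yield every overlapping substring of length `size`, ignoring newlines."""
    fileobj.readline()  # skip past the header line
    # Read the rest in one call and strip line breaks before slicing.
    sequence = fileobj.read().replace("\r", "").replace("\n", "").upper()
    for start in range(len(sequence) - size + 1):
        yield sequence[start:start + size]


example_file = io.StringIO("header\nCAGTcag\nTFgcACF\n")
reads = list(parse(example_file, 9))
```

Slicing a single string avoids rebuilding the buffer on every step, which the original loop does with `buffer = buffer[1:]`. The trade-off is that the whole chromosome is held in memory at once.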

  • Managing Large Database Entity Models

    - by ChiliYago
    I would like to hear how others are effectively (or not) working with the Visual Studio Entity Designer when many database tables exist. It seems to me that navigating the Designer is tough enough with just a few tables, but what about a database with, say, 100 to 200 tables? When a table change is made at the database level, how is the model updated? Does it overwrite any manual changes you have made to the model? How would you quickly find an entity in the designer to make a change or inspect a change? It seems unrealistic to scroll around looking for a specific entity. Thanks for your feedback!

  • How can I write an XML on my hard drive to GetRequestStream

    - by swolff1978
    I need to post raw xml to a site and read the response. With the following code I keep getting an "Unknown File Format" error and I'm not sure why. XmlDocument sampleRequest = new XmlDocument(); sampleRequest.Load(@"C:\SampleRequest.xml"); byte[] bytes = Encoding.UTF8.GetBytes(sampleRequest.ToString()); string uri = "https://www.sample-gateway.com/gw.aspx"; req = WebRequest.Create(uri); req.Method = "POST"; req.ContentLength = bytes.Length; req.ContentType = "text/xml"; using (var requestStream = req.GetRequestStream()) { requestStream.Write(bytes, 0, bytes.Length); } // Send the data to the webserver rsp = req.GetResponse(); XmlDocument responseXML = new XmlDocument(); using (var responseStream = rsp.GetResponseStream()) { responseXML.Load(responseStream); } I am fairly certain my issue is what/how I am writing to the requestStream so.. How can I modify that code so that I may write an xml located on the hard drive to the request stream?

  • bitshift large strings for encoding QR Codes

    - by icekreaman
    As an example, suppose a QR Code data stream contains 55 data words (each one byte in length) and 15 error correction words (again one byte). The data stream begins with a 12 bit header and ends with four 0 bits. So, 12 + 4 bits of header/footer and 15 bytes of error correction, leaves me 53 bytes to hold 53 alphanumeric characters. The 53 bytes of data and 15 bytes of ec are supplied in a string of length 68 (str68). The problem seems simple enough - concatenate 2 bytes of (right-shifted) header data with str68 and then left shift the entire 70 bytes by 4 bits. This is the first time in many years of programming that I have ever needed to do something like this, I am a c and bit shifting noob, so please be gentle... I have done a little investigation and so far have not been able to figure out how to bitshift 70 bytes of data; any help would be greatly appreciated. Larger QR codes can hold 2000 bytes of data...
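
The 4-bit shift over a multi-byte buffer comes down to a loop in which each output byte takes the low nibble of the current byte and the high nibble of the next. The question is about C, so the Python sketch below is only an illustration of that loop; the function name `shift_left_4_bits` is an assumption, and the same logic should carry over to an `unsigned char` array.

```python
def shift_left_4_bits(data):
    """Shift a byte string left by 4 bits, filling the tail with zero bits."""
    shifted = bytearray(len(data))
    for i, byte in enumerate(data):
        following = data[i + 1] if i + 1 < len(data) else 0
        # Low nibble of this byte moves up; high nibble of the next byte fills in.
        shifted[i] = ((byte << 4) & 0xF0) | (following >> 4)
    return bytes(shifted)


example = shift_left_4_bits(bytes([0x01, 0x23, 0x45]))
```

Here `example` is the three bytes 0x12, 0x34, 0x50: the top four bits of the first byte fall off the front, and four zero bits enter at the end.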

  • Document -> Flash viewer, not hosted

    - by Dane
    I've got a content management solution where we present scanned images (TIFF), PDFs, word docs for viewing. While we can simply embed a PDF, sometimes depending on user preferences it's a bit fiddly and sometimes not user-intuitive. I'd like a solution like scribd, embedit, etc, but not hosted. I want to run the application on our own servers and manage it that way (for legal reasons, and our clients won't buy the service if it's hosted somewhere else). SWFtools looks a little basic for my needs, plus doesn't do doc, docx or ppt. Any options? Doesn't have to be free, but would be ideal.

  • Python 3.1 - Memory Error during sampling of a large list

    - by jimy
    The input list can contain more than 1 million numbers. When I run the following code with a smaller 'repeats', it's fine: def sample(x): length = 1000000 new_array = random.sample((list(x)),length) return (new_array) def repeat_sample(x): i = 0 repeats = 100 list_of_samples = [] for i in range(repeats): list_of_samples.append(sample(x)) return(list_of_samples) repeat_sample(large_array) However, using a high 'repeats' such as the 100 above results in a MemoryError. The traceback is as follows: Traceback (most recent call last): File "C:\Python31\rnd.py", line 221, in <module> STORED_REPEAT_SAMPLE = repeat_sample(STORED_ARRAY) File "C:\Python31\rnd.py", line 129, in repeat_sample list_of_samples.append(sample(x)) File "C:\Python31\rnd.py", line 121, in sample new_array = random.sample((list(x)),length) File "C:\Python31\lib\random.py", line 309, in sample result = [None] * k MemoryError I am assuming I'm running out of memory. I do not know how to get around this problem. Thank you for your time!
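
Keeping all 100 samples of a million elements alive at once is the likely cause of the MemoryError. One possible workaround, sketched below, is to yield each sample from a generator so that only one is held at a time, assuming each sample can be reduced (summed, averaged, written to disk) before the next is drawn. The name `iter_samples` and its parameters are hypothetical.

```python
import random


def iter_samples(population, sample_size, repeats):
    """Yield `repeats` random samples one at a time instead of storing them all."""
    pool = list(population)  # copy once, not once per sample
    for _ in range(repeats):
        yield random.sample(pool, sample_size)


# Reduce each sample to a single number as soon as it is produced.
means = [sum(s) / len(s) for s in iter_samples(range(1000), 100, 5)]
```

If every sample really must be kept, a more compact container such as the standard-library `array` module may help, since a list of Python numbers carries per-object overhead.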

  • Assigning large UInt32 constants in VB.Net

    - by Kumba
    I inquired on VB's erratic behavior of treating all numerics as signed types back in this question, and from the accepted answer there, was able to get by. Per that answer: Visual Basic Literals Also keep in mind you can add literals to your code in VB.net and explicitly state constants as unsigned. So I tried this: Friend Const POW_1_32 As UInt32 = 4294967296UI And VB.NET throws an Overflow error in the IDE. Pulling out the integer overflow checks doesn't seem to help -- this appears to be a flaw in the IDE itself. This, however, doesn't generate an error: Friend Const POW_1_32 As UInt64 = 4294967296UL So this suggests to me that the IDE isn't properly parsing the code and understanding the difference between Int32 and UInt32. Any suggested workarounds and/or possible clues on when MS will make unsigned data types intrinsic to the framework instead of the hacks they currently are?

  • Extract anything that looks like links from large amount of data in python

    - by Riz
    Hi, I have around 5 GB of HTML data which I want to process to find links to a set of websites and perform some additional filtering. Right now I use a simple regexp for each site and iterate over them, searching for matches. In my case links can be outside of "a" tags and can be malformed in many ways (like "\n" in the middle of a link), so I try to grab as many "links" as I can and check them later in other scripts (so no BeautifulSoup\lxml\etc). The problem is that my script is pretty slow, so I am looking for ways to speed it up. I am writing a set of tests to check different approaches, but hope to get some advice :) Right now I am thinking about getting all links without filtering first (maybe using a C module or standalone app, which doesn't use regexp but a simple search to get the start and end of every link) and then using regexp to match the ones I need.
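
One option that fits the question's setup is to replace the per-site loop with a single compiled pattern whose alternation covers every site, so that each chunk of text is scanned once. The sketch below assumes links end at whitespace, a quote, or an angle bracket; the helper name `build_link_pattern` and the example domains are placeholders, and links broken by an embedded "\n" would still need the later clean-up pass the question describes.

```python
import re


def build_link_pattern(domains):
    """Compile one case-insensitive pattern that matches links to any listed domain."""
    alternation = "|".join(re.escape(domain) for domain in domains)
    # Optional scheme and "www.", then a listed domain, then everything up to
    # whitespace, a quote, or an angle bracket.
    return re.compile(
        r"(?:https?://)?(?:www\.)?(?:" + alternation + r")[^\s\"'<>]*",
        re.IGNORECASE,
    )


pattern = build_link_pattern(["example.com", "example.org"])
text = 'see <a href="http://example.com/a?b=1">x</a> and www.example.org/page now'
links = pattern.findall(text)
```

Whether this beats the per-site loop on 5 GB of input would need to be measured; the tests the question mentions are the right place to compare the two.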

  • how can i strip formatting from word document using php

    - by shazia
    I want to display a Word file, then extract the content and display it in a separate textarea. I want to do away with the formatting as well. This is what I get when I read the file using PHP: ??????ÿÿÿÿ????????????????????????????????????????Running Head: INTERNATIONAL BUSINESS International Business [Name of the writer] [Name of the institution] International Business Question 1 Increasing returns to scale (or economies of scale) in production is an indisputable phenomenon characterizing real world production, and, as such, they have long been recognized as a principal source of economic prosperity. Nonetheless, they have never played a major role in This is the code: $fh = fopen($newname, 'r'); $contents = fread($fh, filesize($newname)); fclose($fh); unlink($newname); echo "<br/>"; echo $contents; How can I get rid of all these characters? Thanks

  • Timeout on Large mySQL Query

    - by Bob Stewart
    I have this query: $theQuery = mysql_query("SELECT phrase, date from wordList WHERE group='nouns'"); while($getWords=mysql_fetch_array($theQuery)) { echo "$getWords[phrase] created on $getWords[date]<br>"; } The data table "wordList" contains 75,000 records in the group "nouns" and every time I load the code I am returned an error. Help!

  • How to write a large number of nested records in JSON with Python

    - by jamesmcm
    I want to produce a JSON file, containing some initial parameters and then records of data like this: { "measurement" : 15000, "imi" : 0.5, "times" : 30, "recalibrate" : false, { "colorlist" : [234, 431, 134] "speclist" : [0.34, 0.42, 0.45, 0.34, 0.78] } { "colorlist" : [214, 451, 114] "speclist" : [0.44, 0.32, 0.45, 0.37, 0.53] } ... } How can this be achieved using the Python json module? The data records cannot be added by hand as there are very many.
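
As written, the target layout is not valid JSON, because the record objects sit inside the outer object without keys. A minimal sketch with the Python json module, assuming the records are collected under a key named "records" (a name chosen here for illustration), could look like this:

```python
import json

parameters = {"measurement": 15000, "imi": 0.5, "times": 30, "recalibrate": False}
records = [
    {"colorlist": [234, 431, 134], "speclist": [0.34, 0.42, 0.45, 0.34, 0.78]},
    {"colorlist": [214, 451, 114], "speclist": [0.44, 0.32, 0.45, 0.37, 0.53]},
]

# JSON objects need a key for every member, so the records go under "records".
document = dict(parameters, records=records)
text = json.dumps(document, indent=2)
```

In practice the `records` list would be built in a loop from the measured data, and `json.dump(document, file_object)` would write the result straight to an open file.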

  • filesize of large files in c

    - by endeavormac
    How can I get the size of a file in C when the file is larger than 4 GB? ftell returns a 4-byte signed long, limiting it to 2 GB. stat has a variable of type off_t which is also 4 bytes (not sure of sign), so at most it can tell me the size of a 4 GB file. What if the file is larger than 4 GB?

  • ActionBar SpinnerAdapter Large Branding followed by selection (spinner)

    - by SatanEnglish
    I'm trying to implement a spinner In the action bar that has brand Name above it. With the ActionBar setListNavigationCallbacks method if possible actionBar.setNavigationMode(ActionBar.NAVIGATION_MODE_LIST); actionBar.setListNavigationCallbacks(mSpinnerAdapter, null); Can anyone give me an Idea of how to do this? I would put some code here but I have no idea where to begin as I have not managed to find relevant information yet. Edit: Using V4.0

  • How can I serialize this .NET Collection item?

    - by Pure.Krome
    Hi folks, I'm trying to xml serialize a POCO view data class into xml. It serializes, but incorrectly generates some xml. eg. (current result .. not the one I'm after) <ReviewListViewData> <reviews> <review>....</review> ... </reviews> </ReviewListViewData> I'm trying to get (notice how I've removed the bad root node?) ... <reviews> <review>....</review> ... </reviews> Class is defined as... public class ReviewListViewData { [XmlArray("reviews")] [XmlArrayItem("review")] public ReviewViewData[] Reviews { get; set; } } and here's a sample way it's called in an ASP.NET MVC ActionMethod :- var reviewListViewData = GetReviewListViewData(...); return XmlResult(reviewListViewData); // (XmlResult referenced from MVCContrib). anyone have any ideas, please?

  • inserting unique date into txt document

    - by durian
    I'm trying this script to insert only a unique date into a text file, but it isn't working properly: $log_file_name = "logfile.txt"; $log_file_path = "log_files/$id/$log_file_name"; if(file_exists($log_file_path)){ $not = "not"; $todaydate = date('d,m,Y'); $today = "$todaydate;"; $strlength = strlen($today); $file_contents = file_get_contents($log_file_path); $file_contents_arry = explode(";",$file_contents); if(!in_array($todaytodaydate,$file_contents_arry)){ $append = fopen($log_file_path, 'a'); $write = fwrite($append,$today); //writes our string to our file. $close = fclose($append); //closes our file } else { $append = fopen($log_file_path, 'a'); $write = fwrite($append,$not); //writes our string to our file. $close = fclose($append); //closes our file } } else{ mkdir("log_files/$id", 0700); $todaydate = date('d,m,Y'); $today = "$todaydate;"; $strlength = strlen($today); $create = fopen($log_file_path, "w"); $write = fwrite($create, $today, $strlength); //writes our string to our file. $close = fclose($create); //closes our file } The problem is with the if else statement where it should be written if it's already in the array.

  • "Thread was being aborted" on large dataset

    - by Donaldinio
    I am trying to process 114,000 rows in a dataset (populated from an oracle database). I am hitting an error at around the 600 mark - "Thread was being aborted". All I am doing is reading the dataset, and I still hit the issue. Is this too much data for a dataset? It seems to load into the dataset ok though. I welcome any better ways to process this amount of data. rootTermsTable = entKw.GetRootKeywordsByCategory(catID); for (int k = 0; k < rootTermsTable.Rows.Count; k++) { string keywordID = rootTermsTable.Rows[k]["IK_DBKEY"].ToString(); ... } public DataTable GetKeywordsByCategory(string categoryID) { DbProviderFactory provider = DbProviderFactories.GetFactory(connectionProvider); DbConnection con = provider.CreateConnection(); con.ConnectionString = connectionString; DbCommand com = provider.CreateCommand(); com.Connection = con; com.CommandText = string.Format("Select * From icm_keyword WHERE (IK_IC_DBKEY = {0})",categoryID); com.CommandType = CommandType.Text; DataSet ds = new DataSet(); DbDataAdapter ad = provider.CreateDataAdapter(); ad.SelectCommand = com; con.Open(); ad.Fill(ds); con.Close(); DataTable dt = new DataTable(); dt = ds.Tables[0]; return dt; //return ds.Tables[0].DefaultView; }

  • How to handle large table in MySQL ?

    - by Frantz Miccoli
    I have a database used to store items and properties of these items. The number of properties is extensible, so there is a join table that stores each property value associated with an item. CREATE TABLE `item_property` ( `property_id` int(11) NOT NULL, `item_id` int(11) NOT NULL, `value` double NOT NULL, PRIMARY KEY (`property_id`,`item_id`), KEY `item_id` (`item_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; This database has two goals: storing data (which has first priority and has to be very quick; I would like to perform hundreds of inserts in a few seconds) and retrieving data (selects using item_id and property_id; this has second priority and can be slower, but not by too much, because that would ruin my usage of the DB). Currently this table holds 1.6 billion entries and a simple count can take up to 2 minutes. Inserting isn't fast enough to be usable. I'm using Zend_Db to access my data and would much prefer suggestions that don't require developing anything on the PHP side. Thanks for your advice!
