Search Results

Search found 1813 results on 73 pages for 'parser'.

Page 59/73 | < Previous Page | 55 56 57 58 59 60 61 62 63 64 65 66 | Next Page >

Autodetect Presence of CSV Headers in a File

- by banzaimonkey

Short question: How do I automatically detect whether a CSV file has headers in the first row? Details: I've written a small CSV parsing engine that places the data into an object that I can access as (approximately) an in-memory database. The original code was written to parse third-party CSV with a predictable format, but I'd like to be able to use this code more generally. I'm trying to figure out a reliable way to automatically detect the presence of CSV headers, so the script can decide whether to use the first row of the CSV file as keys / column names or start parsing data immediately. Since all I need is a boolean test, I could easily specify an argument after inspecting the CSV file myself, but I'd rather not have to (go go automation). I imagine I'd have to parse the first 3 to ? rows of the CSV file and look for a pattern of some sort to compare against the headers. I'm having nightmares of three particularly bad cases in which: The headers include numeric data for some reason The first few rows (or large portions of the CSV) are null There headers and data look too similar to tell them apart If I can get a "best guess" and have the parser fail with an error or spit out a warning if it can't decide, that's OK. If this is something that's going to be tremendously expensive in terms of time or computation (and take more time than it's supposed to save me) I'll happily scrap the idea and go back to working on "important things". I'm working with PHP, but this strikes me as more of an algorithmic / computational question than something that's implementation-specific. If there's a simple algorithm I can use, great. If you can point me to some relevant theory / discussion, that'd be great, too. If there's a giant library that does natural language processing or 300 different kinds of parsing, I'm not interested.

Read the article
Parsing SOAP XML in Oracle

- by user258587

Hi I am new to Oracle and I am working on something that needs to parse a SOAP request and save the address to DB Tables. I am using the XML parser in Oracle (XMLType) with XPath but am struggling since I can't figure out the way to parse the SOAP request because it has multiple namespaces. Could anyone give me an example? Thanks in advance!!! edit It would be a typical SOAP request similar to the one below. <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:soap="http://soap.service.****.com"> <soapenv:Header /> <soapenv:Body> <soap:UpdateElem> <soap:request> <soap:att1>123456789</soap:att1> <soap:att2 xsi:nil="true" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" /> <soap:att3>L</soap:att3> ..... </soap:request> </soap:UpdateElem> </soapenv:Body> </soapenv:Envelope> I need to retrieve parameters att1, att2... and save them in to a DB table.

Read the article
Turning off hibernate logging console output

- by Jared

I'm using hibernate 3 and want to stop it from dumping all the startup messages to the console. I tried commenting out the stdout lines in log4j.properties but no luck. I've pasted my log file below. Also I'm using eclipse with the standard project structure and have a copy of log4j.properties in both the root of the project folder and the bin folder. ### direct log messages to stdout ### #log4j.appender.stdout=org.apache.log4j.ConsoleAppender #log4j.appender.stdout.Target=System.out #log4j.appender.stdout.layout=org.apache.log4j.PatternLayout #log4j.appender.stdout.layout.ConversionPattern=%d{ABSOLUTE} %5p %c{1}:%L - %m%n ### direct messages to file hibernate.log ### log4j.appender.file=org.apache.log4j.FileAppender log4j.appender.file.File=hibernate.log log4j.appender.file.layout=org.apache.log4j.PatternLayout log4j.appender.file.layout.ConversionPattern=%d{ABSOLUTE} %5p %c{1}:%L - %m%n ### set log levels - for more verbose logging change 'info' to 'debug' ### log4j.rootLogger=warn, stdout #log4j.logger.org.hibernate=info log4j.logger.org.hibernate=debug ### log HQL query parser activity #log4j.logger.org.hibernate.hql.ast.AST=debug ### log just the SQL #log4j.logger.org.hibernate.SQL=debug ### log JDBC bind parameters ### log4j.logger.org.hibernate.type=info #log4j.logger.org.hibernate.type=debug ### log schema export/update ### log4j.logger.org.hibernate.tool.hbm2ddl=debug ### log HQL parse trees #log4j.logger.org.hibernate.hql=debug ### log cache activity ### #log4j.logger.org.hibernate.cache=debug ### log transaction activity #log4j.logger.org.hibernate.transaction=debug ### log JDBC resource acquisition #log4j.logger.org.hibernate.jdbc=debug ### enable the following line if you want to track down connection ### ### leakages when using DriverManagerConnectionProvider ### #log4j.logger.org.hibernate.connection.DriverManagerConnectionProvider=trac5

Read the article
Sudden issues reading uncompressed video using opencv

- by JohnSavage

I have been using a particular pipeline to process video using opencv to encode uncompressed video (fourcc = 0), and opencv python bindings to then open and work on these files. This has been working fine for me on OpenCV 2.3.1a on Ubuntu 11.10 until just a few days ago. For some reason it currently is only allowing me to read the first frame of a given file the first time I open that file. Further frames are not read, and once I touch the file once with my program, it then cannot even read the first frame. More detail: I created the uncompressed video files as follows: out_video.open(out_vid_name, 0, // FOURCC = 0 means record raw fps, Size(640, 480)) Again, these videos worked fine for me until about a week ago. Now, when I try to open one of these I get the following message (from what I think is ffmpeg): Processing video.avi Using network protocols without global network initialization. Please use avformat_network_init(), this will become mandatory later. [avi @ 0x29251e0] parser not found for codec rawvideo, packets or times may be invalid. It reads and displays the first frame fine, but then fails to read the next frame. Then, when I try to run my code on the same video, the capture still opens with the same message as above. However, it cannot even read the very first frame. Here is the code to open the capture: self.capture = cv2.VideoCapture(filename) if not self.capture.isOpened() print "Error: could not open capture" sys.exit() Again, this part is passed without any issue, but then the break happens at: success, rgb = self.capture.read() if not success: print "error: could not read frame" return False This part breaks at the second frame on the first run of the video file, and then on the first frame on subsequent runs. I really don't know where to even begin debugging this. Please help!

Read the article
Opengl-es draw an .obj file, but how?

- by lacas

I d like to parse an .obj file. My parser is working good, but my displaying is not good. Obj file is here my code is: public ObjModelParser parse() { long startTime = System.currentTimeMillis(); InputStream fileIn = resources.openRawResource(resourceID); BufferedReader buffer = new BufferedReader(new InputStreamReader(fileIn)); String line=""; Log.e("model loader", "Start parsing object " + resourceID); try { while ((line = buffer.readLine()) != null) { StringTokenizer parts = new StringTokenizer(line, " "); int numTokens = parts.countTokens(); if (numTokens == 0) continue; String part = parts.nextToken(); if (part.equals(VERTEX)) { Log.e("v ", line); vertices.add(Float.parseFloat(parts.nextToken())); vertices.add(Float.parseFloat(parts.nextToken())); vertices.add(Float.parseFloat(parts.nextToken())); .... and my displaying code is: draw that model with TRIANGLE_STRIP and gl.glDrawArrays(rendermode, 0, coords.length/dimension); What is the mistake here? edited: file here to show what is my good coords from my program for a cube, and what is from .obj file, that never show Thanks, Leslie

Read the article
How to read utf-8 xml from vbs and get correct character code

- by vkjr

I'm trying to read xml file from vbs script. Xml is encoded in utf-8 and has appropriate header From vbs script I use microsoft xmldom parser to read xml: Dim objXMLDoc Set objXMLDoc = CreateObject( "Microsoft.XMLDOM" ) objXMLDoc.load("vbs_strings.xml") Inside xml I'm trying to write character by code using &#nnn; notation. Then I read this character from vbscript and try to get it's code using Asc() function. For some characters it works fine and read code is equal to one written. But for some characters Asc() always returns code 63. What could it be? Examples: If xml contains <section>Ã<section> and in script I have Section variable for representing this xml node then code: Asc(Section.Text) will return value 195 and it's ok. If xml contains <section>n<section> then code: Asc(Section.Text) will return value 110 and it's ok. But if xml contains <section><section> or <section><section> or <section><section> Asc(Section.Text) will return value 63 and it's definitely not good. Do you know why?

Read the article
Better viewing of postfix mail queue files than postcat?

- by Geekman

So I got a call early this morning about a client needing to see what email they have waiting to be delivered sitting in our secondary mail server. Their link for the main server had (still is) been down for two days and they needed to see their email. So I wrote up a quick perl script to use mailq in combination with postcat to dump each email for their address into separate files, tar'd it up and sent it off. Horrible code, I know, but it was urgent. My solution works OK in that it at least gives a raw view, but I thought tonight it would be nice if I had a solution where I could provide their email attachments and maybe remove some "garbage" header text as well. Most of the important emails seem to have a PDF or similar attached. I've been looking around but the only method of viewing queue files I can see is the postcat command, and I really don't want to write my own parser - so I was wondering if any of you have already done so, or know of a better command to use? Here's the code for my current solution: #!/usr/bin/perl $qCmd="mailq | grep -B 2 \"someemailaddress@isp\" | cut -d \" \" -f 1"; @data = split(/\n/, `$qCmd`); $i = 0; foreach $line (@data) { $i++; $remainder = $i % 2; if ($remainder == 0) { next; } if ($line =~ /\(/ || $line =~ /\n/ || $line eq "") { next; } print "Processing: " . $line . "\n"; `postcat -q $line > $line.email.txt`; $subject=`cat $line.email.txt | grep "Subject:"`; #print "SUB" . $subject; #`cat $line.email.txt > \"$subject.$line.email.txt\"`; } Any advice appreciated.

Read the article
Serialization Performance and Google Android

- by Jomanscool2

I'm looking for advice to speed up serialization performance, specifically when using the Google Android. For a project I am working on, I am trying to relay a couple hundred objects from a server to the Android app, and am going through various stages to get the performance I need. First I tried a terrible XML parser that I hacked together using Scanner specifically for this project, and that caused unbelievably slow performance when loading the objects (~5 minutes for a 300KB file). I then moved away from that and made my classes implement Serializable and wrote the ArrayList of objects I had to a file. Reading that file into the objects the Android, with the file already downloaded mind you, was taking ~15-30 seconds for the ~100KB serialized file. I still find this completely unacceptable for an Android app, as my app requires loading the data when starting the application. I have read briefly about Externalizable and how it can increase performance, but I am not sure as to how one implements it with nested classes. Right now, I am trying to store an ArrayList of the following class, with the nested classes below it. public class MealMenu implements Serializable{ private String commonsName; private long startMillis, endMillis, modMillis; private ArrayList<Venue> venues; private String mealName; } And the Venue class: public class Venue implements Serializable{ private String name; private ArrayList<FoodItem> foodItems; } And the FoodItem class: public class FoodItem implements Serializable{ private String name; private boolean vegan; private boolean vegetarian; } IF Externalizable is the way to go to increase performance, is there any information as to how java calls the methods in the objects when you try to write it out? I am not sure if I need to implement it in the parent class, nor how I would go about serializing the nested objects within each object.

Read the article
PRISM - Creating mouseoverbehavior causes a Silverlight library to not be visible in main Silverligh

- by RHLopez

Created a simple Silverlight 4 application (SimpleApp) then added a Silverlight 4 library (LibraryA). Added code to the library (LibraryA) to implement MouseOverBehavior by inheriting from CommandBaseBehavior along with the appropriate attached property class/methods. Added reference in SimpleApp to LibraryA and went to MainPage.xaml to add namespace reference but it does not show up with Intellisense. Typing the namespace manually and then adding the attached MouseOver command works as it should as far as intellisense showing my attached property name, i.e. ... commands:MouseOver.Command="{Binding MousedOver}". However when I try to run it I get a XAML parser error saying that the "Command" attached property does not exist in MouseOver. If I move my class definitions from LibraryA to SimpleApp then everything works. I removed everything from LibraryA and just put one class with this in it: public class MouseOverBehavior : CommandBehaviorBase<Control> { public MouseOverBehavior(Control element) : base(element) {} } With this simple class in LibraryA it will not show up in XAML intellisense in SimpleApp. XAML intellisense works with other libraries that I have written that don't use PRISM. Don't know what I am missing hopefully it's something simple. I am using the latest SL4 build for PRISM change set 42969. Visual Studio 2010 RTM Professional in Windows 7 Ultimate 64-bit.

Read the article
DIY intellisense on XPath - design approach? (WinForms app)

- by Cheeso

I read the DIY Intellisense article on code project, which was referenced from the Mimic Intellisense? question here on SO. I wanna do something similar, DIY intellisense, but for XPath not C#. The design approach used there makes sense to me: maintain a tree of terms, and when the "completion character" is pressed, in the case of C#, a dot, pop up the list of possible completions in a textfield. Then allow the user to select a term from the textfield either through typing, arrow keys, or double-click. How would you apply this to XPath autocompletion? should there be an autocomplete key? In XPath there is no obvious separator key like "dot" in C#. should the popup be triggered explicitly in some other way, let's say ctrl-. ? or should the parser try to autocomplete continuously? If I do the autocomplete continuously, how to scale it properly? There are 93 xpath functions, not counting overloads. I certainly don't want to popup a list of 93 choices. How do I decide when I've narrowed it enough to offer a useful lsit of possible completions? How to populate the tree of possible completions? For C#, it's easy: walk the type space via reflection. At a first level, the "syntax tree" for C# seems like a single tree, and the list of completions at any point depends on the graph of nodes you've traversed to that point. Typing System.Console. traverses to a certain node in that tree, and the list of completions is the set of child nodes available at that node in the tree. On the other hand, the xpath syntax seems like it is a "flatter" tree - function names, axis names, literals. Does this make sense? what have I not considered?

Read the article
Versioned RDF store

- by Mat

Let me try rephrasing this: I am looking for a robust RDF store or library with the following features: Named graphs, or some other form of reification. Version tracking (probably at the named graph level). Privacy between groups of users, either at named graph or triple level. Human-readable data input and output, e.g. TriG parser and serialiser. I've played with Jena, Sesame, Boca, RDFLib, Redland and one or two others some time ago but each had its problems. Have any improved in the above areas recently? Can anything else do what I want, or is RDF not yet ready for prime-time? Reading around the subject a bit more, I've found that: Jena, nothing further Sesame, nothing further Boca does not appear to be maintained any more and seems only really designed for DB2. OpenAnzo, an open-source fork, appears more promising. RDFLib, nothing further Redland, nothing further Talis Platform appears to support changesets (wiki page and reference in Kniblet Tutorial Part 5) but it's a hosted-only service. Still may look into it though. SemVersion sounded promising, but appears to be stale.

Read the article
Execute SQL on CSV files via JDBC

- by Markos Fragkakis

Dear all, I need to apply an SQL query to CSV files (comma-separated text files). My SQL is predefined from another tool, and is not eligible to change. It may contain embedded selects and table aliases in the FROM part. For my task I have found two open-source (this is a project requirement) libraries that provide JDBC drivers: CsvJdbc XlSQL JBoss Teiid Create an Apache Derby DB, load all CSVs as tables and execute the query. These are the problems I encountered: it does not accept the syntax of the SQL (it uses internal selects and table aliases). Furthermore, it has not been maintained since 2004. I could not get it to work, as it has as dependency a SAX Parser that causes exception when parsing other documents. Similarly, no change since 2004. Have not checked if it supports the syntax, but seems like an overhead. It needs several entities defines (Virtual Databases, Bindings). From the mailing list they told me that last release supports runtime creation of required objects. Has anyone used it for such simple task (normally it can connect to several types of data, like CSV, XML or other DBS and create a virtual, unified one)? Can this even be done easily? From the 4 things I considered/tried, only 3 and 4 seem to me viable. Any advice on these, or any other way in which I can query my CSV files? Cheers

Read the article
Registering custom webcontrol inside mvc view?

- by kastermester

I am in the middle of a project where I am migrating some code for a website from WebForms to MVC - unfortunatly there's not enough time to do it all at once, so I will have to do some... not so pretty solutions. I am though facing a problems with a custom control I have written that inherits from the standard GridView control namespace Controls { public class MyGridView : GridView { ... } } I have added to the web.config file as usual: <configuration> ... <system.web> ... <pages> ... <controls> ... <add tagPrefix="Xui" namespace="Controls"/> </controls> </pages> </system.web> </configuration> Then on the MVC View: <Xui:MyGridView ID="GridView1" runat="server" ...>...</Xui:MyGgridView> However I am getting a parser error stating that the control cannot be found. I am suspecting this has to do with the mix up of MVC and WebForms, however I am/was under the impression that such mixup should be possible, is there any kind of tweak for this? I realise this solution is far from ideal, however there's no time to "do the right thing". Thanks

Read the article
Django upload failing on request data read error

- by Jake

Hi All, I've got a Django app that accepts uploads from jQuery uploadify, a jQ plugin that uses flash to upload files and give a progress bar. Files under about 150k work, but bigger files always fail and almost always at around 192k (that's 3 chunks) completed, sometimes at around 160k. The Exception I get is below. exceptions.IOError request data read error File "/usr/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 171, in _get_post self._load_post_and_files() File "/usr/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 137, in _load_post_and_files self._post, self._files = self.parse_file_upload(self.META, self.environ[\'wsgi.input\']) File "/usr/lib/python2.4/site-packages/django/http/__init__.py", line 124, in parse_file_upload return parser.parse() File "/usr/lib/python2.4/site-packages/django/http/multipartparser.py", line 192, in parse for chunk in field_stream: File "/usr/lib/python2.4/site-packages/django/http/multipartparser.py", line 314, in next output = self._producer.next() File "/usr/lib/python2.4/site-packages/django/http/multipartparser.py", line 468, in next for bytes in stream: File "/usr/lib/python2.4/site-packages/django/http/multipartparser.py", line 314, in next output = self._producer.next() File "/usr/lib/python2.4/site-packages/django/http/multipartparser.py", line 375, in next data = self.flo.read(self.chunk_size) File "/usr/lib/python2.4/site-packages/django/http/multipartparser.py", line 405, in read return self._file.read(num_bytes) When running locally on the Django development server, big files work. I've tried setting my FILE_UPLOAD_HANDLERS = ("django.core.files.uploadhandler.TemporaryFileUploadHandler",) in case it was the memory upload handler, but it made no difference. Does anyone know how to fix this?

Read the article
Yaml::load_file acting different between development and production (Rails)

- by James

Hi, I am completely stumped at the nature of this problem. We export data from our application into a 'cleaned' YAML file (stripping out IDs, created_at etc). Then we (will) allow users to import these files back into the application - it is the import that is completely bugging me out. In development, YAML::load_file(params[:uploaded_data].local_path) returns an array of YAML::Objects's (and it doesn't matter which of the number of different ways the file could be loaded): [#{"exception_count"="0", "title"="Start", "amount"="70.00", "colour"=nil, "repeat_type_id"="0", "repeat_interval"="1"}}, etc etc] Which is very nice, as the attributes also include the (associated model) exceptions that you see an exception_count for. However on production (rails 2.3.2, running REE 1.8.7 and 1.8.6 for testing, tested on two different production env's, and running production locally) it returns an array of the Objects within the YAML - in this case, Event: [#, repeat_type_id: 0, colour: nil, repeat_interval: 1, exception_count: 0, etc etc] Now this would be just perplexing if it also included the associated model Exception with it - however it doesn't. Can anyone at all shed some light on why the Yaml parser would behave so differently between production and development? I'm on rails 2.3.2, running REE 1.8.7; however I've also tested running Ruby 1.8.6 with exactly the same results. Thanks for any help!

Read the article
Django-South introspection rule doesn't work.

- by Ory Band

I'm using Django 1.2.3 and South 0.7.3. I am trying to convert my app (named core) to use Django-South. I have a custom model/field that I'm using, named ImageWithThumbsField. It's basically just the ol' django.db.models.ImageField with some attributes such as height, weight, etc. While trying to ./manage.py convert_to_auth core I receieve South's freezing errors. I have no idea why, I'm Probably missing something... I am using a simple custom Model: from django.db.models import ImageField class ImageWithThumbsField(ImageField): def __init__(self, verbose_name=None, name=None, width_field=None, height_field=None, sizes=None, **kwargs): self.verbose_name=verbose_name self.name=name self.width_field=width_field self.height_field=height_field self.sizes = sizes super(ImageField, self).__init__(**kwargs) And this is my introspection rule, which I add to the top of my models.py: from south.modelsinspector import add_introspection_rules from lib.thumbs import ImageWithThumbsField add_introspection_rules( [ ( (ImageWithThumbsField, ), [], { "verbose_name": ["verbose_name", {"default": None}], "name": ["name", {"default": None}], "width_field": ["width_field", {"default": None}], "height_field": ["height_field", {"default": None}], "sizes": ["sizes", {"default": None}], }, ), ], ["^core/.fields/.ImageWithThumbsField",]) This is the errors I receieve: ! Cannot freeze field 'core.additionalmaterialphoto.photo' ! (this field has class lib.thumbs.ImageWithThumbsField) ! Cannot freeze field 'core.material.photo' ! (this field has class lib.thumbs.ImageWithThumbsField) ! Cannot freeze field 'core.material.formulaimage' ! (this field has class lib.thumbs.ImageWithThumbsField) ! South cannot introspect some fields; this is probably because they are custom ! fields. If they worked in 0.6 or below, this is because we have removed the ! models parser (it often broke things). ! To fix this, read http://south.aeracode.org/wiki/MyFieldsDontWork Does anybody know why? What am I doing wrong?

Read the article
Slowdowns when reading from an urlconnection's inputstream (even with byte[] and buffers)

- by user342677

Ok so after spending two days trying to figure out the problem, and reading about dizillion articles, i finally decided to man up and ask to for some advice(my first time here). Now to the issue at hand - I am writing a program which will parse api data from a game, namely battle logs. There will be A LOT of entries in the database(20+ million) and so the parsing speed for each battle log page matters quite a bit. The pages to be parsed look like this: http://api.erepublik.com/v1/feeds/battle_logs/10000/0. (see source code if using chrome, it doesnt display the page right). It has 1000 hit entries, followed by a little battle info(lastpage will have <1000 obviously). On average, a page contains 175000 characters, UTF-8 encoding, xml format(v 1.0). Program will run locally on a good PC, memory is virtually unlimited(so that creating byte[250000] is quite ok). The format never changes, which is quite convenient. Now, I started off as usual: //global vars,class declaration skipped public WebObject(String url_string, int connection_timeout, int read_timeout, boolean redirects_allowed, String user_agent) throws java.net.MalformedURLException, java.io.IOException { // Open a URL connection java.net.URL url = new java.net.URL(url_string); java.net.URLConnection uconn = url.openConnection(); if (!(uconn instanceof java.net.HttpURLConnection)) { throw new java.lang.IllegalArgumentException("URL protocol must be HTTP"); } conn = (java.net.HttpURLConnection) uconn; conn.setConnectTimeout(connection_timeout); conn.setReadTimeout(read_timeout); conn.setInstanceFollowRedirects(redirects_allowed); conn.setRequestProperty("User-agent", user_agent); } public void executeConnection() throws IOException { try { is = conn.getInputStream(); //global var l = conn.getContentLength(); //global var } catch (Exception e) { //handling code skipped } } //getContentStream and getLength methods which just return'is' and 'l' are skipped Here is where the fun part began. I ran some profiling (using System.currentTimeMillis()) to find out what takes long ,and what doesnt. The call to this method takes only 200ms on avg public InputStream getWebPageAsStream(int battle_id, int page) throws Exception { String url = "http://api.erepublik.com/v1/feeds/battle_logs/" + battle_id + "/" + page; WebObject wobj = new WebObject(url, 10000, 10000, true, "Mozilla/5.0 " + "(Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 ( .NET CLR 3.5.30729)"); wobj.executeConnection(); l = wobj.getContentLength(); // global variable return wobj.getContentStream(); //returns 'is' stream } 200ms is quite expected from a network operation, and i am fine with it. BUT when i parse the inputStream in any way(read it into string/use java XML parser/read it into another ByteArrayStream) the process takes over 1000ms! for example, this code takes 1000ms IF i pass the stream i got('is') above from getContentStream() directly to this method: public static Document convertToXML(InputStream is) throws ParserConfigurationException, IOException, SAXException { DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); DocumentBuilder db = dbf.newDocumentBuilder(); Document doc = db.parse(is); doc.getDocumentElement().normalize(); return doc; } this code too, takes around 920ms IF the initial InputStream 'is' is passed in(dont read into the code itself - it just extracts the data i need by directly counting the characters, which can be done thanks to the rigid api feed format): public static parsedBattlePage convertBattleToXMLWithoutDOM(InputStream is) throws IOException { // Point A BufferedReader br = new BufferedReader(new InputStreamReader(is)); LinkedList ll = new LinkedList(); String str = br.readLine(); while (str != null) { ll.add(str); str = br.readLine(); } if (((String) ll.get(1)).indexOf("error") != -1) { return new parsedBattlePage(null, null, true, -1); } //Point B Iterator it = ll.iterator(); it.next(); it.next(); it.next(); it.next(); String[][] hits_arr = new String[1000][4]; String t_str = (String) it.next(); String tmp = null; int j = 0; for (int i = 0; t_str.indexOf("time") != -1; i++) { hits_arr[i][0] = t_str.substring(12, t_str.length() - 11); tmp = (String) it.next(); hits_arr[i][1] = tmp.substring(14, tmp.length() - 9); tmp = (String) it.next(); hits_arr[i][2] = tmp.substring(15, tmp.length() - 10); tmp = (String) it.next(); hits_arr[i][3] = tmp.substring(18, tmp.length() - 13); it.next(); it.next(); t_str = (String) it.next(); j++; } String[] b_info_arr = new String[9]; int[] space_nums = {13, 10, 13, 11, 11, 12, 5, 10, 13}; for (int i = 0; i < space_nums.length; i++) { tmp = (String) it.next(); b_info_arr[i] = tmp.substring(space_nums[i] + 4, tmp.length() - space_nums[i] - 1); } //Point C return new parsedBattlePage(hits_arr, b_info_arr, false, j); } I have tried replacing the default BufferedReader with BufferedReader br = new BufferedReader(new InputStreamReader(is), 250000); This didnt change much. My second try was to replace the code between A and B with: Iterator it = IOUtils.lineIterator(is, "UTF-8"); Same result, except this time A-B was 0ms, and B-C was 1000ms, so then every call to it.next() must have been consuming some significant time.(IOUtils is from apache-commons-io library). And here is the culprit - the time taken to parse the stream to string, be it by an iterator or BufferedReader in ALL cases was about 1000ms, while the rest of the code took 0ms(e.g. irrelevant). This means that parsing the stream to LinkedList, or iterating over it, for some reason was eating up a lot of my system resources. question was - why? Is it just the way java is made...no...thats just stupid, so I did another experiment. In my main method I added after the getWebPageAsStream(): //Point A ba = new byte[l]; // 'l' comes from wobj.getContentLength above bytesRead = is.read(ba); //'is' is our URLConnection original InputStream offset = bytesRead; while (bytesRead != -1) { bytesRead = is.read(ba, offset - 1, l - offset); offset += bytesRead; } //Point B InputStream is2 = new ByteArrayInputStream(ba); //Now just working with 'is2' - the "copied" stream The InputStream-byte[] conversion took again 1000ms - this is the way many ppl suggested to read an InputStream, and stil it is slow. And guess what - the 2 parser methods above (convertToXML() and convertBattlePagetoXMLWithoutDOM(), when passed 'is2' instead of 'is' took, in all 4 cases, under 50ms to complete. I read a suggestion that the stream waits for connection to close before unblocking, so i tried using HttpComponentsClient 4.0 (http://hc.apache.org/httpcomponents-client/index.html) instead, but the initial InputStream took just as long to parse. e.g. this code: public InputStream getWebPageAsStream2(int battle_id, int page) throws Exception { String url = "http://api.erepublik.com/v1/feeds/battle_logs/" + battle_id + "/" + page; HttpClient httpclient = new DefaultHttpClient(); HttpGet httpget = new HttpGet(url); HttpParams p = new BasicHttpParams(); HttpConnectionParams.setSocketBufferSize(p, 250000); HttpConnectionParams.setStaleCheckingEnabled(p, false); HttpConnectionParams.setConnectionTimeout(p, 5000); httpget.setParams(p); HttpResponse response = httpclient.execute(httpget); HttpEntity entity = response.getEntity(); l = (int) entity.getContentLength(); return entity.getContent(); } took even longer to process(50ms more for just the network) and the stream parsing times remained the same. Obviously it can be instantiated so as to not create HttpClient and properties every time(faster network time), but the stream issue wont be affected by that. So we come to the center problem - why does the initial URLConnection InputStream(or HttpClient InputStream) take so long to process, while any stream of same size and content created locally is orders of magnitude faster? I mean, the initial response is already somewhere in RAM, and I cant see any good reasong why it is processed so slowly compared to when a same stream is just created from a byte[]. Considering I have to parse million of entries and thousands of pages like that, a total processing time of almost 1.5s/page seems WAY WAY too long. Any ideas? P.S. Please ask in any more code is required - the only thing I do after parsing is make a PreparedStatement and put the entries into JavaDB in packs of 1000+, and the perfomance is ok ~ 200ms/1000entries, prb could be optimized with more cache but I didnt look into it much.

Read the article
problem parsing with XMLReader (using ReadSubTree)

- by no9

Hello. Im trying to build a simple XML to Controls parser in my CF application. In the code below the string im trying to parse looks like this: "<Panel><Label>Text1</Label><Label>Text2</Label></Panel>" The result i want with this code would be a Panel with two labels. But the problem is when the first Label is parsed the subreader.Read() returns false in the ParsePanelElementh method, and so it falls out of while statement. Since im new into XMLReader i must be missing something very simple. Any help would be apreciated ! peace. static class XMLParser { public static Control Parse(string aXmlString) { XmlReader reader = XmlReader.Create(new StringReader(aXmlString)); return ParseXML(reader); } public static Control ParseXML(XmlReader reader) { using (reader) { while (reader.Read()) { if (reader.NodeType == XmlNodeType.Element) { if (reader.LocalName == "Panel") { return ParsePanelElement(reader); } if (reader.LocalName == "Label") { return ParseLabelElement(reader); } } } } return null; } private static Control ParsePanelElement(XmlReader reader) { var myPanel = new Panel(); XmlReader subReader = reader.ReadSubtree(); while (subReader.Read()) { Control subControl = ParseXML(subReader); if (subControl != null) { myPanel.Controls.Add(subControl); }; } return myPanel; } private static Control ParseLabelElement(XmlReader reader) { reader.Read(); var myString = reader.Value as string; var myLabel = new Label(); myLabel.Text = myString; return myLabel; } }

Read the article
Parsing multibyte string in PHP

- by Petr Peller

I would like to write a (HTML) parser based on state machine but I have doubts how to acctually read/use an input. I decided to load the whole input into one string and then work with it as with an array and hold its index as current parsing position. There would be no problems with single-byte encoding, but in multi-byte encoding each value does not represent a character, but a byte of a character. Example: $mb_string = 'žšcr'; //4 multi-byte characters in UTF-8 for($i=0; $i < 4; $i++) { echo $mb_string[$i], PHP_EOL; } Outputs: L ž L A This means I cannot iterate through the string in a loop to check single characters, because I never know if I am in the middle of an character or not. So the questions are: How do I multi-byte safe read a single character from a string in a performance friendly way? Is it good idea to work with the string as it was an array in this case? How would you read the input?

Read the article
How do I get an overview and a methodology for programming in Python

- by Peter Nielsen

I've started to learn Python and programming from scratch. I have not programmed before so it's a new experience. I do seem to grasp most of the concepts, from variables to definitions and modules. I still need to learn a lot more about what the different libraries and modules do and also I lack knowledge on OOP and classes in Python. I see people who just program in Python like that's all they have ever done and I am still just coming to grips with it. Is there a way, some tools, a logical methodology that would give me an overview or a good hold of how to handle programming problems ? For instance, I'm trying to create a parser which we need at the office . I also need to create a spider that would collect links from various websites. Is there a formidable way of studying the various modules to see what is needed ? Or is it just nose to the grind stone and understand what the documentation says ? Sorry for the lengthy question..

Read the article
Java remove HTML from String without regular expressions

- by behrk2

Hello, I am trying to remove all HTML elements from a String. Unfortunately, I cannot use regular expressions because I am developing on the Blackberry platform and regular expressions are not yet supported. Is there any other way that I can remove HTML from a string? I read somewhere that you can use a DOM Parser, but I couldn't find much on it. Text with HTML: <![CDATA[As a massive asteroid hurtles toward Earth, NASA head honcho Dan Truman (<a href="http://www.netflix.com/RoleDisplay/Billy_Bob_Thornton/20000303">Billy Bob Thornton</a>) hatches a plan to split the deadly rock in two before it annihilates the entire planet, calling on Harry Stamper (<a href="http://www.netflix.com/RoleDisplay/Bruce_Willis/99786">Bruce Willis</a>) -- the world's finest oil driller -- to head up the mission. With time rapidly running out, Stamper assembles a crack team and blasts off into space to attempt the treacherous task. <a href="http://www.netflix.com/RoleDisplay/Ben_Affleck/20000016">Ben Affleck</a> and <a href="http://www.netflix.com/RoleDisplay/Liv_Tyler/162745">Liv Tyler</a> co-star.]]> Text without HTML: As a massive asteroid hurtles toward Earth, NASA head honcho Dan Truman (Billy Bob Thornton) hatches a plan to split the deadly rock in two before it annihilates the entire planet, calling on Harry Stamper (Bruce Willis) -- the world's finest oil driller -- to head up the mission. With time rapidly running out, Stamper assembles a crack team and blasts off into space to attempt the treacherous task.Ben Affleck and Liv Tyler co-star. Thanks!

Read the article
QueryReadStore loads JSON into DataGrid, but JsonRestStore does not (from the same source)

- by labratmatt

I'm building a Dojo DataGrid from JSON data provided by my REST interface. The DataGrid loads the data fine using a QueryReadStore, but doesn't seem to work with the same same data piped into a JsonRestStore. I'm using the following Dojo libs with Dojo 1.4.1: dojo.require("dojox.data.JsonRestStore"); dojo.require("dojox.grid.DataGrid"); dojo.require("dojox.data.QueryReadStore"); dojo.require("dojo.parser"); I declare my stores in the following manner: var storeJRS = new dojox.data.JsonRestStore({target:"api/collaborations.php/1", idAttribute: 'items[].id'}); var storeQRS = new dojox.data.QueryReadStore({url:"api/collaborations.php/1", requestMethod:"get"}); I create my grid layout like this: var gridLayout = [ new dojox.grid.cells.RowIndex({ name: "Row #", width: 5, styles: "text-align: left;" }), { name: "Name", field: "name", styles: "text-align:right;", width:20 }, { name: "Description", field: "description", width:30 } ]; I create my DataGrid as follows: The above works, but if I use QueryReadStore as my store, the grid is created with the headers (Name, Description), but it isn't populated with any rows: <div dojoType="dojox.grid.DataGrid" jsid="grid3" store="storeQRS" structure="gridLayout" style="height:500px; width:1000px;"></div> Using FireBug, I can see that QueryReadStore is getting my JSON data from my REST interface. It looks like the following: {"numRows":6,"items":[{"name":"My Super Cool Collab","description":"This is for all the super cool people in the super cool group","id":1},{"name":"My Other Super Cool","description":"This is for all the other super cool people","id":3},{"name":"This is another coll","description":"This is just some other collab","id":4},{"name":"some new collab","description":"this is a new collab","id":5},{"name":"yet another new coll","description":"uh huh","id":6},{"name":"asdf","description":"asdf","id":7}]} Any ideas? Thanks.

Read the article
Theory of formal languages - Automaton

- by dader51

Hi everybody ! I'm wondering about formal languages. I have a kind of parser : It reads à xml-like serialized tree structure and turn it into a multidimmensionnal array. I figured out that i need at least three variables to achieve the job : $tree = array(); // a new array $pTree = array(&$tree); // a new array which the first element points to $tree; $deep = 0; plus the one containing the sentence splitted into words. My point is on the similarities between the algorithm deing used and the differents kinds of automatons ( state machines turing machines stack ... ). The $words variable is the "tape" of the automaton, the test/conditions of the algorithm are transitions, $deep is the state and $tree is the output. I cannont figure what is $pTree. So the question is : which is the automaton I implictly use here, and to which formal languages family does it fit ? And what's about recursion ?

Read the article
Extracting email addresses in an html block in ruby/rails

- by corroded

I am creating a parser that wards off against spamming and harvesting of emails from a block of text that comes from tinyMCE (so it may or may not have html tags in it) I've tried regexes and so far this has been successful: /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i problem is, i need to ignore all email addresses with mailto hrefs. for example: <a href="mailto:[email protected]">[email protected]</a> should only return the second email add. To get a background of what im doing, im reversing the email addresses in a block so the above example would look like this: <a href="mailto:[email protected]">moc.liam@tset</a> problem with my current regex is that it also replaces the one in href. Is there a way for me to do this with a single regex? Or do i have to check for one then the other? Is there a way for me to do this just by using gsub or do I have to use some nokogiri/hpricot magicks and whatnot to parse the mailtos? Thanks in advance! Here were my references btw: so.com/questions/504860/extract-email-addresses-from-a-block-of-text so.com/questions/1376149/regexp-for-extracting-a-mailto-address im also testing using this: http://rubular.com/ edit here's my current helper code: def email_obfuscator(text) text.gsub(/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i) { |m| m = "<span class='anti-spam'>#{m.reverse}</span>" } end which results in this: <a target="_self" href="mailto:<span class='anti-spam'>moc.liamg@tset</span>"><span class="anti-spam">moc.liamg@tset</span></a>

Read the article
What libraries will parse a DTD using PHP

- by Chadwick

I need to parse DTDs using PHP and am hoping there's a simple library to help out. Each DTD has numerous <!ENTITY... and <!-- Comment... elements, which I need to act upon. Note that I do not need to validate anything against these DTDs, simply parse them as data files themselves. A few options I've looked at: James Clarke's SD, which is an option of last resort, but I'd like to avoid the complexity of building/installing/configuring code external to PHP. I'm not sure it's even possible in my situation. PEAR has an XML_DTD_Parser, which requires installing/configuring PEAR and a number of pear modules, which I'm also not sure is possible, and would rather avoid. Has anyone used it with success? PHP XML Classes has the class_path_parser, which another site suggested, but it fails to read ENTITY elements. It appears to be using PHP's built in XML parsing capabilities, which use EXPAT. PHP's DOMDocument will validate against a DTD, so must be able to read them, though I don't see how to get at the DTD parser directly at first glance.

Read the article

< Previous Page | 55 56 57 58 59 60 61 62 63 64 65 66 | Next Page >