Search Results

Search found 7251 results on 291 pages for 'pdf parsing'.

Page 104/291 | < Previous Page | 100 101 102 103 104 105 106 107 108 109 110 111  | Next Page >

  • Trying to parse xml, but xmldocument.loadxml() is trying to download?

    - by maxp
    I have a string input that i do not know whether or not is valid xml. I think the simplest aprroach is to wrap new XmlDocument().LoadXml(strINPUT); In a try/catch. The problem im facing is, sometimes strINPUT is an html file, if the header of this file contains <!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Transitional//EN"" ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd""> <html xml:lang=""en-GB"" xmlns=""http://www.w3.org/1999/xhtml"" lang=""en-GB""> ...like many do, it actually tries to make a connection to the w3.org url, which i really dont want it doing. Anyone know if its possible to just parse the string without trying to be clever and checking external urls? Failing that is there an alternative to xmldocument?

    Read the article

  • How Do I Pull Info from String

    - by Russ Bradberry
    I am trying to pull dynamics from a load that I run using bash. I have gotten to a point where I get the string I want, now from this I want to pull certain information that can vary. The string that gets returned is as follows: Records: 2910 Deleted: 0 Skipped: 0 Warnings: 0 Each of the number can and will vary in length, but the overall structure will remain the same. What I want to do is be able to get these numbers and load them into some bash variables ie: RECORDS=?? DELETED=?? SKIPPED=?? WARNING=?? In regex I would do it like this: Records: (\d*?) Deleted: (\d*?) Skipped (\d*?) Warnings (\d*?) and use the 4 groups in my variables.

    Read the article

  • Parsec Haskell Lists

    - by Martin
    I'm using Text.ParserCombinators.Parsec and Text.XHtml to parse an input and get a HTML output. If my input is: * First item, First level ** First item, Second level ** Second item, Second level * Second item, First level My output should be: <ul><li>First item, First level <ul><li>First item, Second level </li><li>Second item, Second level </li></ul></li><li>Second item, First level</li></ul> I wrote this, but obviously does not work recursively list= do{ s <- many1 item;return (olist << s) } item= do{ (count 1 (char '*')) ;s <- manyTill anyChar newline ;return ( li << s) } Any ideas? the recursion can be more than two levels Thanks!

    Read the article

  • Pros and Cons of Java HTML to XML cleaners

    - by cjavapro
    I am looking to allow HTML emails (and other HTML uploads) without letting in scripts and stuff. I plan to have a white list of safe tags and attributes as well as a whitelist of CSS tags and value regexes (to prevent automatic return receipt). I asked a question: Parse a badly formatted XML document (like an HTML file) I found there are many many ways to do this. Some systems have built in sanitizers (which I don't care so much about). I will post some answers and say Community Wiki. Please post any other options you like and say Community Wiki so they can be voted on. Also any comments or wiki edits on what part of a certain product is better and what is not would be greatly appreciated. This page is a very nice listing page but I get kinda lost http://java-source.net/open-source/html-parsers

    Read the article

  • What's the best way to retrieve two pieces of data from an XML file?

    - by Morinar
    I've got an XML document that is in either a pre or post FO transformed state that I need to extract some information from. In the pre-case, I need to pull out two tags that represent the pageWidth and pageHeight and in the post case I need to extract the page-height and page-width parameters from a specific tag (I forget which one it is off the top of my head). What I'm looking for is an efficient/easily maintainable way to grab these two elements. I'd like to only read the document a single time fetching the two things I need. I initially started writing something that would use BufferedReader + FileReader, but then I'm doing string searching and it gets messy when the tags span multiple lines. I then looked at the DOMParser, which seems like it would be ideal, but I don't want to have to read the entire file into memory if I could help it as the files could potentially be large and the tags I'm looking for will nearly always be close to the top of the file. I then looked into SAXParser, but that seems like a big pile of complicated overkill for what I'm trying to accomplish. Anybody have any advice? Or simple implementations that would accomplish my goal? Thanks.

    Read the article

  • Why does Joda time change the PM in my input string to AM?

    - by Tree
    My input string is a PM time: log(start); // Sunday, January 09, 2011 6:30:00 PM I'm using Joda Time's pattern syntax as follows to parse the DateTime: DateTimeFormatter parser1 = DateTimeFormat.forPattern("EEEE, MMMM dd, yyyy H:mm:ss aa"); DateTime startTime = parser1.parseDateTime(start); So, why is my output string AM? log(parser1.print(startTime)); // Sunday, January 09, 2011 6:30:00 AM

    Read the article

  • parser 2.1 and 2.2

    - by yaniv
    hi i using the follwing Code to retrive XML element text using getElementsByTagName this code success in 2.2 and Failed in 2.1 any idea ? URL metafeedUrl = new URL("http://x..../Y.xml") URLConnection connection ; connection= metafeedUrl.openConnection(); HttpURLConnection httpConnection = (HttpURLConnection)connection ; int resposnseCode= httpConnection.getResponseCode() ; if (resposnseCode == HttpURLConnection.HTTP_OK) { InputStream in = httpConnection.getInputStream(); DocumentBuilderFactory dbf ; dbf = DocumentBuilderFactory.newInstance(); DocumentBuilder db = dbf.newDocumentBuilder(); // Parse the Earthquakes entry Document dom = db.parse(in); Element docEle = dom.getDocumentElement(); //ArrayList<Album> Albums = new ArrayList<Album>(); /* Returns a NodeList of all descendant Elements with a given tag name, in document order.*/ NodeList nl = docEle.getElementsByTagName("entry"); if (nl!=null && nl.getLength() > 0) { for (int i = 0; i < nl.getLength(); i++) { Element entry = (Element)nl.item(i); /* Now on every property in Entry **/ Element title =(Element)entry.getElementsByTagName("title").item(0); *Here i Get an Error* String album_Title = title.getTextContent(); Element id =(Element)entry.getElementsByTagName("id").item(0); String album_id = id.getTextContent(); //

    Read the article

  • How can I parse a namespace using the SAX parser?

    - by Silvestri
    Hello, Using a twitter search URL ie. http://search.twitter.com/search.rss?q=android returns CSS that has an item that looks like: <item> <title>@UberTwiter still waiting for @ubertwitter android app!!!</title> <link>http://twitter.com/meals69/statuses/21158076391</link> <description>still waiting for an app!!!</description> <pubDate>Sat, 14 Aug 2010 15:33:44 +0000</pubDate> <guid>http://twitter.com/meals69/statuses/21158076391</guid> <author>Some Twitter User</author> <media:content type="image/jpg" height="48" width="48" url="http://a1.twimg.com/profile_images/756343289/me2_normal.jpg"/> <google:image_link>http://a1.twimg.com/profile_images/756343289/me2_normal.jpg</google:image_link> <twitter:metadata> <twitter:result_type>recent</twitter:result_type> </twitter:metadata> </item> Pretty simple. My code parses out everything (title, link, description, pubDate, etc.) without any problems. However, I'm getting null on: <google:image_link> I'm using Java to parse the RSS feed. Do I have to handle compound localnames differently than I would a more simple localname? This is the bit of code that parses out Link, Description, pubDate, etc: @Override public void endElement(String uri, String localName, String name) throws SAXException { super.endElement(uri, localName, name); if (this.currentMessage != null){ if (localName.equalsIgnoreCase(TITLE)){ currentMessage.setTitle(builder.toString()); } else if (localName.equalsIgnoreCase(LINK)){ currentMessage.setLink(builder.toString()); } else if (localName.equalsIgnoreCase(DESCRIPTION)){ currentMessage.setDescription(builder.toString()); } else if (localName.equalsIgnoreCase(PUB_DATE)){ currentMessage.setDate(builder.toString()); } else if (localName.equalsIgnoreCase(GUID)){ currentMessage.setGuid(builder.toString()); } else if (uri.equalsIgnoreCase(AVATAR)){ currentMessage.setAvatar(builder.toString()); } else if (localName.equalsIgnoreCase(ITEM)){ messages.add(currentMessage); } builder.setLength(0); } } startDocument looks like: @Override public void startDocument() throws SAXException { super.startDocument(); messages = new ArrayList<Message>(); builder = new StringBuilder(); } startElement looks like: @Override public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException { super.startElement(uri, localName, name, attributes); if (localName.equalsIgnoreCase(ITEM)){ this.currentMessage = new Message(); } } Tony

    Read the article

  • Removing HTML entities while preserving line breaks with JSoup

    - by shrodes
    I have been using JSoup to parse lyrics and it has been great until now, but have run into a problem. I can use Node.html() to return the full HTML of the desired node, which retains line breaks as such: Gl&oacute;andi augu, silfurn&aacute;tt <br />Bl&oacute;&eth; alv&ouml;ru, starir &aacute; <br />&Oacute;&eth;ur hundur er &iacute; v&iacute;gam&oacute;&eth;, &iacute; maga... m&eacute;r <br /> <br />Kolni&eth;ur gref, kvik sem dreg h&eacute;r <br />Kolni&eth;ur svart, hvergi bjart n&eacute; But has the unfortunate side-effect, as you can see, of retaining HTML entities and tags. However, if I use Node.text(), I can get a better looking result, free of tags and entities: Glóandi augu, silfurnátt Blóð alvöru, starir á Óður hundur er í vígamóð, í maga... mér Kolniður gref, kvik sem dreg hér Kolniður svart, Which has another unfortunate side-effect of removing the line breaks and compressing into a single line. Simply replacing <br /> from the node before calling Node.text() yields the same result, and it seems that that method is compressing the text onto a single line in the method itself, ignoring newlines. Is it possible to have the best of both worlds, and have tags and entities replaced correctly which preserving the line breaks, or is there another method or way of decoding entities and removing tags without having to replace them manually?

    Read the article

  • How to retrieve a numbered sequence range from a List of filenames?

    - by glenneroo
    I would like to automatically parse the entire numbered sequence range of a List<FileData> of filenames (sans extensions) by checking which part of the filename changes. Here is an example (file extension already removed): First filename: IMG_0000 Last filename: IMG_1000 Numbered Range I need: 0000 to 1000 Except I need to deal with every possible type of file naming convention such as: 0000 ... 9999 20080312_0000 ... 20080312_9999 IMG_0000 - Copy ... IMG_9999 - Copy 8er_green3_00001 .. 8er_green3_09999 etc. I need the entire 0-padded range e.g. 0001 not just 1 The sequence number is 0-padded e.g. 0001 The sequence number can be located anywhere e.g. IMG_0000 - Copy The range can start and end with anything i.e. doesn't have to start with 1 and end with 9999 Whenever I get something working for 8 random test cases, the 9th test breaks everything and I end up re-starting from scratch. I've currently been comparing only the first and last filenames (as opposed to iterating through all filenames): void FindRange(List<FileData> files, out string startRange, out string endRange) { string firstFile = files.First().ShortName; string lastFile = files.Last().ShortName; ... } Does anyone have any clever ideas?

    Read the article

  • FileHelpers cannot map converted field into destination array

    - by jaffa
    I have the following record (reduced for brevity): [DelimitedRecord(",")] [IgnoreFirst] [IgnoreEmptyLines()] public class ImportRecord { [FieldQuoted] [FieldTrim(TrimMode.Both)] public string FirstName; [FieldQuoted] [FieldTrim(TrimMode.Both)] public string LastName; [FieldQuoted] [FieldTrim(TrimMode.Both)] [FieldOptional] [FieldConverter(typeof(TestPropertyConverter))] public int[] TestProperty; } Converter code: public class TestPropertyConverter : ConverterBase { public override object StringToField(string from) { var ret = from.Split('|').Select(x => Convert.ToInt32(x)).ToArray(); return ret; } } So an example record could be: John, Smith, 1|2|3|4 It would expect the values 1,2,3,4 to expand and fill the TestProperty array. However, I'm getting the following exception: At least one element in the source array could not be cast down to the destination array type. I've tried to debug into the code and it seems to blow-up in the ExtractFieldValue() function inside FieldBase.cs where it tries to return out of the function. The following line seems to be the culprit: res.ToArray(ArrayType); It seems to expect the 'res' variable to be the destination type array, but it contains 1 element of the array itself. Can anyone suggest if I'm doing this wrong or a possible fix?

    Read the article

  • Get HTML element informations in .NET

    - by martin.malek
    Hi, I'm just thinking if there is any way how to get information about element in HTML in my .NET application. The input is HTML page and path to CSS files etc. I want to take e.g. H1 tag and found what will be the CSS for it. Is there any code or can I use IE and try to take this information from it automatically inside of my application?

    Read the article

  • Android Alert Dialog Extract User Choice Radio Button?

    - by kel196
    Apologies for any coding ignorance, I am a beginner and am still learning! I am creating a quiz app which reads questions off a SQL database and plots them as points on a google map. As the user clicks each point, an alert dialog appears with radio buttons to answer some question. My radiobuttons are in CustomItemizedOverlay file and when I click submit, I want to be able to send the user choice back to the database, save it and return to the user if their answer is correct. My question is this, how do I pass the user choice out once the submit button has been clicked? I tried making a global variable to read what was selected to no avail. Any suggestions would be appreciated! If I need to post any other code, please ask and I will do so ASAP! package uk.ac.ucl.cege.cegeg077.uceskkw; import java.util.ArrayList; import android.app.AlertDialog; import android.content.Context; import android.content.DialogInterface; import android.graphics.drawable.Drawable; import android.widget.Toast; import com.google.android.maps.ItemizedOverlay; import com.google.android.maps.OverlayItem; public class CustomItemizedOverlay2 extends ItemizedOverlay<OverlayItem> { private ArrayList<OverlayItem> mapOverlays = new ArrayList<OverlayItem>(); private Context context; public CustomItemizedOverlay2(Drawable defaultMarker) { super(boundCenterBottom(defaultMarker)); } public CustomItemizedOverlay2(Drawable defaultMarker, Context context) { this(defaultMarker); this.context = context; } @Override protected OverlayItem createItem(int i) { return mapOverlays.get(i); } @Override public int size() { return mapOverlays.size(); } @Override protected boolean onTap(int index) { OverlayItem item = mapOverlays.get(index); // gets the snippet from ParseKMLAndDisplayOnMap and splits it back into // a string. final CharSequence allanswers[] = item.getSnippet().split("::"); AlertDialog.Builder dialog = new AlertDialog.Builder(context); dialog.setIcon(R.drawable.easterbunnyegg); dialog.setTitle(item.getTitle()); dialog.setSingleChoiceItems(allanswers, -1, new DialogInterface.OnClickListener() { public void onClick(DialogInterface dialog, int whichButton) { Toast.makeText(context, "You have selected " + allanswers[whichButton], Toast.LENGTH_SHORT).show(); } }); dialog.setPositiveButton(R.string.button_submit, new DialogInterface.OnClickListener() { public void onClick(DialogInterface dialog, int whichButton) { dialog.dismiss(); // int selectedPosition = ((AlertDialog) // dialog).getListView().getCheckedItemPosition(); Toast.makeText(context, "hi!", Toast.LENGTH_SHORT) .show(); } }); dialog.setNegativeButton(R.string.button_close, new DialogInterface.OnClickListener() { public void onClick(DialogInterface dialog, int whichButton) { dialog.dismiss(); // on cancel button action } }); AlertDialog question = dialog.create(); question.show(); return true; } public void addOverlay(OverlayItem overlay) { mapOverlays.add(overlay); this.populate(); } }

    Read the article

  • Parse lines of integers in C

    - by Jérôme
    This is a classical problem, but I can not find a simple solution. I have an input file like: 1 3 9 13 23 25 34 36 38 40 52 54 59 2 3 9 14 23 26 34 36 39 40 52 55 59 63 67 76 85 86 90 93 99 108 114 2 4 9 15 23 27 34 36 63 67 76 85 86 90 93 99 108 115 1 25 34 36 38 41 52 54 59 63 67 76 85 86 90 93 98 107 113 2 3 9 16 24 28 2 3 10 14 23 26 34 36 39 41 52 55 59 63 67 76 Lines of different number of integers separated by a space. I would like to parse them in an array, and separate each line with a marker, let say -1. The difficulty is that I must handle integers and line returns. Here my existing code, it loops upon the scanf loop (because scanf can not begin at a given position). #include <stdio.h> #include <stdlib.h> int main(int argc, char **argv) { if (argc != 4) { fprintf(stderr, "Usage: %s <data file> <nb transactions> <nb items>\n", argv[0]); return 1; } FILE * file; file = fopen (argv[1],"r"); if (file==NULL) { fprintf(stderr, "Error: can not open %s\n", argv[1]); fclose(file); return 1; } int nb_trans = atoi(argv[2]); int nb_items = atoi(argv[3]); int *bdd = malloc(sizeof(int) * (nb_trans + nb_items)); char line[1024]; int i = 0; while ( fgets(line, 1024, file) ) { int item; while ( sscanf (line, "%d ", &item )){ printf("%s %d %d\n", line, i, item); bdd[i++] = item; } bdd[i++] = -1; } for ( i = 0; i < nb_trans + nb_items; i++ ) { printf("%d ", bdd[i]); } printf("\n"); }

    Read the article

  • PHP: What is an efficient way to parse a text file containing very long lines?

    - by Shaun
    I'm working on a parser in php which is designed to extract MySQL records out of a text file. A particular line might begin with a string corresponding to which table the records (rows) need to be inserted into, followed by the records themselves. The records are delimited by a backslash and the fields (columns) are separated by commas. For the sake of simplicity, let's assume that we have a table representing people in our database, with fields being First Name, Last Name, and Occupation. Thus, one line of the file might be as follows [People] = "\Han,Solo,Smuggler\Luke,Skywalker,Jedi..." Where the ellipses (...) could be additional people. One straightforward approach might be to use fgets() to extract a line from the file, and use preg_match() to extract the table name, records, and fields from that line. However, let's suppose that we have an awful lot of Star Wars characters to track. So many, in fact, that this line ends up being 200,000+ characters/bytes long. In such a case, taking the above approach to extract the database information seems a bit inefficient. You have to first read hundreds of thousands of characters into memory, then read back over those same characters to find regex matches. Is there a way, similar to the Java String next(String pattern) method of the Scanner class constructed using a file, that allows you to match patterns in-line while scanning through the file? The idea is that you don't have to scan through the same text twice (to read it from the file into a string, and then to match patterns) or store the text redundantly in memory (in both the file line string and the matched patterns). Would this even yield a significant increase in performance? It's hard to tell exactly what PHP or Java are doing behind the scenes.

    Read the article

  • Parse string to create a list of element

    - by Nick
    I have a string like this: "\r color=\"red\" name=\"Jon\" \t\n depth=\"8.26\" " And I want to parse this string and create a std::list of this object: class data { std::string name; std::string value; }; Where for example: name = color value = red What is the fastest way? I can use boost. EDIT: This is what i've tried: vector<string> tokens; split(tokens, str, is_any_of(" \t\f\v\n\r")); if(tokens.size() > 1) { list<data> attr; for_each(tokens.begin(), tokens.end(), [&attr](const string& token) { if(token.empty() || !contains(token, "=")) return; vector<string> tokens; split(tokens, token, is_any_of("=")); erase_all(tokens[1], "\""); attr.push_back(data(tokens[0], tokens[1])); } ); } But it does not work if there are spaces inside " ": like color="red 1".

    Read the article

  • Is XMLReader a SAX parser, a DOM parser, or neither?

    - by Renesis
    I am testing various methods to read (possibly large, and very often) XML configuration files in PHP. No writing is ever needed. I have two successful implementations, one using SimpleXML (which I know is a DOM parser) and one using XMLReader. I know that a DOM reader must read the whole tree and therefore uses more memory. My tests reflect that. I also know that A SAX parser is an "event-based" parser that uses less memory because it reads each node from the stream without checking what is next. XMLReader also reads from a stream with the cursor providing data about the node it is currently at. So, it definitely sounds like XMLReader (http://us2.php.net/xmlreader) is not a DOM parser, but my question is, is it a SAX parser, or something else? It seems like XMLReader behaves the way a SAX parser does but does not throw the events themselves (in other words, can you construct a SAX parser with XMLReader?) If it is something else, does the classification it's in have a name?

    Read the article

  • Split with token info

    - by boomhauer
    I would like to split a string using multiple chars to split upon. For example, consider spin text format: This is a {long|ugly|example} string I would want to parse this string and split it on the "{", "|", and "}" chars myString.Split('|','{','}') Now I have tokens to play with, but what I would like is to retain the info about which char was used to split each piece of the array that is returned. Any existing code that can do something like this?

    Read the article

  • C# - Parse HTML source as XML

    - by fonix232
    I would like to read in a dynamic URL what contains a HTML file, and read it like an XML file, based on nodes (HTML tags). Is this somehow possible? I mean, there is this HTML code: <table class="bidders" cellpadding="0" cellspacing="0"> <tr class="bidRow4"> <td>kucik (automata)</td> <td class="right">9 374 Ft</td> <td class="bidders_date">2010-06-10 18:19:52</td> </tr> <tr class="bidRow4"> <td>macszaf (automata)</td> <td class="right">9 373 Ft</td> <td class="bidders_date">2010-06-10 18:19:52</td> </tr> <tr class="bidRow2"> <td>kucik (automata)</td> <td class="right">9 372 Ft</td> <td class="bidders_date">2010-06-10 18:19:42</td> </tr> <tr class="bidRow2"> <td>macszaf (automata)</td> <td class="right">9 371 Ft</td> <td class="bidders_date">2010-06-10 18:19:42</td> </tr> <tr class="bidRow0"> <td>kucik (automata)</td> <td class="right">9 370 Ft</td> <td class="bidders_date">2010-06-10 18:19:32</td> </tr> <tr class="bidRow0"> <td>macszaf (automata)</td> <td class="right">9 369 Ft</td> <td class="bidders_date">2010-06-10 18:19:32</td> </tr> <tr class="bidRow8"> <td>kucik (automata)</td> <td class="right">9 368 Ft</td> <td class="bidders_date">2010-06-10 18:19:22</td> </tr> <tr class="bidRow8"> <td>macszaf (automata)</td> <td class="right">9 367 Ft</td> <td class="bidders_date">2010-06-10 18:19:22</td> </tr> <tr class="bidRow6"> <td>kucik (automata)</td> <td class="right">9 366 Ft</td> <td class="bidders_date">2010-06-10 18:19:12</td> </tr> <tr class="bidRow6"> <td>macszaf (automata)</td> <td class="right">9 365 Ft</td> <td class="bidders_date">2010-06-10 18:19:12</td> </tr> </table> I want to parse this into a ListView (or a Grid) to create rows with the data contained. All tr are different row, and all td in a given td is a column in the given row. And also I want it to be as fast as possible, as it would update itself in 5 seconds. Is there any library for this?

    Read the article

  • PHP or C# script to parse CSV table values to fill in one-to-many table

    - by Yaaqov
    I'm looking for an example of how to split-out comma-delimited data in a field of one table, and fill in a second table with those individual elements, in order to make a one-to-many relational database schema. This is probably really simple, but let me give an example: I'll start with everything in one table, Widgets, which has a "state" field to contain states that have that widget: Table: WIDGET =============================== | id | unit | states | =============================== |1 | abc | AL,AK,CA | ------------------------------- |2 | lmn | VA,NC,SC,GA,FL | ------------------------------- |3 | xyz | KY | =============================== Now, what I'd like to create via code is a second table to be joined to WIDGET called *Widget_ST* that has widget id, widget state id, and widget state name fields, for example Table: WIDGET_ST ============================== | w_id | w_st_id | w_st_name | ------------------------------ |1 | 1 | AL | |1 | 2 | AK | |1 | 3 | CA | |2 | 1 | VA | |2 | 2 | NC | |2 | 1 | SC | |2 | 2 | GA | |2 | 1 | FL | |3 | 1 | KY | ============================== I am learning C# and PHP, so responses in either language would be great. Thanks.

    Read the article

  • Extract information from javascript counter via PHP

    - by Jennifer Weinberg
    Hi, I'm looking for a way to extract some information from this site via PHP: http://www.mycitydeal.co.uk/deals/london There ist a counter where the time left is displayed, but the information is within the JavaScript. Since I'm really a JavaScript rookie, I didn't really know how to get the information. Normally I would extract the information with "preg_match" and some regular expressions. Can someone help me to extract the information (Hrs., Min., Sec.) ? Jennifer

    Read the article

< Previous Page | 100 101 102 103 104 105 106 107 108 109 110 111  | Next Page >