Search Results

Search found 4222 results on 169 pages for 'dtd parsing'.

Page 2/169 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • Prevent DTD download when parsing XML, redux

    - by harpo
    I have a folder full of XHTML files and a console program to process them. The problem is that it doesn't work when I'm offline, because

        The remote name could not be resolved: 'www.w3.org'

    A number of sites, including this one, say to set the XmlResolver to null. That is not working. I have tried setting the XmlResolver to null every way that I can think of: in the XmlDocument, in the XmlTextReader, and in the XmlReaderSettings. I believe that it's just creating a default instance. I have also tried implementing my own XmlResolver, but I don't know what to return for external items.

        public class NonXmlResolver : XmlUrlResolver
        {
            public override object GetEntity( Uri absoluteUri, string role, Type ofObjectToReturn )
            {
                if ( absoluteUri.Scheme == "http" )
                    throw new Exception("What you talking 'bout, Willis?");
                else
                    return base.GetEntity( absoluteUri, role, ofObjectToReturn );
            }
        }

    The documents do not use any special entities, only pure XML. This is possible, right?
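
    For comparison, the same goal (parsing XHTML whose DOCTYPE points at www.w3.org without touching the network) can be sketched in Python with the standard-library xml.sax module. This is an analog in another language, not the .NET fix the question asks for; parse_offline, ElementCounter and SAMPLE are illustrative names.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class ElementCounter(ContentHandler):
    """Collects element names so we can see that parsing completed."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_offline(xml_bytes):
    """Parse a document without fetching the DTD named in its DOCTYPE."""
    parser = xml.sax.make_parser()
    # Tell the reader not to load external general entities. Recent Python
    # versions already default to this; setting it documents the intent.
    parser.setFeature(feature_external_ges, False)
    handler = ElementCounter()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.names


SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head><body/></html>
"""

if __name__ == "__main__":
    print(parse_offline(SAMPLE))
```

    The sketch only works for documents that, like the ones described above, do not rely on entities declared in the external DTD.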


  • basic question with a dtd

    - by voodoomsr
    I need an element with the content model

        <!ELEMENT section ((comment*)|definition|(comment*))>

    but that is ambiguous. I get the following message in Visual Studio:

        Multiple definition of element 'comment' causes the content model to become ambiguous. A content model must be formed such that during validation of an element information item sequence, the particle contained directly, indirectly or implicitly therein with which to attempt to validate each item in the sequence in turn can be uniquely determined without examining the content or attributes of that item, and without any information about the items in the remainder of the sequence.

    So how can I write this correctly? The correct structure is one definition surrounded by optional comment elements.


  • What is the use of DTD in HTML

    - by shoaibmohammed
    Hello, could anyone explain the need for and use of the Document Type Definition in HTML pages? What are its advantages? I searched the Net and found the results a little confusing. Please someone highlight


  • DTD is prohibited in this XML document -- how to change permissions?

    - by frankadelic
    I am using a 3rd-party .NET component which requires an XML configuration file. I am using this in an ASP.NET application. I get an error when I configure the XML with the following DTD:

        <!DOCTYPE prod-config SYSTEM "prod-config.dtd">

    The error is as follows:

        For security reasons DTD is prohibited in this XML document. To enable DTD processing set the ProhibitDtd property on XmlReaderSettings to false and pass the settings into XmlReader.Create method.

    prod-config.dtd is sitting in the same directory as the XML config file. I don't have access to the component code to modify XmlReaderSettings, ProhibitDtd, etc. Is there another way I can modify or tag the XML file to permit the DTD to be accessed? (FYI, the component is the Oracle Coherence .NET client.)


  • Parsing HTML with Python 2.7 - HTMLParser, SGMLParser, or Beautiful Soup?

    - by Eric Wilson
    I want to do some screen-scraping with Python 2.7, and I have no context for the differences between HTMLParser, SGMLParser, or Beautiful Soup. Are these all trying to solve the same problem, or do they exist for different reasons? Which is simplest, which is most robust, and which (if any) is the default choice? Also, please let me know if I have overlooked a significant option. Edit: I should mention that I'm not particularly experienced in HTML parsing, and I'm particularly interested in which will get me moving the quickest, with the goal of parsing HTML on one particular site.
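
    As a point of reference for the comparison being asked about, here is a minimal sketch of the event-driven style that the standard-library parser uses. It is written for Python 3, where the module is html.parser (in Python 2.7 the same class lives in the HTMLParser module); LinkCollector and extract_links are illustrative names.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, tolerating unclosed tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


if __name__ == "__main__":
    print(extract_links('<p><a href="/one">One</a> <A HREF=\'/two\'>Two<br></p>'))
```

    With this style you write one callback per kind of event, which is the main practical difference from a tree-building library such as Beautiful Soup, where you query a finished document instead.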


  • How to fix Nautilus desktop on Linux Mint

    - by user59530
    I'm using Linux Mint 13 with Cinnamon, and suddenly there are no icons on the desktop and the right click doesn't work. It's as if the desktop doesn't start up at all, but the Cinnamon interface and everything else are working just fine. This happens only when I open the session with Cinnamon; if I start the session with classic Gnome or MATE, the desktop works. I tried to re-install Cinnamon but nothing changed. Then I noticed that there are some small problems in Nautilus (sometimes menus aren't the color they're supposed to be), so I'm convinced that Nautilus might be the problem, but I don't know how to fix this. I've tried a few things, but I'm starting to fear that I'm only making it worse. Also, when I open the terminal and type in nautilus, here's what shows up. Any help?

        (nautilus:2906): Gtk-WARNING **: Theme parsing error: gtk-widgets.css:85:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: gtk-widgets.css:192:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: gtk-widgets.css:228:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: gtk-widgets.css:275:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: gtk-widgets.css:310:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: gtk-widgets.css:389:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: gtk-widgets.css:737:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: gtk-widgets.css:1095:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: gtk-widgets.css:1137:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: gtk-widgets.css:1755:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: gtk-widgets.css:1856:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: gtk-widgets.css:1873:18: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: gtk-widgets.css:1889:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: gtk-widgets.css:1947:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: gtk-widgets.css:1954:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: gtk-widgets.css:1967:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: gtk-widgets.css:2025:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: gtk-widgets.css:2075:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: gtk-widgets.css:2090:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: gtk-widgets.css:2195:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: gnome-panel.css:92:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: nautilus.css:15:15: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: nautilus.css:15:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: nautilus.css:79:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: nautilus.css:84:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: nautilus.css:113:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: nautilus.css:118:18: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: nemo.css:15:15: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: nemo.css:15:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: nemo.css:79:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: nemo.css:84:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: nemo.css:113:17: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: nemo.css:118:18: Not using units is deprecated. Assuming 'px'.
        (nautilus:2906): Gtk-WARNING **: Theme parsing error: unity.css:21:18: Not using units is deprecated. Assuming 'px'.
        Initializing nautilus-dropbox 1.4.0
        Initializing nautilus-open-terminal extension
        ** Message: Initializing gksu extension...


  • XML parsing with SAX | how to handle special characters?

    - by cedar715
    We have a Java application that pulls data from SAP, parses it and renders it to the users. The data is pulled using the JCo connector. Recently we got this exception:

        org.xml.sax.SAXParseException: Character reference "&#00" is an invalid XML character.

    So we are planning to write a new level of indirection where ALL special/illegal characters are replaced BEFORE parsing the XML. My questions are:

    1. Is there any existing (open source) utility that does this job of replacing illegal characters in XML?
    2. If I had to write such a utility, how should I handle them?
    3. Why is the above exception thrown?

    Thank you.
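
    The pre-parse cleanup step described above can be sketched as follows. The sketch is in Python rather than the Java of the question, and it assumes the character ranges permitted by the XML 1.0 specification; sanitize_xml and the helper names are illustrative, and dropping the offending characters (rather than replacing them) is one possible policy, not the only one.

```python
import re
import xml.etree.ElementTree as ET

# Anything outside the character ranges that XML 1.0 permits.
_ILLEGAL_CHARS = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)
# Numeric character references such as &#0; or &#x1F;
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def _is_legal(code_point):
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def _drop_bad_ref(match):
    hex_digits, dec_digits = match.groups()
    code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
    # Keep references to legal characters, drop the rest.
    return match.group(0) if _is_legal(code_point) else ""


def sanitize_xml(text):
    """Remove raw illegal characters and references to illegal characters."""
    text = _ILLEGAL_CHARS.sub("", text)
    return _CHAR_REF.sub(_drop_bad_ref, text)


if __name__ == "__main__":
    dirty = "<a>x&#0;y\x01z&#65;</a>"
    print(ET.fromstring(sanitize_xml(dirty)).text)
```

    Note that a character reference to code point 0 is rejected by XML parsers because no XML 1.0 document may contain that character, whether written literally or as a reference.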


  • Path to XML DTD for DBUnit in multi-module Java/Maven project?

    - by HDave
    I have a multi-module Maven project. Within the persist module I have a number of XML data files that reference a DTD:

        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE myapp-data SYSTEM "myapp-data.dtd" >
        <dataset>
          .....omitted for brevity....
        </dataset>

    The DTD is stored in the same directory as the XML files, and even Eclipse reports these XML files as valid. However, when I run the application, the DBUnit FlatXMLDataSet throws a FileNotFound exception because it cannot locate the DTD. It is apparently looking for the DTD in the root project directory (e.g. myproject/). I would have expected it to look for the DTD in the same directory as the XML file itself (e.g. myproject/persist/target/test-data). Looking at the DBUnit source code, it has this to say about it: "Relative DOCTYPE uri are resolved from the current working dicrectory." What's a good way to fix this?


  • MODX: Snippet strips and hangs string when parsing the vars.

    - by CuSS
    Hey all, I have a snippet call like this:

        [!mysnippet?&content=`[*content*]` !]

    What happens is that if I send some HTML like this:

        [!mysnippet?&content=`<p color='red'>Yeah</p>` !]

    it returns this:

        <p colo

    The [test only] snippet code (mysnippet) is:

        <?php return $content; ?>

    Why is this happening? My actual snippet converts HTML to PDF, so I really need this. Thank you all ;D

    EDIT: I'm using MODX Evo 1.0.2


  • Treetop: parsing single node returns nil

    - by Matchu
    I'm trying to get the basics of Treetop parsing. Here's a very simple bit of grammar so that I can say ArithmeticParser.parse('2+2').value == 4.

        grammar Arithmetic
          rule additive
            first:number '+' second:number {
              def value
                first.value + second.value
              end
            }
          end

          rule number
            [1-9] [0-9]* {
              def value
                text_value.to_i
              end
            }
          end
        end

    Parsing 2+2 works correctly. However, parsing 2 or 22 returns nil. What did I miss?


  • lr parsing table

    - by flufferok
    Could anyone explain how I can transform an LL(1) parsing table into an LR(1) parsing table? Or are there any existing LR(1) tables for parsing mathematical expressions (+, -, /, *, ^)?


  • Parsing every part of an HTTP header field-value

    - by brickner
    Hi all. I'm parsing HTTP data directly from packets (either TCP reconstructed or not, you can assume it is). I'm looking for the best way to parse HTTP as accurately as possible. The main issue here is the HTTP header. Looking at the basic RFC of HTTP/1.1, it seems that HTTP header parsing would be complex. The RFC describes very complex regular expressions for different parts of the header. Should I write these regular expressions to parse the different parts of the HTTP header? The basic parsing I've written so far for HTTP header is for the generic HTTP header: message-header = field-name ":" [ field-value ] And I've included replacing inner LWS with SP and repeating headers with the same field-name with comma separated values as described in section 4.2. However, looking at section 14.9 for example would show that in order to parse the different parts of the field-value I need a much more complex parsing scheme. How do you suggest I should handle the complex parts of HTTP parsing (specifically the field-value) assuming I want to give the parser users the full capabilities of HTTP and to parse every part of HTTP? Design suggestions for this would also be appreciated. Thanks.
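
    One way to approach the field-value problem without one giant regular expression is a small hand-written tokenizer per header family. The following Python sketch handles a Cache-Control style value (a comma-separated list of directives, each optionally followed by "=" and a token or quoted-string, as in section 14.9). The function names are illustrative, and the sketch deliberately ignores comments and other constructs that some headers allow.

```python
import re


def split_outside_quotes(value, separator=","):
    """Split on separator, ignoring separators inside quoted-strings."""
    parts, current, in_quotes, escaped = [], [], False, False
    for ch in value:
        if escaped:
            current.append(ch)
            escaped = False
        elif ch == "\\" and in_quotes:
            current.append(ch)
            escaped = True
        elif ch == '"':
            current.append(ch)
            in_quotes = not in_quotes
        elif ch == separator and not in_quotes:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


def unquote(token):
    """Strip surrounding quotes and resolve backslash escapes."""
    if len(token) >= 2 and token[0] == token[-1] == '"':
        return re.sub(r"\\(.)", r"\1", token[1:-1])
    return token


def parse_cache_control(field_value):
    """Map each directive name to its argument, or None if it has none."""
    directives = {}
    for item in split_outside_quotes(field_value):
        name, sep, arg = item.partition("=")
        directives[name.strip().lower()] = unquote(arg.strip()) if sep else None
    return directives


if __name__ == "__main__":
    print(parse_cache_control('no-cache, max-age=3600, private="Set-Cookie, X-Foo"'))
```

    A design that follows from this is a registry keyed by field-name: the generic message-header parser stays as it is, and each header that needs structured access registers its own field-value parser, with unregistered headers left as raw strings.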


  • .NET: What is the purpose of the ProhibitDtd property in XmlReaderSettings? Why is DTD a security i

    - by Cheeso
    The documentation says: When set to true, the XmlReader throws an XmlException when any DTD content is encountered. Do not enable DTD processing if you are concerned about Denial of Service issues or if you are dealing with untrusted sources. If you have DTD processing enabled, you can use the XmlSecureResolver to restrict the resources that the XmlReader can access. You can also design your application so that the XML processing is memory and time constrained. For example, configure time-out limits in your ASP.NET application. Can someone please explain the issue? Why would a reader application want to prohibit the retrieval of a DTD? Where is the denial-of-service issue, if it is a reading application? What is the "trust" issue that is mentioned? Thanks


  • android sdk main.out.xml parsing error?

    - by mobibob
    I just started a new Android project, "WeekendStudy", to continue learning Android development, and I got stumped compiling and running the default 'hello weekendstudy' app. I think that I missed a step in configuration and setup, but I am at a loss to find out where. I have an AVD configured, set and launched. When I press 'run', the SDK builds a file main.out.xml and then fails like this:

        [2010-03-06 09:46:47 - WeekendStudy]Error in an XML file: aborting build.
        [2010-03-06 09:46:48 - WeekendStudy]res/layout/main.xml:0: error: Resource entry main is already defined.
        [2010-03-06 09:46:48 - WeekendStudy]res/layout/main.out.xml:0: Originally defined here.
        [2010-03-06 09:46:48 - WeekendStudy]/Users/mobibob/Projects/workspace-weekend/WeekendStudy/res/layout/main.out.xml:1: error: Error parsing XML: no element found
        [2010-03-06 09:48:16 - WeekendStudy]Error in an XML file: aborting build.
        [2010-03-06 09:48:16 - WeekendStudy]res/layout/main.xml:0: error: Resource entry main is already defined.
        [2010-03-06 09:48:16 - WeekendStudy]res/layout/main.out.xml:0: Originally defined here.
        [2010-03-06 09:48:16 - WeekendStudy]/Users/mobibob/Projects/workspace-weekend/WeekendStudy/res/layout/main.out.xml:1: error: Error parsing XML: no element found
        [2010-03-06 09:55:29 - WeekendStudy]res/layout/main.xml:0: error: Resource entry main is already defined.
        [2010-03-06 09:55:29 - WeekendStudy]res/layout/main.out.xml:0: Originally defined here.
        [2010-03-06 09:55:29 - WeekendStudy]/Users/mobibob/Projects/workspace-weekend/WeekendStudy/res/layout/main.out.xml:1: error: Error parsing XML: no element found
        [2010-03-06 09:55:49 - WeekendStudy]Error in an XML file: aborting build.
        [2010-03-06 09:55:49 - WeekendStudy]res/layout/main.xml:0: error: Resource entry main is already defined.
        [2010-03-06 09:55:49 - WeekendStudy]res/layout/main.out.xml:0: Originally defined here.
        [2010-03-06 09:55:49 - WeekendStudy]/Users/mobibob/Projects/workspace-weekend/WeekendStudy/res/layout/main.out.xml:1: error: Error parsing XML: no element found


  • Memory Issues When DOM Parsing A Large XML File on Android Devices

    - by tonyc
    Hey awesome SO users, I have an Android application that parses an XML file for users and displays results in a much more mobile-friendly format. The app works great for most users, but some users have lots and lots of data and the app crashes on them because it runs out of memory. Is there any way I can have a DOM-style XML parser quit parsing after a certain amount of data? I only need the first 30 or so elements, so it would make the application much more efficient. I'd like to use a SAX or pull parser instead, but the XML I'm parsing is not valid and I have no control over it. Unless anyone has some good SAX solutions that let me parse messy, invalid XML, I think DOM is the only way to go. Thanks for reading!
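
    The "stop after the first 30 elements" idea can be sketched with an incremental parser. The example below is Python, not Android Java, and uses the standard-library xml.etree.ElementTree.iterparse; the element name "item" and the function name first_elements are assumptions made for illustration. It also assumes the input is well-formed at least up to the point where reading stops, which may not hold for the messy XML described above.

```python
import io
import xml.etree.ElementTree as ET


def first_elements(source, tag, limit=30):
    """Return the text of the first `limit` elements named `tag`, then stop."""
    results = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            results.append(elem.text)
            elem.clear()  # drop the element's content so memory stays flat
            if len(results) >= limit:
                break  # nothing after this point is consumed from the iterator
    return results


if __name__ == "__main__":
    doc = "<root>" + "".join(f"<item>{i}</item>" for i in range(100)) + "</root>"
    print(first_elements(io.BytesIO(doc.encode("utf-8")), "item", limit=3))
```

    The same pattern (consume events, collect what you need, stop early) applies to pull parsers in general, which is why they tend to use far less memory than building a full DOM tree.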


  • Validate an Xml file against a DTD with a proxy. C# 2.0

    - by Chris Dunaway
    I have looked at many examples for validating an XML file against a DTD, but have not found one that allows me to use a proxy. I have a cXML file as follows (abbreviated for display) which I wish to validate:

        <?xml version="1.0" encoding="utf-8"?>
        <!DOCTYPE cXML SYSTEM "http://xml.cxml.org/schemas/cXML/1.2.018/InvoiceDetail.dtd">
        <cXML payloadID="123456" timestamp="2009-12-10T10:05:30-06:00">
          <!-- content snipped -->
        </cXML>

    I am trying to create a simple C# program to validate the XML against the DTD. I have tried code such as the following but cannot figure out how to get it to use a proxy:

        // Assume valid until the handler reports an error.
        private static bool isValid = true;

        static void Main(string[] args)
        {
            try
            {
                XmlTextReader r = new XmlTextReader(args[0]);
                XmlReaderSettings settings = new XmlReaderSettings();
                XmlDocument doc = new XmlDocument();
                settings.ProhibitDtd = false;
                settings.ValidationType = ValidationType.DTD;
                settings.ValidationEventHandler += new ValidationEventHandler(v_ValidationEventHandler);
                XmlReader validator = XmlReader.Create(r, settings);
                while (validator.Read()) ;
                validator.Close();

                // Check whether the document is valid or invalid.
                if (isValid)
                    Console.WriteLine("Document is valid");
                else
                    Console.WriteLine("Document is invalid");
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.ToString());
            }
        }

        static void v_ValidationEventHandler(object sender, ValidationEventArgs e)
        {
            isValid = false;
            Console.WriteLine("Validation event\n" + e.Message);
        }

    The exception I receive is

        System.Net.WebException: The remote server returned an error: (407) Proxy Authentication Required.

    which occurs on the line while (validator.Read()) ;

    I know I can validate against a DTD locally, but I don't want to change the XML DOCTYPE since that is what the final form needs to be (this app is solely for diagnostic purposes). For more information about the cXML spec, you can go to cxml.org. I appreciate any assistance. Thanks


  • getting 502 proxy error while parsing

    - by developer
    I am parsing a page and I get a response from it, but after some of the parsing is done I get this error from the server:

        Proxy Error
        The proxy server received an invalid response from an upstream server. The proxy server could not handle the request GET /file.php. Reason: Error reading from remote server

    After this my parsing fails. I even tried the sleep() function, but it didn't help and the error still came. Are they temporarily blocking my IP? What could be the reason for this, and how can I parse those pages without getting this error?


  • What are the arguments against parsing the Cthulhu way?

    - by smarmy53
    I have been assigned the task of implementing a Domain Specific Language for a tool that may become quite important for the company. The language is simple but not trivial: it already allows nested loops, string concatenation, etc., and it is practically certain that other constructs will be added as the project advances. I know from experience that writing a lexer/parser by hand, unless the grammar is trivial, is a time-consuming and error-prone process. So I was left with two options: a parser generator à la yacc or a combinator library like Parsec. The former would have been fine as well, but I picked the latter for various reasons and implemented the solution in a functional language. The result is pretty spectacular to my eyes; the code is very concise, elegant and readable/fluent. I concede it may look a bit weird if you have never programmed in anything other than Java/C#, but then this would be true of anything not written in Java/C#.

    At some point, however, I was literally attacked by a co-worker. After a quick glance at my screen he declared that the code is incomprehensible and that I should not reinvent parsing but just use a stack and String.Split like everybody does. He made a lot of noise, and I could not convince him, partly because I was taken by surprise and had no clear explanation, partly because his opinion was immutable (no pun intended). I even offered to explain the language to him, but to no avail. I'm positive the discussion is going to resurface in front of management, so I'm preparing some solid arguments. These are the first few reasons that come to my mind to avoid a String.Split-based solution:

    - you need a lot of ifs to handle special cases, and things quickly spiral out of control
    - lots of hardcoded array indexes make maintenance painful
    - it is extremely difficult to handle things like a function call as a method argument (ex. add( (add a, b), c)
    - it is very difficult to provide meaningful error messages in case of syntax errors (very likely to happen)

    I'm all for simplicity, clarity and avoiding unnecessary smart-cryptic stuff, but I also believe it's a mistake to dumb down every part of the codebase so that even a burger flipper can understand it. It's the same argument I hear for not using interfaces, not adopting separation of concerns, copying and pasting code around, etc. A minimum of technical competence and willingness to learn is required to work on a software project after all. (I won't use this argument, as it will probably sound offensive, and starting a war is not going to help anybody.)

    What are your favorite arguments against parsing the Cthulhu way?*

    *Of course, if you can convince me he's right I'll be perfectly happy as well.
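
    To make the nested-call and error-message points concrete, here is a minimal recursive-descent sketch in Python. It is not the Parsec-style code described in the post, and the call syntax add(add(a, b), c) is a hypothetical stand-in for the DSL; it only illustrates that a grammar-shaped parser handles arbitrary nesting and can report the position of a syntax error, both of which are hard to get from String.Split.

```python
import re

# number | identifier | any other single character
TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")


def tokenize(text):
    tokens = []
    for number, name, symbol in TOKEN.findall(text):
        if number:
            tokens.append(("num", int(number)))
        elif name:
            tokens.append(("name", name))
        elif symbol.strip():
            tokens.append(("sym", symbol))
    return tokens


def parse_expr(tokens, pos=0):
    """expr := number | name | name '(' expr (',' expr)* ')'

    Returns (tree, next_position); a call becomes (function_name, [args]).
    """
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, value = tokens[pos]
    if kind == "num":
        return value, pos + 1
    if kind != "name":
        raise SyntaxError(f"unexpected {value!r} at token {pos}")
    if pos + 1 < len(tokens) and tokens[pos + 1] == ("sym", "("):
        args, pos = [], pos + 2
        while True:
            arg, pos = parse_expr(tokens, pos)  # recursion handles nesting
            args.append(arg)
            if pos < len(tokens) and tokens[pos] == ("sym", ","):
                pos += 1
            elif pos < len(tokens) and tokens[pos] == ("sym", ")"):
                return (value, args), pos + 1
            else:
                raise SyntaxError(f"expected ',' or ')' at token {pos}")
    return value, pos + 1


if __name__ == "__main__":
    tree, _ = parse_expr(tokenize("add(add(a, b), c)"))
    print(tree)
```

    Each grammar rule maps to one branch of the function, so adding a construct means adding a rule rather than another layer of index arithmetic.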


  • No grammar constraints (DTD or XML schema) detected for the document.

    - by fastcodejava
    I have this DTD: http://fast-code.sourceforge.net/template.dtd But when I include it in an XML file I get the warning:

        No grammar constraints (DTD or XML schema) detected for the document.

    The XML is:

        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE templates PUBLIC "//UNKNOWN/" "http://fast-code.sourceforge.net/template.dtd">
        <templates>
          <template type="type">
            <description>Some</description>
            <variation></variation>
            <variation-field></variation-field>
            <allow-multiple-variation></allow-multiple-variation>
            <class-pattern></class-pattern>
            <getter-setter>setter</getter-setter>
            <allowed-file-extensions>java</allowed-file-extensions>
            <number-required-classes>1</number-required-classes>
            <template-body>
              <![CDATA[ Some Data ]]>
            </template-body>
          </template>
        </templates>

    Any clue?


  • [java - xml] Get DTD from an XML file

    - by itit
    How can I get in Java the DTD file name specified in an xml file? So, if I have: <!DOCTYPE TEI SYSTEM "dtd-file.dtd" [ [ <!ENTITY c24r SYSTEM "c2r.jpg" NDATA JPEG> <!NOTATION JPEG SYSTEM "image/jpeg"> <!ELEMENT figure EMPTY> <!ATTLIST figure entity CDATA #REQUIRED> ]> I want the string "dtd-file.dtd"
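
    The question is about Java, where a DOM Document exposes the declaration through getDoctype() and DocumentType.getSystemId(). As a language-neutral illustration of the same idea, the Python sketch below reads the system identifier from the DOCTYPE declaration with the standard-library expat binding; doctype_system_id is an illustrative name, and the sample document is a simplified version of the one in the question.

```python
import xml.parsers.expat


def doctype_system_id(xml_bytes):
    """Return the system identifier from the DOCTYPE declaration, or None."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found["system_id"] = system_id

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)
    return found.get("system_id")


if __name__ == "__main__":
    sample = b'<!DOCTYPE TEI SYSTEM "dtd-file.dtd" [<!ELEMENT figure EMPTY>]><TEI/>'
    print(doctype_system_id(sample))
```

    The parser only reports the identifier as written in the document; it does not open the DTD file in this configuration.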


  • Parsing Strings ( .crt files )

    - by user1661521
    Background: I have a .crt file (a certificate authority file). It is composed of many fields, but the part relevant to this question is:

        Certificate: ...(a lot of stuff before)... Subject: C=US, ST=Maryland, L=Pasadena, O=Brent Baccala, OU=FreeSoft, CN=www.freesoft.org/[email protected] Subject Public Key Info: ...(a lot of stuff after)

    I need to parse the file to populate a .csv file, and I have that done. The problem I need help with is getting the field CN=www.freesoft.org. When the CN value contains a lot of slashes, my parsing goes wrong. For example, the raw string is CN=foo/bar/the/hell/emailAddress=blablabla and I need only foo/bar/the/hell. For a moment I got that in the correct column, but when the emailAddress is missing my parsing fails and the CN column of the .csv gets the wrong information: instead of |CN| foo/bar/the/hell I get |CN| OU=FreeSoft, foo/bar/the/hell. This is the code doing the CN parsing:

        #!/bin/bash
        subject_line=$(echo $cert | grep -o "Subject:.*Subject Public Key Info")
        cn=$(echo $subject_line | grep -o "CN=.*" )
        if [ $(echo $cn | grep -c ".*email.*") -gt 0 ]; then
            end_cn=$(echo $cn | grep -b -o emailAddress)
            end_cn_idx=$(echo $end_cn | grep -o .*:)
            final_end_cn=${end_cn_idx:0:-1}
            common_name=${cn:3:$final_end_cn-4}
            echo $common_name
        else
            end_cn=$(echo $cn | grep -b -o "Subject Public Key Info")
            end_cn_idx=$(echo $end_cn | grep -o .*:)
            final_end_cn=${end_cn_idx:0:-1}
            common_name=${cn:3:$final_end_cn-5}
            echo $common_name
        fi
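
    The extraction described above can be expressed as a single pattern instead of byte-offset arithmetic. The sketch below is in Python rather than bash and makes two assumptions taken from the example in the question: the subject line has already been isolated, and CN is the last field on it, optionally followed by /emailAddress=... . The name common_name and the sample address are illustrative.

```python
import re

# CN must start a field; capture lazily up to an optional /emailAddress=... tail.
_CN = re.compile(r"(?:^|[\s,/])CN\s*=\s*(.*?)(?:/emailAddress=.*)?\s*$")


def common_name(subject_line):
    """Return the CN value without any trailing emailAddress, or None."""
    match = _CN.search(subject_line)
    return match.group(1) if match else None


if __name__ == "__main__":
    subject = ("Subject: C=US, ST=Maryland, L=Pasadena, O=Brent Baccala, "
               "OU=FreeSoft, CN=www.freesoft.org/emailAddress=someone@example.org")
    print(common_name(subject))
    print(common_name("Subject: OU=FreeSoft, CN=foo/bar/the/hell"))
```

    If CN can be followed by further comma-separated fields in your certificates, the pattern would need to stop at the next ", KEY=" as well.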


  • Java iteration reading & parsing

    - by Patrick Lorio
    I have a log file that I am reading into a string:

        public static String Read (String path) throws IOException {
            StringBuilder sb = new StringBuilder();
            InputStream in = new BufferedInputStream(new FileInputStream(path));
            int r;
            while ((r = in.read()) != -1) {
                sb.append((char) r); // cast, otherwise the int's decimal digits are appended
            }
            return sb.toString();
        }

    Then I have a parser that iterates over the entire string once:

        void Parse () throws IOException {
            String con = Read("log.txt");
            for (int i = 0; i < con.length(); i++) {
                /* parsing action */
            }
        }

    This is a huge waste of CPU cycles. I loop over all the content in Read. Then I loop over all the content in Parse. I could just place the /* parsing action */ under the while loop in the Read method, which would be fine, but I don't want to copy the same code all over the place. How can I parse the file in one iteration over the contents and still have separate methods for parsing and reading? In C# I understand there is some sort of yield return thing, but I'm locked into Java. What are my options in Java?


  • Which type of file parsing easiest and efficient and good ?(html,pdf,csv,text)

    - by Harikrishna
    I want to parse HTML, PDF, CSV and text files. Which of these file types is the easiest and most efficient to parse? I am asking because I want to parse PDF, HTML, CSV and text files through common parsing code if possible.

    Suppose HTML is the easiest and most efficient to parse. Then I will write the parsing code for HTML and try to convert PDF, CSV and text files to HTML (if possible), so that the code written for parsing HTML also works for the other types.

    Suppose instead that PDF is the easiest and most efficient to parse. Then I will convert HTML, CSV and text files to PDF and write the parsing code for PDF, so that the same code can parse HTML, CSV and text files as well.

    So my questions are:

    (1) Which type of file is the easiest and most efficient to parse (PDF, CSV, HTML, text)?
    (2) Is converting these files (PDF, text, HTML, CSV) to each other possible? For example, if HTML parsing is easiest, then PDF to HTML, text to HTML and CSV to HTML.


  • Parsing a header with two different versions [ID3] avoiding code duplication?

    - by user66141
    I really hope you can give me some interesting viewpoints on my situation; my own ways of approaching the issue are not to my liking. I am writing an MP3 parser, starting with an ID3v2 parser. Right now I'm working on the extended header parsing. My issue is that the optional header is defined differently in versions 2.3 and 2.4 of the tag. The 2.3 version of the optional header is defined as follows:

        struct ID3_3_EXTENDED_HEADER{
            DWORD dwExtHeaderSize;   //Extended header size (either 6 or 8 bytes , excluded)
            WORD wExtFlags;          //Extended header flags
            DWORD dwSizeOfPadding;   //Size of padding (size of the tag excluding the frames and headers)
        };

    While the 2.4 version is defined as:

        struct ID3_4_EXTENDED_HEADER{
            DWORD dwExtHeaderSize;    //Extended header size (synchsafe int)
            BYTE bNumberOfFlagBytes;  //Number of flag bytes
            BYTE bFlags;              //Flags
        };

    How can I parse the header while minimizing code duplication? Using two different functions to parse each version doesn't sound great, and using a single function with a different flow for each case is similar. Are there any good practices for this kind of issue? Any tips for avoiding code duplication? Anything would be great.
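
    One common way to limit the duplication is to keep a tiny decoder per version and have both return the same result type, so that everything downstream is version-agnostic. The Python sketch below follows the two struct layouts shown in the question (not the C/C++ of the original) and assumes the big-endian byte order used by ID3v2; ExtendedHeader, parse_extended_header and the helper names are illustrative.

```python
import struct
from typing import NamedTuple


class ExtendedHeader(NamedTuple):
    size: int
    flags: int
    padding_size: int  # only present in the v2.3 layout; 0 for v2.4


def synchsafe_to_int(data):
    """Decode a synchsafe integer: 7 significant bits per byte, big-endian."""
    value = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
    return value


def _parse_v3(data):
    # 4-byte size, 2-byte flags, 4-byte padding size
    size, flags, padding = struct.unpack(">IHI", data[:10])
    return ExtendedHeader(size, flags, padding)


def _parse_v4(data):
    # 4-byte synchsafe size, 1-byte flag-byte count, 1-byte flags
    size = synchsafe_to_int(data[:4])
    _flag_byte_count, flags = struct.unpack(">BB", data[4:6])
    return ExtendedHeader(size, flags, 0)


# Only the layout-specific decoding differs per version.
_PARSERS = {3: _parse_v3, 4: _parse_v4}


def parse_extended_header(major_version, data):
    try:
        parser = _PARSERS[major_version]
    except KeyError:
        raise ValueError(f"unsupported ID3v2 major version: {major_version}")
    return parser(data)


if __name__ == "__main__":
    print(parse_extended_header(3, bytes([0, 0, 0, 6, 0, 0, 0, 0, 1, 0])))
    print(parse_extended_header(4, bytes([0, 0, 1, 0, 1, 0x40])))
```

    The version-specific functions stay a few lines each, and supporting another revision means adding one entry to the table rather than another branch in a shared function.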


  • html parsing with libxml

    - by zajcev
    In another thread I was convinced to use HTML parsers instead of regexps for HTML parsing (I thought regexps would work fine, but they didn't ;) ). I thought of using libxml (it has an HTML parser built in), but failed to find any useful tutorial. I also found this site, and it says that it should do fine even with severely broken HTML. Could you give me some examples of HTML parsing with libxml, or maybe recommend a different free library for Linux? I'm using C++. I just thought someone would have some example code, so that I don't have to analyze the headers ;)

