Search Results

Search found 37274 results on 1491 pages for 'text parsing'.

Page 93/1491 | < Previous Page | 89 90 91 92 93 94 95 96 97 98 99 100  | Next Page >

  • CData section not finished problem

    - by tomaszs
    When I use DOMDocument::loadXML() for my XML below I get error: Warning: DOMDocument::loadXML() [domdocument.loadxml]: CData section not finished http://www.pdclipart.org/displayimage.php?album=se in Entity, Warning: DOMDocument::loadXML() [domdocument.loadxml]: Premature end of data in tag image line 7 in Entity Warning: DOMDocument::loadXML() [domdocument.loadxml]: Premature end of data in tag quizz line 3 in Entity Warning: DOMDocument::loadXML() [domdocument.loadxml]: Premature end of data in tag quizzes line 2 in Entity Fatal error: Call to a member function getElementsByTagName() on a non-object It seems to me that my CData sections are closed but still I get this error. XML looks like this: <?xml version="1.0" encoding="utf-8"?> <quizzes> <quizz> <title><![CDATA[Title]]></title> <descr><![CDATA[Some text here!]]></descr> <tags><![CDATA[one tag, second tag]]></tags> <image><![CDATA[http://www.site.org/displayimage.php?album=search&cat=0&pos=1]]></image> <results> <result> <title><![CDATA[Something]]></title> <descr><![CDATA[Some text here]]></descr> <image><![CDATA[http://www.site.org/displayimage.php?album=search&cat=0&pos=17]]></image> <id>1</id> </result> </results> </quizz> </quizzes> Could you help me discover what is the problem?

    Read the article

  • How can I use Convert.ChangeType to convert string into numerics with group separator?

    - by Loic
    Hello, I want to make a generic string to numeric converter, and provide it as a string extension, so I wrote the following code: public static bool TryParse<T>( this string text, out T result, IFormatProvider formatProvider ) where T : struct try { result = (T)Convert.ChangeType( text, typeof( T ), formatProvider ); return true; } catch(... I call it like this: int value; var ok = "123".TryParse(out value, NumberFormatInfo.CurrentInfo) It works fine until I want to use a group separator: As I live in France, where the thousand separator is a space and the decimal separator is a comma, the string "1 234 567,89" should be equals to 1234567.89 (in Invariant culture). But, the function crashes! When a try to perform a non generic conversion, like double.Parse(...), I can use an overload which accepts a NumberStyles parameter. I specify NumberStyles.Number and this time it works! So, the questions are : Why the parsing does not respect my NumberFormatInfo (where the NumberGroupSeparator is well specified to a space, as I specified in my OS) How could I make work the generic version with Convert.ChangeTime, as it has no overload wich accepts a NumberStyles parameter ?

    Read the article

  • Is there any open source tool that automatically 'detects' email threading like Gmail?

    - by Chris W.
    For instance, if the original message (message 1) is... Hey Jon, Want to go get some pizza? -Bill And the reply (message 2) is... Bill, Sorry, I can't make lunch today. Jonathon Parks, CTO Acme Systems On Wed, Feb 24, 2010 at 4:43 PM, Bill Waters wrote: Hey John, Want to go get some pizza? -Bill In Gmail, the system (a) detects that message 2 is a reply to message 1 and turns this into a 'thread' of sorts and (b) detects where the replied portion of the message actually is and hides it from the user. (In this case the hidden portion would start at "On Wed, Feb..." and continue to the end of the message.) Obviously, in this simple example it would be easy to detect the "On <Date, <Name wrote:" or the "" character prefixes. But many email systems have many different style of marking replies (not to mention HTML emails). I get the feeling that you would have to have some damn smart string parsing algorithms to get anywhere near how good GMail's is. Does this technology already exist in an open source project somewhere? Either in some library devoted to this exclusively or perhaps in some open source email client that does similar message threading? Thanks.

    Read the article

  • Perl XML SAX parser emulating XML::Simple record for record

    - by DVK
    Short Q summary: I am looking a fast XML parser (most likely a wrapper around some standard SAX parser) which will produce per-record data structure 100% identical to those produced by XML::Simple. Details: We have a large code infrastructure which depends on processing records one-by-one and expects the record to be a data structure in a format produced by XML::Simple since it always used XML::Simple since early Jurassic era. An example simple XML is: <root> <rec><f1>v1</f1><f2>v2</f2></rec> <rec><f1>v1b</f1><f2>v2b</f2></rec> <rec><f1>v1c</f1><f2>v2c</f2></rec> </root> And example rough code is: sub process_record { my ($obj, $record_hash) = @_; # do_stuff } my $records = XML::Simple->XMLin(@args)->{root}; foreach my $record (@$records) { $obj->process_record($record) }; As everyone knows XML::Simple is, well, simple. And more importantly, it is very slow and a memory hog - due to being a DOM parser and needing to build/store 100% of data in memory. So, it's not the best tool for parsing an XML file consisting of large amount of small records record-by-record. However, re-writing the entire code (which consist of large amount of "process_record"-like methods) to work with standard SAX parser seems like an big task not worth the resources, even at the cost of living with XML::Simple. What I'm looking for is an existing module which will probably be based on a SAX parser (or anything fast with small memory footprint) which can be used to produce $record hashrefs one by one based on the XML pictured above that can be passed to $obj->process_record($record) and be 100% identical to what XML::Simple's hashrefs would have been. I don't care much what the interface of the new module is - e.g whether I need to call next_record() or give it a callback coderef accepting a record.

    Read the article

  • Extracting images from a PDF

    - by sagar
    My Query I want to extract only images from a PDF document, using Objective-C in an iPhone Application. My Efforts I have gone through the info on this link, which has details regarding different operators on PDF documents. I also studied this document from Apple about PDF parsing with Quartz. I also went through the entire PDF reference document from the Adobe site. According to that document, for each image there are the following operators: q Q BI EI I have created a table to get the image: myTable = CGPDFOperatorTableCreate(); CGPDFOperatorTableSetCallback(myTable, "q", arrayCallback2); CGPDFOperatorTableSetCallback(myTable, "TJ", arrayCallback); CGPDFOperatorTableSetCallback(myTable, "Tj", stringCallback); I use this method to get the image: void arrayCallback2(CGPDFScannerRef inScanner, void *userInfo) { // THIS DOESN'T WORK // CGPDFStreamRef stream; // represents a sequence of bytes // if (CGPDFDictionaryGetStream (d, "BI", &stream)){ // CGPDFDataFormat t=CGPDFDataFormatJPEG2000; // CFDataRef data = CGPDFStreamCopyData (stream, &t); // } } This method is called for the operator "q", but I don't know how to extract an image from it. What should be the solution for extracting the images from the PDF documents? Thanks in advance for your kind help.

    Read the article

  • Replace text in word textbox objects using VSTO and C#

    - by Roberto
    Hi, I have to find/replace text from a word document. It works fine for plain text spread through the document, however when the text is in a textbox, the standard find/replace approach doesn't reach it. I found a vba solution, however since I am working in C#, I would like to find a solution in C#. The word document is in 2007 format and my visual studio is 2010. I am using .Net Framework 3.5, but if required, I can consider moving to 4.0. Here is the code for the find/replace that only works with plain text (not in word textbox objects): object Missing = System.Reflection.Missing.Value; object fileToOpen = (object)@"c:\doc.docx"; object fileToSave = (object)@"c:\doc.docx"; Word.Application app = new Word.ApplicationClass(); Word.Document doc = new Word.Document(); try { doc = app.Documents.Open(ref fileToOpen, ref Missing, ref Missing, ref Missing, ref Missing, ref Missing, ref Missing, ref Missing, ref Missing, ref Missing, ref Missing, ref Missing, ref Missing, ref Missing, ref Missing, ref Missing); object replaceAll = Word.WdReplace.wdReplaceAll; app.Selection.Find.ClearFormatting(); app.Selection.Find.Text = "MyTextForReplacement"; app.Selection.Find.Replacement.ClearFormatting(); app.Selection.Find.Replacement.Text = "Found you!"; app.Selection.Find.Execute( ref Missing, ref Missing, ref Missing, ref Missing, ref Missing, ref Missing, ref Missing, ref Missing, ref Missing, ref Missing, ref replaceAll, ref Missing, ref Missing, ref Missing, ref Missing); I also tried using the below code, but didn't work as well: foreach(Word.Shape s in app.ActiveDocument.Shapes) { if(s.TextFrame.HasText >= 1) //The value is always 0 or -1, and even leaving 0 go forward, // it doesn't work, because there is no text in there... { foreach(Word.Field f in s.TextFrame.TextRange.Fields) { switch( f.Type) { . . //I never reached this point . } } } Any help will be appreciated... Thanks, -- Roberto Lopes

    Read the article

  • Multiple undo managers for a text view

    - by Rui Pacheco
    Hi, I've a text view that gets its content from an attributed string stored in a model object. I list several of these model objects in a drawer and when the user clicks on one the text view swaps its content. I now need to also swap the undo manager for the text view. I initialise an undo manager on my model object and use undoManagerForTextView to return it to the text view, but something's not quite right. Strategically placed logging statements show me that everything's working as planned: on startup a new model object is initialised correctly and a non-null undo manager is always pulled by the text view. But when it comes to actually doing undo, I just can't get the behaviour I want. I open a window, type something and press cmd+z, and undo works. I open a window, type something, select a new model on the table, type something, go back to the first model and try to undo and all I get is a beep. Something on the documentation made me raise an eyebrow, as it would mean that I can't have undo with several model objects: The default undo and redo behavior applies to text fields and text in cells as long as the field or cell is the first responder (that is, the focus of keyboard actions). Once the insertion point leaves the field or cell, prior operations cannot be undone.

    Read the article

  • XML parse node value as string

    - by bharathi
    My xml file is look like this . I want to get the value node text content as like this . <property regex=".*" xpath=".*"> <value> 127.0.0.1 </value> <property regex=".*" xpath=".*"> <value> val <![CDATA[ <Valve className="org.tomcat.AccessLogValve" exclude="PASSWORD,pwd,pWord,ticket" enabled="true" serviceName="zohocrm" logDir="../logs" fileName="access" format="URI,&quot;PARAM&quot;,&quot;REFERRER&quot;,TIME_TAKEN,BYTES_OUT,STATUS,TIMESTAMP,METHOD,SESSION_ID,REMOTE_IP,&quot;INTERNAL_IP&quot;,&quot;USER_AGENT&quot;,PROTOCOL,SERVER_NAME,SERVER_PORT,BYTES_IN,ZUID,TICKET_DIGEST,THREAD_ID,REQ_ID"/> ]]> test </value> </property> I want to get text as order they specified in a file . Here is my java code . Document doc = parseDocument("properties.xml"); NodeList properties = doc.getElementsByTagName("property"); for( int i = 0 , len = properties.getLength() ; i < len ; i++) { Element property = (Element)properties.item(i); //How can i proceed further . } Output Expected : Node 1 : 127.0.0.1 Node 2 : val <Valve className="org.tomcat.AccessLogValve" exclude="PASSWORD,pwd,pWord,ticket" enabled="true" serviceName="zohocrm" logDir="../logs" fileName="access" format="URI,&quot;PARAM&quot;,&quot;REFERRER&quot;,TIME_TAKEN,BYTES_OUT,STATUS,TIMESTAMP,METHOD,SESSION_ID,REMOTE_IP,&quot;INTERNAL_IP&quot;,&quot;USER_AGENT&quot;,PROTOCOL,SERVER_NAME,SERVER_PORT,BYTES_IN,ZUID,TICKET_DIGEST,THREAD_ID,REQ_ID"/> test Please suggest your views .

    Read the article

  • Perl XML SAX parser emulating XML::Simple record for record

    - by DVK
    Short Q summary: I am looking a fast XML parser (most likely a wrapper around some standard SAX parser) which will produce per-record data structure 100% identical to those produced by XML::Simple. Details: We have a large code infrastructure which depends on processing records one-by-one and expects the record to be a data structure in a format produced by XML::Simple since it always used XML::Simple since early Jurassic era. An example simple XML is: <root> <rec><f1>v1</f1><f2>v2</f2></rec> <rec><f1>v1b</f1><f2>v2b</f2></rec> <rec><f1>v1c</f1><f2>v2c</f2></rec> </root> And example rough code is: sub process_record { my ($obj, $record_hash) = @_; # do_stuff } my $records = XML::Simple->XMLin(@args)->{root}; foreach my $record (@$records) { $obj->process_record($record) }; As everyone knows XML::Simple is, well, simple. And more importantly, it is very slow and a memory hog - due to being a DOM parser and needing to build/store 100% of data in memory. So, it's not the best tool for parsing an XML file consisting of large amount of small records record-by-record. However, re-writing the entire code (which consist of large amount of "process_record"-like methods) to work with standard SAX parser seems like an big task not worth the resources, even at the cost of living with XML::Simple. What I'm looking for is an existing module which will probably be based on a SAX parser (or anything fast with small memory footprint) which can be used to produce $record hashrefs one by one based on the XML pictured above that can be passed to $obj->process_record($record) and be 100% identical to what XML::Simple's hashrefs would have been.

    Read the article

  • Problem using NSXMLParser with NOAA data on iPhone

    - by Amagrammer
    Can anyone help me see why NSXMLParser is not causing these methods parser:didStartElement:namespaceURI:qualifiedName:attributes: parser:didEndElement:namespaceURI:qualifiedName:attributes: to fire for the part of the following data: <?xml version="1.0" encoding="ISO-8859-1"?><SOAP-ENV:Envelope SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"><SOAP-ENV:Body><ns1:NDFDgenResponse xmlns:ns1=""><dwmlOut xsi:type="xsd:string"><?xml version="1.0"?> <dwml version="1.0" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://www.nws.noaa.gov/forecasts/xml/DWMLgen/schema/DWML.xsd"> (body excluded) </dwml> </dwmlOut></ns1:NDFDgenResponse></SOAP-ENV:Body></SOAP-ENV:Envelope> I'm not an XML expert, but to me, the part looks like just a regular element, to be parsed just like the parts before it. I do get two parser:parseErrorOccurred: errors, #200 and #201, but they occur during the parsing of the <SOAP-ENV:Body> element, not the element, so I'm not sure if they are relevant. Thanks for any help you can give me.

    Read the article

  • Line Break in XML?

    - by ew89
    Hello, I'm a beginner in web development, and I'm trying to insert line breaks in my XML file. This is what my XML looks like: Song Title Lyrics <song> <title>Song Title</title> <lyric>Lyrics</lyric> <song> <title>Song Title</title> <lyric>Lyrics</lyric> <song> <title>Song Title</title> <lyric>Lyrics</lyric> I want to have line breaks in between the sentences for the lyrics. I tried everything from /n, and other codes similar to it, PHP parsing, etc., and nothing works! Have been googling online for hours and can't seem to find the answer. I'm using the XML to insert data to an HTML page using Javascript. Does anyone know how to solve this problem? Thanks before :)

    Read the article

  • Why does 12:20 PM parse to 0:20 on the next day?

    - by Hanno Fietz
    I'm using java.text.SimpleDateFormat to parse string representations of date/time values inside an XML document. I'm seeing all times that have an hour value of 12 shifted by 12 hours into the future, i. e. 20 minutes past noon gets parsed to mean 20 minutes past midnight the following day. I wrote a unit test which seems to confirm that the error is made upon parsing (I checked the return values from getTime() with the linux shell command date). Now I'm wondering: is there a bug in the parse() method? is there something wrong with the input string? am I using the wrong format string for the input? The input data is taken from Yahoo's YWeather service. Here's the test and its output: public class YWeatherReaderTest { public static final String[] rgDateSamples = { "Thu, 08 Apr 2010 12:20 PM CEST", "Thu, 08 Apr 2010 12:20 AM CEST" }; public void dateParsing() throws ParseException { DateFormat formatter = new SimpleDateFormat("EEE, dd MMM yyyy K:m a z", Locale.US); for (String dtsSrc : YWeatherReaderTest.rgDateSamples) { Date dt = formatter.parse(dtsSrc); String dtsDst = formatter.format(dt); System.out.println(dtsSrc); System.out.println(dtsDst); System.out.println(); } } } Thu, 08 Apr 2010 12:20 PM CEST Fri, 09 Apr 2010 0:20 AM CEST Thu, 08 Apr 2010 12:20 AM CEST Thu, 08 Apr 2010 0:20 PM CEST The second output line of the second iteration is slightly weird, because 00:20 isn't PM. The milliseconds value of the Date object, however, corresponds to the (wrong) time of 20 minutes past noon.

    Read the article

  • Can't read some attributes with SAX

    - by akappa
    Hi all, I'm trying to parse that document with SAX: <scxml version="1.0" initialstate="start" name="calc"> <datamodel> <data id="expr" expr="0" /> <data id="res" expr="0" /> </datamodel> <state id="start"> <transition event="OPER" target="opEntered" /> <transition event="DIGIT" target="operand" /> </state> <state id="operand"> <transition event="OPER" target="opEntered" /> <transition event="DIGIT" /> </state> </scxml> I read all the attributes well, except "initialstate" and "name"... I get the attributes with the startElement handler, but the size of the attribute list for scxml is zero. Why? How I can overcome that problem? Edit: public void startElement(String uri, String localName, String qName, Attributes attributes){ System.out.println(attributes.getValue("initialstate")); System.out.println(attributes.getValue("name")); } that, when parsing the first tag, doesn't work (prints "null" two times). In fact, attributes.getLength(); evaluates to zero. Thanks

    Read the article

  • Can Haskell's Parsec library be used to implement a recursive descent parser with backup?

    - by Thor Thurn
    I've been considering using Haskell's Parsec parsing library to parse a subset of Java as a recursive descent parser as an alternative to more traditional parser-generator solutions like Happy. Parsec seems very easy to use, and parse speed is definitely not a factor for me. I'm wondering, though, if it's possible to implement "backup" with Parsec, a technique which finds the correct production to use by trying each one in turn. For a simple example, consider the very start of the JLS Java grammar: Literal: IntegerLiteral FloatingPointLiteral I'd like a way to not have to figure out how I should order these two rules to get the parse to succeed. As it stands, a naive implementation like this: literal = do { x <- try (do { v <- integer; return (IntLiteral v)}) <|> (do { v <- float; return (FPLiteral v)}); return(Literal x) } Will not work... inputs like "15.2" will cause the integer parser to succeed first, and then the whole thing will choke on the "." symbol. In this case, of course, it's obvious that you can solve the problem by re-ordering the two productions. In the general case, though, finding things like this is going to be a nightmare, and it's very likely that I'll miss some cases. Ideally, I'd like a way to have Parsec figure out stuff like this for me. Is this possible, or am I simply trying to do too much with the library? The Parsec documentation claims that it can "parse context-sensitive, infinite look-ahead grammars", so it seems like something like I should be able to do something here.

    Read the article

  • Python: deleting rows in a text file

    - by Jenny
    A sample of the following text file i have is: > 1 -4.6 -4.6 -7.6 > > 2 -1.7 -3.8 -3.1 > > 3 -1.6 -1.6 -3.1 the data is separated by tabs in the text file and the first column indicates the position. I need to iterate through every value in the text file apart from column 0 and find the lowest value. once the lowest value has been found that value needs to be written to a new text file along with the column name and position. Column 0 has the name "position" Column 1 "fifteen", column 2 "sixteen" and column 3 "seventeen" for example the lowest value in the above data is "-7.6" and is in column 3 which has the name "seventeen". Therefore "7.6", "seventeen" and its position value which in this case is 1 need to be written to the new text file. I then need a number of rows deleted from the above text file. E.G. the lowest value above is "-7.6" and is found at position "1" and is found in column 3 which as the name "seventeen". I therefore need seventeen rows deleted from the text file starting from and including position 1 so the the column in which the lowest value is found denotes the amount of rows that needs to be deleted and the position it is found at states the start point of the deletion

    Read the article

  • How to make a small engine like Wolfram|Alpha?

    - by Koning WWWWWWWWWWWWWWWWWWWWWWW
    Lets say I have three models/tables: operating_systems, words, and programming_languages: # operating_systems name:string created_by:string family:string Windows Microsoft MS-DOS Mac OS X Apple UNIX Linux Linus Torvalds UNIX UNIX AT&T UNIX # words word:string defenitions:string window (serialized hash of defenitions) hello (serialized hash of defenitions) UNIX (serialized hash of defenitions) # programming_languages name:string created_by:string example_code:text C++ Bjarne Stroustrup #include <iostream> etc... HelloWorld Jeff Skeet h AnotherOne Jon Atwood imports 'SORULEZ.cs' etc... When a user searches hello, the system shows the defenitions of 'hello'. This is relatively easy to implement. However, when a user searches UNIX, the engine must choose: word or operating_system. Also, when a user searches windows (small letter 'w'), the engine chooses word, but should also show Assuming 'windows' is a word. Use as an <a href="etc..">operating system</a> instead. Can anyone point me in the right direction with parsing and choosing the topic of the search query? Thanks. Note: it doesn't need to be able to perform calculations as WA can do.

    Read the article

  • XML: what processing rules apply for values intertwined with tags?

    - by iCE-9
    I've started working on a simple XML pull-parser, and as I've just defuzzed my mind on what's correct syntax in XML with regards to certain characters/sequences, ignorable whitespace and such (thank you, http://www.w3schools.com/xml/xml_elements.asp), I realized that I still don't know squat about what can be sketched up as the following case (which Validome finds well-formed very much; note that I only want to use xml files for data storage, no entities, DTD or Schemas needed): <bookstore> <book id="1"> <author>Kurt Vonnegut Jr.</author> <title>Slapstick</title> </book> We drop a pie here. <book id="2">Who cares anyway? <author>Stephen King</author> <title>The Green Mile</title> </book> And another one here. <book id="3"> <author>Next one</author> <title>This time with its own title</title> </book> </bookstore> "We drop a pie here." and "And another one here." are values of the 'bookstore' element. "Who cares anyway?" is a value related to the second 'book' element. How are these processed, if at all? Will "We drop a pie here." and "Another one here." be concatenated to form one value for the 'bookstore' element, or are they treated separately, stored somewhere, affecting the outcome of the parsing of the element they belong to, or...?

    Read the article

  • Why does ANTLR not parse the entire input?

    - by Martin Wiboe
    Hello, I am quite new to ANTLR, so this is likely a simple question. I have defined a simple grammar which is supposed to include arithmetic expressions with numbers and identifiers (strings that start with a letter and continue with one or more letters or numbers.) The grammar looks as follows: grammar while; @lexer::header { package ConFreeG; } @header { package ConFreeG; import ConFreeG.IR.*; } @parser::members { } arith: term | '(' arith ( '-' | '+' | '*' ) arith ')' ; term returns [AExpr a]: NUM { int n = Integer.parseInt($NUM.text); a = new Num(n); } | IDENT { a = new Var($IDENT.text); } ; fragment LOWER : ('a'..'z'); fragment UPPER : ('A'..'Z'); fragment NONNULL : ('1'..'9'); fragment NUMBER : ('0' | NONNULL); IDENT : ( LOWER | UPPER ) ( LOWER | UPPER | NUMBER )*; NUM : '0' | NONNULL NUMBER*; fragment NEWLINE:'\r'? '\n'; WHITESPACE : ( ' ' | '\t' | NEWLINE )+ { $channel=HIDDEN; }; I am using ANTLR v3 with the ANTLR IDE Eclipse plugin. When I parse the expression (8 + a45) using the interpreter, only part of the parse tree is generated: http://imgur.com/iBaEC.png Why does the second term (a45) not get parsed? The same happens if both terms are numbers. Thank you, Martin Wiboe

    Read the article

  • How could I parse this HTML file?

    - by Sergio Tapia
    <div id="main"> <style type="text/css"> </style> <script language="JavaScript"> </script> <p style="margin: 0pt 0pt 0.5em;"><b>Media from&nbsp;<a onclick="(new Image()).src='/rg/find-media-title/media_strip/images/b.gif?link=/title/tt0087538/';" href="/title/tt0087538/">The Karate Kid</a> (1984)</b></p> <style type="text/css"> </style> <table style="border-collapse: collapse;"> </table> </div> I need to somehow extract the href value of the (new Image()). How exactly would I accomplish this with HtmlAgilityPack? I'm new to it, and so far I haven't found a useful tutorial on how to effectively use it for parsing. Thanks for the help!

    Read the article

  • BeautifulSoup can't parse a webpage?

    - by JLTChiu
    I am using beautiful soup for parsing webpage now, I've heard it's very famous and good, but it doesn't seems works properly. Here's what I did import urllib2 from bs4 import BeautifulSoup page = urllib2.urlopen("http://www.cnn.com/2012/10/14/us/skydiver-record-attempt/index.html?hpt=hp_t1") soup = BeautifulSoup(page) print soup.prettify() I think this is kind of straightforward. I open the webpage and pass it to the beautifulsoup. But here's what I got: Warning (from warnings module): File "C:\Python27\lib\site-packages\bs4\builder\_htmlparser.py", line 149 "Python's built-in HTMLParser cannot parse the given document. This is not a bug in Beautiful Soup. The best solution is to install an external parser (lxml or html5lib), and use Beautiful Soup with that parser. See http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser for help.")) ... HTMLParseError: bad end tag: u'</"+"script>', at line 634, column 94 I thought CNN website should be well designed, so I am not very sure what's going on though. Does anyone has idea about this?

    Read the article

  • Long text input from user and PDF generation

    - by Petteri Hietavirta
    I have built a web application that can be seen as an overcomplicated application form. There are bunch of text areas with a given character limit. After the form submission various things happen and one of them is PDF generation. The text is queried from the DB and inserted in the PDF template created in iReports. This works fine but the major pain is overflowing text. The maximum number of characters is set based on 'average' text. But sometimes people prefer to write with CAPS or add plenty of linefeeds to format their text. These then cause user's text to overflow the space given in PDF. Unfortunately the PDF document must look like a real application form so I cannot allow unlimited space. What kind of approaches you have used to tackle this? Clean/restrict user input? Calculate the space requirement of the text based on font metrics? Provide preview of the PDF? (too bad users are not allowed to change their input after submission...)

    Read the article

  • Scale image to fit text boxes around borders

    - by nispio
    I have the following plot in Matlab: The image size may vary, and so may the length of the text boxes at the top and left. I dynamically determine the strings that go in these text boxes and then create them with: [M,N] = size(img); imagesc((1:N)-0.5,(1:M)-0.5, img > 0.5); axis image; grid on; colormap([1 1 1; 0.5 0.5 0.5]); set(gca,'XColor','k','YColor','k','TickDir','out') set(gca,'XTick',1:N,'XTickLabel',cell(1,N)) set(gca,'YTick',1:N,'YTickLabel',cell(1,N)) for iter = 1:M text(-0.5, iter-0.5, sprintf(strL, br{iter,:}), ... 'FontSize',16, ... 'HorizontalAlignment','right', ... 'VerticalAlignment','middle', ... 'Interpreter','latex' ); end for iter = 1:N text(iter-0.5, -0.5, {bc{:,iter}}, ... 'FontSize',16, ... 'HorizontalAlignment','center', ... 'VerticalAlignment','bottom', ... 'Interpreter','latex' ); end where br and bc are cell arrays containing the appropriate numbers for the labels. The problem is that most of the time, the text gets clipped by the edges of the figure. I am using this as a workaround: set(gca,'Position',[0.25 0.25 0.5 0.5]); As you can see, I am simply adding a larger border around the plot so that there is more room for the text. While this scaling works for one zoom level, if I maximize my plot window I get way too much empty space, and if I shrink my plot window, I get clipping again. Is there a more intelligent way to add these labels to use the minimum amount of space while making sure that the text does not get clipped?

    Read the article

  • Best and simple way to handle JSON in Django

    - by primal
    Hi, As part of the application we are developing (with android client and Django server) a json object which contains user name and pass word is sent to server from android client as follows HttpPost post = new HttpPost(URL); /*Adding key value pairs */ json.put("username", un); json.put("password", pwd); StringEntity se = new StringEntity(json.toString()); post.setEntity(se); response = client.execute(post); The response is parsed like this result = responsetoString(response.getEntity().getContent()); //Converts response to String jObject = new JSONObject(result); JSONObject post = jObject.getJSONObject("post"); username = post.getString("username"); message = post.getString("message"); Hope upto this everything is fine. The problem comes when parsing or sending JSON responses in Django server. Whats the best way to do this? We tried using SimpleJSON and it turned out not to be so simple as we didn't find any good tutorials or sample code for the same? Are there any python functions similiar to get,put and opt in java for JSON? Any help would be much appreciated..

    Read the article

  • Using JavaCC to infer semantics from a Composite tree

    - by Skice
    Hi all, I am programming (in Java) a very limited symbolic calculus library that manages polynomials, exponentials and expolinomials (sums of elements like "x^n * e^(c x)"). I want the library to be extensible in the sense of new analytic forms (trigonometric, etc.) or new kinds of operations (logarithm, domain transformations, etc.), so a Composite pattern that represent the syntactic structure of an expression, together with a bunch of Visitors for the operations, does the job quite well. My problem arise when I try to implement operations that depends on the semantics more than on the syntax of the Expression (like integrals, for instance: there are a lot of resolution methods for specific classes of functions, but these same classes can be represented with more than a single syntax). So I thought I need something to "parse" the Composite tree to infer its semantics in order to invoke the right integration method (if any). Someone pointed me to JavaCC, but all the examples I've seen deal only with string parsing; so, I don't know if I'm digging in the right direction. Some suggestions? (I hope to have been clear enough!)

    Read the article

  • Pass the values from ListView control to text boxes in c#

    - by Kumu
    private void deleteDisplayGamesButton_Click(object sender, EventArgs e) { //Game game = new Game(homeTeamComboBox.Text, int.Parse(homeScoreUpDown.Value.ToString()), awayTeamComboBox.Text, int.Parse(awayScoreUpDown.Value.ToString())); deleteDisplayGamesListView.Items.Clear(); deleteDisplayGamesListView.View = View.Details; foreach (Game currentgame in footballLeagueDatabase.games) { ListViewItem row = new ListViewItem(); row.SubItems.Add(currentgame.HomeTeam.ToString()); row.SubItems.Add(currentgame.HomeScore.ToString()); row.SubItems.Add(currentgame.AwayTeam.ToString()); row.SubItems.Add(currentgame.AwayScore.ToString()); deleteDisplayGamesListView.Items.Add(row); } } I need to pass the values from above ListView control to following text boxes when I use the deleteDisplayGamesListView_SelectedIndexChanged method. private void deleteDisplayGamesListView_SelectedIndexChanged(object sender, EventArgs e) { deleteModifyHomeTeamTxt.Text = ""; deleteModifyHomeScoreUpDown.Text = ""; deleteModifyAwayTeamTxt.Text = ""; deleteModifyAwayScoreUpDown.Text = ""; foreach (Game currentgame in footballLeagueDatabase.games); { ?----------------------------- } } Futhermore I need to clear this data from the ListView Control. If you know how to do this please let me know.

    Read the article

< Previous Page | 89 90 91 92 93 94 95 96 97 98 99 100  | Next Page >