Search Results

Search found 15004 results on 601 pages for 'date parsing'.

Page 110/601

  • Removing HTML entities while preserving line breaks with JSoup

    - by shrodes
    I have been using JSoup to parse lyrics and it has been great until now, but I have run into a problem. I can use Node.html() to return the full HTML of the desired node, which retains line breaks as such:

        Gl&oacute;andi augu, silfurn&aacute;tt <br />Bl&oacute;&eth; alv&ouml;ru, starir &aacute; <br />&Oacute;&eth;ur hundur er &iacute; v&iacute;gam&oacute;&eth;, &iacute; maga... m&eacute;r <br /> <br />Kolni&eth;ur gref, kvik sem dreg h&eacute;r <br />Kolni&eth;ur svart, hvergi bjart n&eacute;

    But this has the unfortunate side effect, as you can see, of retaining HTML entities and tags. However, if I use Node.text(), I get a better-looking result, free of tags and entities:

        Glóandi augu, silfurnátt Blóð alvöru, starir á Óður hundur er í vígamóð, í maga... mér Kolniður gref, kvik sem dreg hér Kolniður svart,

    This has another unfortunate side effect: it removes the line breaks and compresses everything into a single line. Simply replacing <br /> in the node before calling Node.text() yields the same result; it seems that the method itself collapses the text onto a single line, ignoring newlines. Is it possible to have the best of both worlds, with tags and entities replaced correctly while preserving the line breaks, or is there another way of decoding entities and removing tags without having to replace them manually?
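
    The idea the question is reaching for (turn line-break tags into newlines first, then strip tags and decode entities) can be sketched with the Python standard library. This is only an illustration of the technique, not JSoup code; the function name html_to_text_keep_breaks and the shortened sample are assumptions, and the regex-based tag stripping is only suitable for simple fragments like this one.

```python
import html
import re


def html_to_text_keep_breaks(fragment: str) -> str:
    """Turn <br> tags into newlines, drop other tags, decode entities."""
    # Replace each <br>, <br/> or <br /> (plus surrounding spaces) with a newline.
    with_newlines = re.sub(r"\s*<br\s*/?>\s*", "\n", fragment, flags=re.IGNORECASE)
    # Remove any remaining tags; a rough regex, adequate for simple fragments only.
    without_tags = re.sub(r"<[^>]+>", "", with_newlines)
    # Decode entities such as &oacute; into the characters they stand for.
    return html.unescape(without_tags)


sample = (
    "Gl&oacute;andi augu, silfurn&aacute;tt <br />"
    "Bl&oacute;&eth; alv&ouml;ru, starir &aacute; <br /> <br />"
    "Kolni&eth;ur gref"
)
result = html_to_text_keep_breaks(sample)
```

    Doing the newline substitution before the entity decoding keeps the two concerns separate: the blank line between verses survives as an empty line in the output.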

  • What is the easiest way to loop through a folder of files in C#?

    - by badpanda
    I am new to C# and am trying to write a program that navigates the local file system using a config file containing relevant filepaths. My question is this: What are the best practices to use when performing file I/O (this will be from the desktop app to a server and back) and file system navigation in C#? I know how to google, and I have found several solutions, but I would like to know which of the various functions is most robust and flexible. As well, if anyone has any tips regarding exception handling for C# file I/O that would also be very helpful. Thanks!!! badPanda

  • Split string into smaller part with constrain [PHP RegEx HTML]

    - by Sadi
    Hello, I need to split a long string into an array with the following constraints: Each part has a limited number of characters (e.g. no more than 8000 characters). Each part can contain multiple sentences (delimited by . [full stop]) but never a partial sentence, except for the last part of the string (as the last part may not contain any full stop). The string may contain HTML tags, but a tag cannot be divided as ( to ). That means each HTML tag should stay intact, although a starting tag and its ending tag can sit in different segments/chunks. I think a regular expression with preg_split can do it. Would you please help me with the proper RegEx? Thank you, Sadi

  • Extracting function declarations from a PHP file

    - by byronh
    I'm looking to create an on-site API reference for my framework and application. Basically, say I have a class file model.class.php:

        class Model extends Object {
            ... code here ...
            // Separates results into pages.
            // Returns Paginator object.
            final public function paginate($perpage = 10) {
                ... more code here ...
            }
        }

    and I want to be able to generate a reference that my developers can refer to quickly in order to know which functions are available to be called. They only need to see the comments directly above each function and the declaration line. Something like this (similar to a C++ header file):

        // Separates results into pages.
        // Returns Paginator object.
        final public function paginate($perpage = 10);

    I've done some research and this answer looked pretty good (using Reflection classes), however, I'm not sure how I can keep the comments in. Any ideas?

    EDIT: Sorry, but I want to keep the current comment formatting. Myself and the people who are working on the code hate the verbosity associated with PHPDocumentor. Not to mention a comment-rewrite of the entire project would take years, so I want to preserve just the // comments in plaintext.

  • C# equivalent of NaN or IsNumeric

    - by johnc
    This seems like a fairly simple question, and I'm surprised not to have needed it before. What is the most efficient way of testing whether a string input is numeric (or, conversely, Not A Number)? I guess I can do a Double.Parse or a regex (see below):

        public static bool IsNumeric(this string value) {
            return Regex.IsMatch(value, "^\\d+$");
        }

    but I was wondering if there was a built-in way to do it, such as javascript's NaN() or IsNumeric() (was that VB? I can't remember).

  • Wrap words in tags, keep markup

    - by spacevillain
    For example I have a string with markup (from html node): hello, this is dog

        "h<em>e<strong>llo, thi</strong>s i</em><strong>s d</strong>og"

    What is the most correct way to find some words in it (let's say "hello" and "dog"), wrap them in a span (make a highlight) and save all the markup? Desired output is something like this (notice properly closed tags):

        <span class="highlight">h<em>e<strong>llo</strong></em></span><strong>,</strong> <em><strong>thi</strong>s<em> i</em><strong>s <span class="highlight"><strong>d</strong>og</span>

    Looks the same as it should: hello, this is dog

  • PDFParsing & extracting the images only in iPhone application.

    - by sagar
    Hello everyone.

    My query: I want to extract only the images from an entire PDF document (using Objective-C, for an iPhone application).

    My efforts: I have gone through this link, which has details regarding the different operators of a PDF document ( http://mail-archives.apache.org/mod_mbox/pdfbox-commits/200912.mbox/%[email protected]%3E ). I also studied this document ( http://developer.apple.com/mac/library/documentation/GraphicsImaging/Conceptual/drawingwithquartz2d/dq_pdf_scan/dq_pdf_scan.html#//apple_ref/doc/uid/TP30001066-CH220-TPXREF101 ). I have also gone through the entire PDFReference.pdf document (from the original Adobe site), which says that for images there are the following operators: q Q BI EI

    I have set up the following table to get the image:

        myTable = CGPDFOperatorTableCreate();
        CGPDFOperatorTableSetCallback(myTable, "q", arrayCallback2);
        CGPDFOperatorTableSetCallback(myTable, "TJ", arrayCallback);
        CGPDFOperatorTableSetCallback(myTable, "Tj", stringCallback);

    I have the following arrayCallback2 method for getting the image:

        void arrayCallback2(CGPDFScannerRef inScanner, void *userInfo) {
            // how to extract image from this code
            // means I have tried too many different ways. following is incorrect way & not giving image
            // CGPDFStreamRef stream; // represents a sequence of bytes
            // if (CGPDFDictionaryGetStream (d, "BI", &stream)){
            //     CGPDFDataFormat t=CGPDFDataFormatJPEG2000;
            //     CFDataRef data = CGPDFStreamCopyData (stream, &t);
            // }
        }

    The arrayCallback2 method above is called for the operator "q", but I don't know how to extract the image from it. In short: what is the solution for extracting the images from PDF documents? Thanks in advance for your kind help. Sagar Kothari.

  • PHP - Get dates of next 5 weekdays?

    - by Dan
    I'm trying to create an array of the next 5 working week days (Monday - Friday, excluding today). I know the working week varies around the world but this is not important for what I am trying to do. So, for example, if today is a Wednesday, I want the dates for Thursday and Friday of the current week and Monday, Tuesday and Wednesday of the following week. I thought this would work:

        $dates = array();
        for ($i = 1; $i < 6; $i++) {
            $dates[] = date('Y-m-d', strtotime('+ '.$i.' weekday'));
        }

    But for today, it is giving me: Monday 1st, Tuesday 2nd, Wednesday 3rd, Thursday 4th, Sunday 7th! Any advice appreciated. Thanks
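
    One way to sidestep relative-date string parsing altogether is to step forward one calendar day at a time and keep only Monday to Friday. The sketch below shows that approach in Python rather than PHP; the function name next_weekdays and the sample start date are assumptions made for illustration.

```python
from datetime import date, timedelta


def next_weekdays(start: date, count: int = 5) -> list:
    """Return the next `count` Monday-to-Friday dates after `start`."""
    days = []
    current = start
    while len(days) < count:
        current += timedelta(days=1)
        # date.weekday() numbers Monday as 0 and Sunday as 6.
        if current.weekday() < 5:
            days.append(current)
    return days


# 16 June 2010 was a Wednesday.
example = [d.isoformat() for d in next_weekdays(date(2010, 6, 16))]
```

    For a Wednesday start this yields Thursday and Friday of the same week followed by Monday to Wednesday of the next, which is the behaviour the question describes. Public holidays are not considered.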

  • Date difference in minutes

    - by zurna
    I have DateFirstStarted and DateEnded fields in the database. Date values are recorded as:

        DateFirstStarted 04/13/2010 07:00:00.000 PM
        DateEnded        04/13/2010 09:00:00.000 PM

    How do I print the difference in minutes between the two dates? I tried the following code but it returned something like 999343:

        Clock = DateDiff("m", objLiveCommentary("DateFirstStarted"), objLiveCommentary("DateEnded"))
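
    In the VBScript DateDiff function the interval "m" stands for months, while minutes use "n", which is worth checking first. Independently of that, the computation itself (parse both timestamps, subtract, convert to minutes) can be sketched in Python; the format string and the name minutes_between are assumptions based on the two sample values above.

```python
from datetime import datetime

# Matches values such as "04/13/2010 07:00:00.000 PM" (12-hour clock with AM/PM).
TIMESTAMP_FORMAT = "%m/%d/%Y %I:%M:%S.%f %p"


def minutes_between(start_text: str, end_text: str) -> int:
    """Return the whole number of minutes from start_text to end_text."""
    start = datetime.strptime(start_text, TIMESTAMP_FORMAT)
    end = datetime.strptime(end_text, TIMESTAMP_FORMAT)
    return int((end - start).total_seconds() // 60)


diff = minutes_between("04/13/2010 07:00:00.000 PM", "04/13/2010 09:00:00.000 PM")
```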

  • What sort of object is this and how to use it?

    - by Gary
    What would be the correct name for this type of array? There are 3 main sections, each with 4 sub-parts: "issuedTime", "text", "url" and "validToTime". How do you start to convert this to an object? If there were only 1 main section it would be fairly simple, but with 3 main parts and no identifier for each main section, I am scratching my head as to where to start. Any advice appreciated.

        [{
          "issuedTime":"7:13pm Sunday 13 June 2010",
          "text":"\nAmended 7:10pm.\n\nText text and more text\n",
          "url":"\/folder\/fc\/name.png",
          "validToTime":"12:00am Monday 14 June 2010"
        },{
          "issuedTime":"8:33pm Sunday 13 June 2010",
          "text":"\nText and more text.\n",
          "url":"\/folder\/fc\/name.png",
          "validToTime":"12:00pm Monday 14 June 2010"
        },{
          "issuedTime":"10:40am Sunday 13 June 2010",
          "text":"\nAnd even more text.",
          "url":"\/folder\/fc\/name.png",
          "validToTime":"12:00am Tuesday 15 June 2010"
        }]
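
    The data above is a JSON array of objects, so the unnamed "main sections" are simply addressed by position. A minimal Python sketch, assuming the payload is available as text (the variable names are illustrative and the first "text" value is shortened):

```python
import json

# Raw string so the JSON escapes (\n and \/) reach the parser untouched.
raw = r"""
[{"issuedTime": "7:13pm Sunday 13 June 2010", "text": "\nAmended 7:10pm.\n",
  "url": "\/folder\/fc\/name.png", "validToTime": "12:00am Monday 14 June 2010"},
 {"issuedTime": "8:33pm Sunday 13 June 2010", "text": "\nText and more text.\n",
  "url": "\/folder\/fc\/name.png", "validToTime": "12:00pm Monday 14 June 2010"},
 {"issuedTime": "10:40am Sunday 13 June 2010", "text": "\nAnd even more text.",
  "url": "\/folder\/fc\/name.png", "validToTime": "12:00am Tuesday 15 June 2010"}]
"""

entries = json.loads(raw)  # a list with one dict per section
issued_times = [entry["issuedTime"] for entry in entries]
first_url = entries[0]["url"]  # sections are accessed by index
```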

  • YQL Open Data Table for Wikipedia

    - by Rob Young
    Has anyone written a YQL open data table for accessing Wikipedia? I've had a hunt around the internet and found mention of people using YQL for extracting various bits of information from Wikipedia pages such as microformats, links or content but I haven't been able to find an open data table that ties it all together.

  • How can I get an e-mail address out of a string of key=value pairs?

    - by noob
    How can I get the part of the string that I need?

        accountid=xxxxxx type=prem servertime=1256876305 addtime=1185548735 validuntil=1265012019 username=noob directstart=1 protectfiles=0 rsantihack=1 plustrafficmode=1 mirrors= jsconfig=1 [email protected] lots=0 fpoints=6076 ppoints=149 curfiles=38 curspace=3100655714 bodkb=60000000 premkbleft=25000000 ppointrate=116

    I want the data after email= but only up to live.com.
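
    Since the line is a run of space-separated key=value pairs, one option is to split it into a dictionary and look the key up, rather than searching for the address directly. A Python sketch follows; the function name parse_pairs and the address someone@example.com are placeholders, because the real address is obscured in the excerpt.

```python
def parse_pairs(line: str) -> dict:
    """Split 'key=value key=value ...' into a dictionary of strings."""
    pairs = {}
    for token in line.split():
        key, separator, value = token.partition("=")
        if separator:  # ignore tokens that contain no '=' at all
            pairs[key] = value
    return pairs


line = (
    "accountid=xxxxxx type=prem username=noob mirrors= jsconfig=1 "
    "email=someone@example.com lots=0 fpoints=6076"
)
pairs = parse_pairs(line)
email = pairs.get("email")
```

    This relies on values never containing spaces, which holds for the sample line. Empty values such as mirrors= come back as empty strings.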

  • issue with $.ParseJSON which converts json string to null

    - by Aby
    I am using Spring MVC, in which I convert an ArrayList into a JSON string. I have one object 1) results. My output from Spring looks like this:

        {
          "data":"[{\"userName\":\"test1\",\"firstName\":\"test\",\"lastName\":\"user\"}, {\"userName\":\"test2\",\"firstName\":\"test1\",\"lastName\":\"user1\"}]",
        }

    I get null as output when I call $.parseJSON on this output. When I tried testing with only the data object, it works fine. Any help would be great.
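
    Two things stand out in the payload above: the value of "data" is itself a JSON-encoded string, so it needs a second decoding pass, and the trailing comma after it is not valid in strict JSON. Both points can be checked with Python's json module; this sketch assumes the trailing comma has been removed on the server side, and the variable names are illustrative.

```python
import json

# Same payload as above, minus the trailing comma.
response_text = (
    r'{"data": "[{\"userName\":\"test1\",\"firstName\":\"test\",\"lastName\":\"user\"}, '
    r'{\"userName\":\"test2\",\"firstName\":\"test1\",\"lastName\":\"user1\"}]"}'
)

outer = json.loads(response_text)  # first pass: "data" is still a plain string
users = json.loads(outer["data"])  # second pass: now a list of dicts

# A strict parser rejects a trailing comma inside an object.
try:
    json.loads('{"data": "[]",}')
    trailing_comma_accepted = True
except json.JSONDecodeError:
    trailing_comma_accepted = False
```

    The cleaner fix is usually to serialize the list once on the server so that "data" arrives as a real array rather than as a string.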

  • how to detect escape characters in a string

    - by mix
    Given a string named line whose raw version has this value: \rRAWSTRING, how can I detect if it has the escape character \r? What I've tried is:

        if repr(line).startswith('\r'): blah...

    but it doesn't catch it. I also tried find, such as:

        if repr(line).find('\r') != -1: blah

    which doesn't work either. What am I missing? thx!
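
    The answer depends on whether line holds a real carriage-return character or the two characters backslash and r. The sketch below shows both cases, and why testing repr(line) against '\r' fails: repr() output starts with a quote and spells the carriage return as a backslash followed by r. The variable names are illustrative.

```python
with_cr = "\rRAWSTRING"          # one carriage-return character, then text
with_backslash = r"\rRAWSTRING"  # a literal backslash followed by the letter r

# Test the string itself for the control character.
starts_with_cr = with_cr.startswith("\r")

# Test for the two-character sequence backslash + r.
starts_with_backslash_r = with_backslash.startswith("\\r")

# repr() wraps the value in quotes and writes the carriage return as \r (two characters).
repr_prefix = repr(with_cr)[:3]
repr_check_matches = repr(with_cr).startswith("\r")
```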

  • python regex for repeating string

    - by Lars Nordin
    I want to verify and then parse this string (in quotes):

        string = "start: c12354, c3456, 34526;"  # Note that some codes begin with 'c'

    I would like to verify that the string starts with 'start:' and ends with ';'. Afterward, I would like to have a regex parse out the codes. I tried the following Python re code:

        regx = r"V1 OIDs: (c?[0-9]+,?)+;"
        reg = re.compile(regx)
        matched = reg.search(string)
        print ' matched.groups()', matched.groups()

    I have tried different variations, but I can get either the first or the last code, not a list of all three. Or should I abandon using a regex?
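
    In Python's re module a repeated capturing group keeps only its last repetition, which matches the symptom described. A common workaround is to validate the whole string with one pattern and then pull the codes out with findall. A sketch, assuming the "start:" prefix from the sample string and a hypothetical function name parse_codes:

```python
import re

# Whole-string check: "start:", a comma-separated list of codes, then ";".
LINE_PATTERN = re.compile(r"start:\s*((?:c?\d+\s*,\s*)*c?\d+)\s*;")
CODE_PATTERN = re.compile(r"c?\d+")


def parse_codes(text: str):
    """Return the list of codes, or None if the string is not well formed."""
    match = LINE_PATTERN.fullmatch(text)
    if match is None:
        return None
    # Second pass: extract every code from the validated list portion.
    return CODE_PATTERN.findall(match.group(1))


codes = parse_codes("start: c12354, c3456, 34526;")
```

    Splitting match.group(1) on commas and stripping whitespace would work equally well for the second step.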

  • Python/YACC Lexer: Token priority?

    - by Rosarch
    I'm trying to use reserved words in my grammar:

        reserved = {
            'if' : 'IF',
            'then' : 'THEN',
            'else' : 'ELSE',
            'while' : 'WHILE',
        }

        tokens = [
            'DEPT_CODE',
            'COURSE_NUMBER',
            'OR_CONJ',
            'ID',
        ] + list(reserved.values())

        t_DEPT_CODE = r'[A-Z]{2,}'
        t_COURSE_NUMBER = r'[0-9]{4}'
        t_OR_CONJ = r'or'
        t_ignore = ' \t'

        def t_ID(t):
            r'[a-zA-Z_][a-zA-Z_0-9]*'
            if t.value in reserved.values():
                t.type = reserved[t.value]
                return t
            return None

    However, the t_ID rule somehow swallows up DEPT_CODE and OR_CONJ. How can I get around this? I'd like those two to take higher precedence than the reserved words.
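
    According to the PLY documentation, tokens defined by functions are added to the master pattern first, in file order, and tokens defined by strings are added afterwards, sorted by decreasing regex length; that would explain why the function t_ID is tried before the string rules. Note also that the posted check compares t.value against the token names (reserved.values()) rather than the keywords. The sketch below imitates the "earlier rule wins" behaviour with plain re instead of PLY, so the ordering effect can be seen in isolation; TOKEN_SPEC, tokenize and the word-boundary additions are assumptions, not PLY API.

```python
import re

RESERVED = {"if": "IF", "then": "THEN", "else": "ELSE", "while": "WHILE"}

# Alternatives are tried left to right, so the specific rules go before ID.
TOKEN_SPEC = [
    ("COURSE_NUMBER", r"[0-9]{4}"),
    ("OR_CONJ", r"or\b"),          # \b stops "order" from matching as "or"
    ("DEPT_CODE", r"[A-Z]{2,}\b"),
    ("ID", r"[a-zA-Z_][a-zA-Z_0-9]*"),
    ("SKIP", r"[ \t]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return (token_type, value) pairs; earlier TOKEN_SPEC entries take priority."""
    tokens = []
    for match in MASTER.finditer(text):
        kind, value = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "ID":
            # Look the keyword up by its text; fall back to a plain identifier.
            kind = RESERVED.get(value, "ID")
        tokens.append((kind, value))
    return tokens


result = tokenize("if CS 1234 or MATH 2001 then x")
```

    In PLY itself, the equivalent move would be to define DEPT_CODE and OR_CONJ as functions placed above t_ID, and to return the token from t_ID in every case.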

  • Using beautifulsoup to extract text between line breaks (e.g. <br /> tags)

    - by Michael Altman
    I have the following HTML that is within a larger document:

        <br />
        Important Text 1
        <br />
        <br />
        Not Important Text
        <br />
        Important Text 2
        <br />
        Important Text 3
        <br />
        <br />
        Non Important Text
        <br />
        Important Text 4
        <br />

    I'm currently using BeautifulSoup to obtain other elements within the HTML, but I have not been able to find a way to get the important lines of text between <br /> tags. I can isolate and navigate to each of the <br /> elements, but can't find a way to get the text in between. Any help would be greatly appreciated. Thanks.

  • removing phone number from a document.

    - by Grant Collins
    Hi, I've got a challenge that I am hoping the SO community is able to help me with. I am trying to parse a lot of HTML documents in my PHP application to remove personal details, such as names, addresses and phone numbers. I can remove most of these details without too much trouble; however, the phone number is a real problem for me. My idea is to take the text from these documents and then use a regex to identify the phone numbers and replace them with another value such as 'xxxx'. I've got 2 regexes that I am using, one for UK landline numbers and one for UK cell/mobile numbers. However, when I try to run them against the text, it just returns an empty string. I am using the following preg_replace code:

        $pattens = array(
            '/^(((\+44\s?\d{4}|\(?0\d{4}\)?)\s?\d{3}\s?\d{3})|((\+44\s?\d{3}|\(?0\d{3}\)?)\s?\d{3}\s?\d{4})|((\+44\s?\d{2}|\(?0\d{2}\)?)\s?\d{4}\s?\d{4}))(\s?\#(\d{4}|\d{3}))?$/',
            '/^(\+44\s?7\d{3}|\(?07\d{3}\)?)\s?\d{3}\s?\d{3}$/'
        );
        $replace = array('xxxxx', 'xxxxx');
        //do the search for the numbers.
        $updatedContents = preg_replace($pattens, $replace, $htmlContents);

    At the moment this is causing me a lot of head scratching, as I thought that I had this nailed, but I can't see what's wrong. I am sure that it is something really simple. Thanks, Grant
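
    Both patterns above are anchored with ^ and $, so they can only match when the entire subject is a phone number, never a number embedded in a larger document. The sketch below shows the unanchored variant of the mobile pattern in Python rather than PHP, with digit lookarounds added so that it does not match inside a longer run of digits. The names UK_MOBILE and redact_phone_numbers and the sample text are assumptions, and the pattern is not a complete UK numbering-plan validator.

```python
import re

# Mobile pattern from the question without ^ and $; the lookarounds keep it
# from matching in the middle of a longer digit sequence.
UK_MOBILE = re.compile(
    r"(?<!\d)(?:\+44\s?7\d{3}|\(?07\d{3}\)?)\s?\d{3}\s?\d{3}(?!\d)"
)


def redact_phone_numbers(text: str, replacement: str = "xxxxx") -> str:
    """Replace every UK-mobile-looking number in text with the replacement."""
    return UK_MOBILE.sub(replacement, text)


html_contents = "<p>Call me on 07123 456789 or +44 7123 456 789 after 5pm.</p>"
redacted = redact_phone_numbers(html_contents)
```

    In PHP, preg_replace is documented to return NULL when an error occurs, so checking preg_last_error() after the call may also help explain the empty result.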
