pdf parsing - Page 110 - Developer IT

Removing HTML from a Java String

- by Mason

Is there a good way to remove HTML from a Java string? A simple regex like replaceAll("\\<.*?>","") will work, but things like &amp; won't be converted correctly, and non-HTML text between two angle brackets will be removed (i.e. whatever the .*? in the regex matches will disappear).

Can I use the Python ast module for this?

- by Juanjo Conti

I'd like to write a program that modifies Python programs in this way: change

    "some literal string %" % SOMETHING

to

    functioncall("some literal string %") % SOMETHING

Thanks,
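
One possible approach, sketched under assumptions: an ast.NodeTransformer that wraps the left operand of every % operation whose left side is a string literal. The class name WrapFormatString and the default wrapper name functioncall are illustrative, and ast.unparse requires Python 3.9 or later. Note that unparsing regenerates the source, so comments and the original formatting are lost.

```python
import ast


class WrapFormatString(ast.NodeTransformer):
    """Rewrite `"literal" % x` into `functioncall("literal") % x` (hypothetical helper)."""

    def __init__(self, func_name="functioncall"):
        self.func_name = func_name

    def visit_BinOp(self, node):
        # Visit children first so nested % expressions are rewritten too.
        self.generic_visit(node)
        if (isinstance(node.op, ast.Mod)
                and isinstance(node.left, ast.Constant)
                and isinstance(node.left.value, str)):
            node.left = ast.Call(
                func=ast.Name(id=self.func_name, ctx=ast.Load()),
                args=[node.left],
                keywords=[],
            )
        return node


def rewrite(source):
    """Parse the source, apply the transformer, and return regenerated code."""
    tree = WrapFormatString().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


if __name__ == "__main__":
    print(rewrite('x = "hello %s" % name'))
```

Numeric uses of % (such as 5 % 3) are left alone because the left operand is not a string constant.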

How to restrict a textbox to double values in Visual Basic?

- by fema

Hi, I'd like to make a textbox that accepts only numbers: not just integers, but doubles. I've read here about e.Handled = Not Char.IsDigit(e.KeyChar), and it works, but it can be used only for integers, since it rejects the decimal point. Another thing I've read here is If Not Double.TryParse(TextBox2.Text, value) Then ...., and it would work fine, except that it accepts only a decimal comma instead of a point. I don't know whether that's because of my locale settings (Hungary, where we use commas instead of points), but I don't have any other idea how to solve my problem, and the SQL server I send my data to uses a decimal point. Thanks in advance.

ANTLR: token to text in rewrite rule

- by Antonio

I'm building an AST using ANTLR. I want to write a production that matches this string: ${identifier}, so in my grammar file I have:

    reference : DOLLAR LBRACE IDENT RBRACE -> ^(NODE_VAR_REFERENCE IDENT) ;

This works fine. I'm using my own adaptor to emit tree nodes. The rewrite rule creates two nodes for me: one for NODE_VAR_REFERENCE and one for IDENT. What I want is to create only one node (for the NODE_VAR_REFERENCE token), and this node must have the IDENT token in its "token" field. Is this possible using a rewrite rule? Thanks.

JavaCC: How can I specify which token(s) are expected in certain context?

- by java.is.for.desktop

Hello, everyone! I need to make JavaCC aware of a context (the current parent token) and, depending on that context, expect different token(s) to occur. Consider the following pseudo-code:

    TOKEN <abc> { "abc*" } // recognizes "abc", "abcd", "abcde", ...
    TOKEN <abcd> { "abcd*" } // recognizes "abcd", "abcde", "abcdef", ...
    TOKEN <element1> { "element1" "[" expectOnly(<abc>) "]" }
    TOKEN <element2> { "element2" "[" expectOnly(<abcd>) "]" }
    ...

So when the generated parser is "inside" a token named "element1" and it encounters "abcdef", it recognizes it as <abc>, but when it's "inside" a token named "element2" it recognizes the same string as <abcd>.

    element1 [ abcdef ] // aha! it can only be <abc>
    element2 [ abcdef ] // aha! it can only be <abcd>

If I'm not wrong, it would behave similarly to the more complex DTD definitions of an XML file. So, how can one specify in which "context" which token(s) are valid/expected? NOTE: It would not be enough for my real case to define a kind of "hierarchy" of tokens, so that "abcdef" is always matched first against <abcd> and then <abc>. I really need context-aware tokens.

PHP Dom problem, how to insert html code in a particular div

- by sala_7

I am trying to replace the html code inside the div 'resultsContainer' with the html of $response. The result of my unsuccessful code is that the contents of 'resultsContainer' remain and the html of $response shows up on screen as text rather than being parsed as html. Finally, I would like to inject the content of $response inside 'resultsContainer' without having to create any new div. I need this:

    <div id='resultsContainer'>Html inside $response here...</div>

and NOT THIS:

    <div id='resultsContainer'><div>Html inside $response here...</div></div>

My code:

    // Set Config
    libxml_use_internal_errors(true);
    $doc = new DomDocument();
    $doc->strictErrorChecking = false;
    $doc->validateOnParse = true;
    // load the html page
    $app = file_get_contents('index.php');
    $doc->loadHTML($app);
    // get the dynamic content
    $response = file_get_contents('search.php'.$query);
    $response = utf8_decode($response);
    // add dynamic content to corresponding div
    $node = $doc->createElement('div', $response);
    $doc->getElementById('resultsContainer')->appendChild($node);
    // echo html snapshot
    echo $doc->saveHTML();

Python + Expat: Error on &#0; entities

- by clacke

I have written a small function, which uses ElementTree and xpath to extract the text contents of certain elements in an xml file:

    #!/usr/bin/env python2.5
    import doctest
    from xml.etree import ElementTree
    from StringIO import StringIO

    def parse_xml_etree(sin, xpath):
        """
        Takes as input a stream containing XML and an XPath expression.
        Applies the XPath expression to the XML and returns a generator
        yielding the text contents of each element returned.

        >>> parse_xml_etree(
        ...     StringIO('<test><elem1>one</elem1><elem2>two</elem2></test>'),
        ...     '//elem1').next()
        'one'
        >>> parse_xml_etree(
        ...     StringIO('<test><elem1>one</elem1><elem2>two</elem2></test>'),
        ...     '//elem2').next()
        'two'
        >>> parse_xml_etree(
        ...     StringIO('<test><null>&#0;</null><elem3>three</elem3></test>'),
        ...     '//elem3').next()
        'three'
        """
        tree = ElementTree.parse(sin)
        for element in tree.findall(xpath):
            yield element.text

    if __name__ == '__main__':
        doctest.testmod(verbose=True)

The third test fails with the following exception:

    ExpatError: reference to invalid character number: line 1, column 13

Is the &#0; entity illegal XML? Regardless of whether it is or not, the files I want to parse contain it, and I need some way to parse them. Any suggestions for a parser other than Expat, or for Expat settings, that would allow me to do that?

Effective way of String splitting

- by openidsujoy

I have a string like this:

    N:Pay in Cash++RGI:40++R:200++T:Purchase++IP:N++IS:N++PD:PC++UCP:598.80++UPP:0.00++TCP:598.80++TPP:0.00++QE:1++QS:1++CPC:USD++PPC:Points++D:Y++E:Y++IFE:Y++AD:Y++IR:++MV:++CP:~~N:ERedemption++RGI:42++R:200++T:Purchase++IP:N++IS:N++PD:PC++UCP:598.80++UPP:0.00++TCP:598.80++TPP:0.00++QE:1++QS:1++CPC:USD++PPC:Points++D:Y++E:Y++IFE:Y++AD:Y++IR:++MV:++CP:

The string is structured like this:

- It is a list of POs (Payment Options), which are separated by ~~
- The list may contain one or more POs
- A PO contains only key-value pairs, which are separated by :
- Spaces are denoted by ++

I need to extract the values for the keys "RGI" and "N". I can do it with a for loop, but I want an efficient way to do this. Any help would be appreciated.
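
A minimal sketch of the extraction, assuming (from the sample data, where ++ sits between every pair) that ++ separates the key-value pairs within one payment option and that each key appears at most once per option. The function name extract_values is illustrative.

```python
def extract_values(payload, wanted=("N", "RGI")):
    """Return one dict per payment option holding only the wanted keys."""
    results = []
    for option in payload.split("~~"):
        option = option.strip()
        if not option:
            continue
        fields = {}
        for pair in option.split("++"):
            # partition splits at the first ":" only, so values may contain ":".
            key, sep, value = pair.partition(":")
            if sep:
                fields[key] = value
        results.append({key: fields.get(key) for key in wanted})
    return results


if __name__ == "__main__":
    sample = "N:Pay in Cash++RGI:40++R:200~~N:ERedemption++RGI:42++R:200"
    for entry in extract_values(sample):
        print(entry)
```

Each option is scanned once, so the work is linear in the length of the string; for inputs of this size a loop like this is unlikely to be a bottleneck.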

Good way to parse query string

- by m.edmondson

I have a String that contains the following:

    ?workarea=London+&+Home+Counties+Ltd&sub=fs&&&FASh*5

which resembles a URI query string. What is the best way to parse the elements of this string (workarea and sub) without messing about with string manipulation? If I use HttpUtility.ParseQueryString, it gets stuck, as both elements include &. However, if I encode the whole thing first, I lose the separation between the elements. Ideally the output would be:

    workarea = London & Home Counties Ltd
    sub = fs&&&FASh*5
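
Because the & characters inside the values are not encoded, any parser has to guess where one parameter ends. The sketch below illustrates one heuristic, shown in Python rather than the .NET API mentioned in the question: treat an & as a separator only when it is directly followed by a name and an equals sign. The function name split_loose_query is illustrative, and the heuristic would misfire on a value that itself contains text like "&x=".

```python
import re
from urllib.parse import unquote_plus


def split_loose_query(query):
    """Split a query string whose values may contain unencoded '&'."""
    query = query.lstrip("?")
    # Split only at an "&" that is immediately followed by "name=".
    parts = re.split(r"&(?=\w+=)", query)
    result = {}
    for part in parts:
        key, _, value = part.partition("=")
        # unquote_plus turns "+" into a space and decodes %XX escapes.
        result[key] = unquote_plus(value)
    return result


if __name__ == "__main__":
    sample = "?workarea=London+&+Home+Counties+Ltd&sub=fs&&&FASh*5"
    for name, value in split_loose_query(sample).items():
        print(name, "=", value)
```

If the producer of the string can be changed, percent-encoding the values at the source removes the need for any guessing.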

Postfix and right-associative operators in LR(0) parsers

- by Ian

Is it possible to construct an LR(0) parser that could parse a language with both prefix and postfix operators? For example, if I had a grammar with the + (addition) and ! (factorial) operators with the usual precedence, then 1+3! should be 1 + 3! = 1 + 6 = 7, but surely if the parser were LR(0), then when it had 1+3 on the stack it would reduce rather than shift? Also, do right-associative operators pose a problem? For example, 2^3^4 should be 2^(3^4), but again, when the parser has 2^3 on the stack, how would it know whether to reduce or shift? If this isn't possible, is there still a way to use an LR(0) parser, possibly by converting the input into Polish or Reverse Polish notation, or by adding brackets in the appropriate places? Would this be done before, during or after the lexing stage?

SimpleTest assertTags - loose matching? (for CakePHP)

- by Arkaaito

I'd like to use SimpleTest to set up some functionality tests for our project. In particular, we have a very busy page which has some random components and some static components, and I'd like to be able to write a simple test which only confirms the static bits (preferably only the one or two most important ones). In other words, I want to be able to leave out any tags on the page I don't care about, and write something like:

    $result = '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head><title>...</title><meta .../></head><body><script type="text/javascript">...</script><div class="center-splash"><span>Welcome JohnDoe</span><p>Your progress:</p>...</div><div class="left-column">...</div><div class="right-column">...</div>...</body></html>';
    $expects = array('html'=>true, 'body'=>true, 'div'=>array('class'=>'center_splash'), 'span'=>true, 'Welcome JohnDoe', '/span', '/div', '/body', '/html');
    $this->assertTagsButIgnoreExtras($result, $expects);

When I try this with assertTags, it fails. Is there a version of assertTags which allows this: something either officially part of the SimpleTest or CakePHP project, or unofficially put out under the MIT license or similar?

Is there a solution to parse wikipedia xml dump file in Java?

- by Syed

I am trying to parse a huge (25 GB+) Wikipedia XML dump file. Any solution that will help would be appreciated, preferably a solution in Java.

Parsing HTML to find specific links (Without Keywords)

- by Brett Powell

I posted about this earlier, but I am not sure how to post back to my original question, as I can only comment on or answer my own question. Anyway, I need to get 4 links from a website within my C++ application: the latest stable build links for Windows and Linux, and the latest development build links for Windows and Linux. I can download the page (http://www.sourcemod.net/snapshots.php) with LibCURL, which is already implemented in the project, but after that I am not sure. I was looking at parsers, but I can't think of how I am going to tell one link from another. Obviously, using a parser I could get the first link from each table, but this does not seem efficient and would only give me the links to the Windows builds. It looks like the links I need will be the fourth in both tables, but I am just not very familiar with a good way to go about this, so any help would be appreciated.

SQL report link with rs:Command parameters not opening in JSF page

- by H3wh0s33ks

We need to link to a report (which we've checked is working) from a JSF project. The link looks like the following:

    http://www.example.com/report/summary&rs:Command=Render

However, when we try to load the page that links to it, we get the following error:

    The reference to entity "rs:Command" must end with the ';'

How can I link to the report within my pages and prevent the page from trying to parse the rs:Command?

PHP - complete url parser help

- by Mark

I have been trying to find an effective URL parser; PHP's own does not include subdomain or extension. On php.net a number of users had contributed and made this:

    function parseUrl($url) {
        $r  = "^(?:(?P<scheme>\w+)://)?";
        $r .= "(?:(?P<login>\w+):(?P<pass>\w+)@)?";
        $r .= "(?P<host>(?:(?P<subdomain>[-\w\.]+)\.)?" . "(?P<domain>[-\w]+\.(?P<extension>\w+)))";
        $r .= "(?::(?P<port>\d+))?";
        $r .= "(?P<path>[\w/]*/(?P<file>\w+(?:\.\w+)?)?)?";
        $r .= "(?:\?(?P<arg>[\w=&]+))?";
        $r .= "(?:#(?P<anchor>\w+))?";
        $r = "!$r!"; // Delimiters
        preg_match($r, $url, $out);
        return $out;
    }

Unfortunately, it fails on paths with a '-', and I can't for the life of me work out how to amend it to accept '-' in the path name. Thanks

Parse one String data using C#

- by skumar

I need to parse the following string data and convert it into the specified C# class object. Please suggest a solution for this:

    Input string: A||B||C
    Output: Class containing a list of 3 objects of type string, i.e. A, B, C

    Input string: A||{a1||a2||a3}||B||C
    Output: Class containing a list of 3 elements, i.e. A, B, C, with A holding one more list of 3 elements, i.e. a1, a2, a3

Here the elements inside braces { .. } represent the child elements. Note: Child elements could again have multiple child elements. Please help me with this.
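
The nesting described above maps naturally onto a small recursive-descent parser. The following is a sketch in Python rather than C#, assuming that a { .. } group always attaches to the element immediately before it and that element text never contains braces or the || separator. The names Node and parse_items are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    value: str
    children: list = field(default_factory=list)


def parse_items(text):
    """Parse 'A||{a1||a2}||B' into a list of Node objects."""
    nodes, pos = _parse_list(text, 0)
    if pos != len(text):
        raise ValueError(f"unexpected '}}' at position {pos}")
    return nodes


def _parse_list(text, pos):
    nodes = []
    while pos < len(text) and text[pos] != "}":
        if text[pos] == "{":
            if not nodes:
                raise ValueError("child group has no parent element")
            # Recurse for the group and attach it to the previous element.
            children, pos = _parse_list(text, pos + 1)
            if pos >= len(text) or text[pos] != "}":
                raise ValueError("missing closing '}'")
            nodes[-1].children = children
            pos += 1
        else:
            # Read element text up to the next separator or brace.
            end = pos
            while (end < len(text) and text[end] not in "{}"
                   and not text.startswith("||", end)):
                end += 1
            nodes.append(Node(text[pos:end]))
            pos = end
        if text.startswith("||", pos):
            pos += 2
    return nodes, pos


if __name__ == "__main__":
    for node in parse_items("A||{a1||a2||a3}||B||C"):
        print(node.value, [child.value for child in node.children])
```

Because _parse_list calls itself for every opening brace, child elements can carry their own child groups to any depth.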

strange characters at beginning of file

- by luca

There are strange characters at the beginning of a file I'm editing (using TextMate). I don't know when they appeared; they're invisible in TextMate, but my script that reads the file goes crazy. These are the first few characters in the file (as seen with the od command):

    0000000 177377 000120 000105 000117 000120 000114 000105 000072

The first 2 shouldn't be there, I think. Maybe they were caused by some strange Dropbox sync? Or something else, but they tend to reappear (I don't yet know when). My question: what is that 177377, and what is a simple way to remove it in my Ruby script? Thanks
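
The od output can be decoded to see what those values are. The sketch below is in Python rather than Ruby and assumes od was run on a little-endian machine, so the low byte of each 16-bit word comes first in the file. Under that assumption the first word is the UTF-16 little-endian byte order mark, and the remaining words are ordinary UTF-16 text.

```python
import codecs

# The eight 16-bit words printed by od, copied from the question (octal).
OD_WORDS = ["177377", "000120", "000105", "000117",
            "000120", "000114", "000105", "000072"]

# Assumption: little-endian host, so each word's low byte is first in the file.
raw = b"".join(int(word, 8).to_bytes(2, "little") for word in OD_WORDS)


def strip_bom_and_decode(data):
    """Decode UTF-16 data; the 'utf-16' codec consumes a leading BOM."""
    return data.decode("utf-16")


if __name__ == "__main__":
    print(raw[:2] == codecs.BOM_UTF16_LE)   # is the first word a UTF-16 LE BOM?
    print(strip_bom_and_decode(raw))
```

If that reading is right, the file as a whole is saved as UTF-16, so changing the editor's save encoding (or decoding the file as UTF-16 in the script) addresses the cause rather than only the first two bytes.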

How do I make my web browser, made in wxPython, parse pages (e.g. Google.ro)?

- by Marius

Can somebody help me? Please, I really need to parse at least Google.

Making links clickable in Javascript?

- by mdorseif

Is there a simple way of turning a string from

    Then go to http://example.com/ and foo the bar!

into

    Then go to <a href="http://example.com">example.com</a> and foo the bar!

in JavaScript within an existing HTML page?

parse a special xml in python

- by zhaojing

I have a special xml file like below:

    <alarm-dictionary source="DDD" type="ProxyComponent">
      <alarm code="402" severity="Alarm" name="DDM_Alarm_402">
        <message>Database memory usage low threshold crossed</message>
        <description>dnKinds = database type = quality_of_service perceived_severity = minor probable_cause = thresholdCrossed additional_text = Database memory usage low threshold crossed </description>
      </alarm>
      ...
    </alarm-dictionary>

I know that in Python I can get the "alarm code" and "severity" in the alarm tag with:

    for alarm_tag in dom.getElementsByTagName('alarm'):
        if alarm_tag.hasAttribute('code'):
            alarmcode = str(alarm_tag.getAttribute('code'))

And I can get the text in the message tag like below:

    for messages_tag in dom.getElementsByTagName('message'):
        messages = ""
        for message_tag in messages_tag.childNodes:
            if message_tag.nodeType in (message_tag.TEXT_NODE, message_tag.CDATA_SECTION_NODE):
                messages += message_tag.data

But I also want to get the values in the description tag, such as dnKinds (database), type (quality_of_service), perceived_severity (minor) and probable_cause (thresholdCrossed). That is, I also want to parse the content inside the tag in the xml. Could anyone help me with this? Thanks a lot!
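
A sketch of one way to read the description fields, using xml.etree.ElementTree instead of the DOM calls from the question. It assumes that, in the real file, each key = value pair inside description sits on its own line (the excerpt above shows them run together, which would be ambiguous to split). The function names and the SAMPLE layout are illustrative.

```python
import xml.etree.ElementTree as ET

# Assumed layout: one "key = value" pair per line inside <description>.
SAMPLE = """<alarm-dictionary source="DDD" type="ProxyComponent">
  <alarm code="402" severity="Alarm" name="DDM_Alarm_402">
    <message>Database memory usage low threshold crossed</message>
    <description>dnKinds = database
type = quality_of_service
perceived_severity = minor
probable_cause = thresholdCrossed
additional_text = Database memory usage low threshold crossed
    </description>
  </alarm>
</alarm-dictionary>"""


def parse_description(text):
    """Turn 'key = value' lines into a dict, ignoring lines without '='."""
    fields = {}
    for line in (text or "").splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def parse_alarms(xml_text):
    """Return a dict keyed by alarm code with severity, message and fields."""
    root = ET.fromstring(xml_text)
    alarms = {}
    for alarm in root.iter("alarm"):
        alarms[alarm.get("code")] = {
            "severity": alarm.get("severity"),
            "message": (alarm.findtext("message") or "").strip(),
            "description": parse_description(alarm.findtext("description")),
        }
    return alarms


if __name__ == "__main__":
    for code, info in parse_alarms(SAMPLE).items():
        print(code, info["severity"], info["description"])
```

The same parse_description helper works on text collected through the DOM API, since it only needs the string content of the description element.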

Should I use HtmlDocument even if I want to parse an HTML string using HtmlAgilityPack?

- by skhan

Hi everyone, I'm working in C#. I'm trying to extract the first instance of an img tag from an HTML string (which is actually post data). This is my code:

    private string GrabImage(string htmlContent)
    {
        String firstImage;
        HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
        htmlDoc.LoadHtml(htmlContent);
        HtmlAgilityPack.HtmlNode imageNode = htmlDoc.DocumentNode.SelectSingleNode("//img");
        if (imageNode != null)
        {
            return firstImage = imageNode.ToString();
        }
        else
            return firstImage = " ";
    }

But it gets null in htmlDoc. Should I use the HtmlDocument type even if I'm trying to parse the HTML from a string? P.S. By the way, is this the correct way of grabbing the first instance of an image tag from my HTML string?

Client side page call/scrape?

- by Silvre

Here is the problem: I have a web application, a frequently changing notification system, that runs on a series of local computers. The application refreshes every couple of seconds to display the new information. The computers only display info and do not have keyboards or ANY input device. The issue is that if the connection to the server is lost (say updates are installed and a server must be rebooted), a "page not found" error is displayed. We must then either reboot all computers that are running this app, OR add a keyboard and refresh the browser, OR try to access each computer remotely and refresh the browser. None of these are good options, and they result in a lot of frustration. I cannot change the actual application OR the server environment. So what I need is some way to test the call to the application and, if an error is returned or it times out, continue trying every minute or so until the connection is reestablished. My idea is to create a client-side page scraper that makes a JS request to the application (which displays basic HTML) and can run locally on the machine, no server required. If the scrape returns the correct content, it displays it. If not, it continues to request the page until the actual page content is returned. Is this possible? What is the best way to do it?

In Jeditable, how do I make it so that when I click the div to edit, the text box content has initial value that is processed?

- by TIMEX

When the user clicks on the div, Jeditable creates a text box. However, I want the initial text to be processed by a function, stripTags(), instead of being what's on the page. The reason is that I'm using some URL techniques to turn plain-text links into URLs, so when the user clicks on the div, Jeditable shows them as <a href=>..</a>. Is there a "beforeSubmit" option in Jeditable? http://www.appelsiini.net/projects/jeditable

Intelligent search and generation of Java code, preferably using Python?

- by Ipsquiggle

Basically, I do lots of one-off code generation, large-scale refactorings, etc. in Java. My tool language of choice is Python, but I'll take whatever solutions you can offer. Here is a simplified illustration of what I would like, in pseudocode:

Generating an implementation for an interface

    search within my project:
        for each Interface as iName:
            write class(name=iName+"Impl", implements=iName)
            search within the body of iName:
                for each Method as mName:
                    write method(name=mName, body="// TODO implement this...")

Basically, the tool I'm searching for would allow me to:

- parse files according to their Java structure ("search for interfaces")
- search for words contextualized by language elements and types ("variables of type SomeClass", "doStuff() method calls on SomeClass instances")
- run searches with structural context ("within the body of the current result")
- easily replace or generate code (with helpers to generate, as above, or functions for replacing: "rename the interface to Foo", "insert the line Blah.Blah()", etc.)

The point is, I don't want to spend a lot of time writing these things, as they are usually throwaway. But sometimes I need something just a little smarter than what grep offers. It wouldn't be too hard to write up a simplistic version of this, but if I'm going to use something like this at all, I'd expect it to be robust. Any suggestions of a tool/library that will help me accomplish this?

A better way of getting a data table with various column types into string array

- by Vlad

This should be an easy one; it looks like I got myself too confused. I get a table from a database, with data ranging from varchar to int to Null values. The cheap and dirty way of converting this into a tab-delimited file that I already have is this (shrunken to preserve space, ugliness is kept on par with the original):

    ' da - DataAdapter
    ' dt - DataTable
    da.Fill(dt)
    Dim lColumns As Long = dt.Columns.Count
    Dim arrColumns(dt.Columns.Count) As String
    Dim arrData(dt.Columns.Count) As Object
    Dim j As Long = 0
    For i = 0 To dt.Rows.Count - 1
        arrData = dt.Rows(i).ItemArray()
        For j = 0 To arrData.GetUpperBound(0) - 1
            arrColumns(j) = arrData(j).ToString
        Next
        wrtOutput.WriteLine(String.Join(strFieldDelimiter, arrColumns))
        Array.Clear(arrColumns, 0, arrColumns.GetLength(0))
        Array.Clear(arrData, 0, arrData.GetLength(0))
    Next

Not only is this ugly and inefficient, it is also getting on my nerves. Besides, I want, if possible, to avoid the infamous double loop through the table. I would really appreciate a clean and safe way of rewriting this piece. I like the approach that is used here, especially since it is trying to solve the same problem that I have, but it crashes on me when I apply it to my case directly.

Search Results

Search found 7251 results on 291 pages for 'pdf parsing'.
