html parsing - Page 21 - Developer IT

Remove anchor from URL in C#

- by kcoppock

I'm trying to pull in an src value from an XML document, and in one that I'm testing it with, the src is: <content src="content/Orwell - 1984 - 0451524934_split_2.html#calibre_chapter_2"/> That creates a problem when trying to open the file. I'm not sure what that #(stuff) suffix is called, so I had no luck searching for an answer. I'd just like a simple way to remove it if possible. I suppose I could write a function to search for a # and remove anything after, but that would break if the filename contained a # symbol (or can a file even have that symbol?) Thanks!
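The "#(stuff)" suffix is the URL fragment. The question is about C#, so the following is only a language-neutral sketch in Python of splitting the fragment off, using the standard library's `urldefrag`. A literal `#` inside a file name would have to be percent-encoded as `%23` in a URL, so splitting at the first unencoded `#` should be safe.

```python
from urllib.parse import urldefrag

# The src value quoted in the question.
src = "content/Orwell - 1984 - 0451524934_split_2.html#calibre_chapter_2"

# urldefrag returns the URL without its fragment, plus the fragment itself.
path, fragment = urldefrag(src)

print(path)
print(fragment)
```

`path` is what would be passed on to the file-opening code; `fragment` names the location inside that file.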

Read the article

search form in html/php via ajax

- by fusion

I have a search form whose database query is written in PHP; the HTML file calls this PHP file via Ajax to display the results in the search form. The problem is that I would like the results to be displayed on the same page as the form, search.html; yet while the Ajax works, the browser goes to search.php to display the results.

search.html:

    <!DOCTYPE html>
    <head>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <script src="scripts/search_ajax.js" type="text/javascript"></script>
    </head>
    <body>
      <form id="submitForm" method="post">
        <div class="wrapper">
          <div class="field">
            <input name="search" id="search" />
          </div><br />
          <input id="button1" type="submit" value="Submit" class="submit" onclick="run_query();" /><br />
        </div>
        <div id="searchContainer">
        </div>
      </form>
    </body>
    </html>

If I add action="search.php" to the form tag, it displays the results, but on search.php. I'd like them displayed on the same page [i.e. search.html, not search.php]. If I just add the JavaScript function [as done above], nothing is displayed on search.html.

Read the article

How do you do HTML form testing without real user input simulation ?

- by justjoe

This question is like this one, except it's for PHP testing via the browser. It's about testing your form input. Right now, I have a form on a single page with 12 input boxes. Every time I test the form, I have to fill in those 12 input boxes in my browser. I know it's not a specific coding question; it's more about how to test a form directly. So, how can I test repeatedly without it consuming too much time?
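One way to avoid retyping the fields is to send the same POST request a browser would send, from a script. The sketch below is in Python rather than PHP and assumes a urlencoded form; the URL and the field names `field1` to `field12` are hypothetical placeholders, not taken from the question.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint and field names; replace them with the real form's values.
FORM_URL = "http://localhost/form.php"
TEST_FIELDS = {f"field{i}": f"test value {i}" for i in range(1, 13)}

def build_form_request(url, fields):
    """Build the POST request a browser would send for a urlencoded form."""
    body = urlencode(fields).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=body, headers=headers)

request = build_form_request(FORM_URL, TEST_FIELDS)

# Sending is left commented out so the sketch runs without a server:
# with urlopen(request) as response:
#     print(response.status, response.read()[:200])
```

Keeping several dictionaries of test values (valid input, empty input, oversized input) would let the same script exercise the form's validation in one run.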

Read the article

C# program that generates html pages - limit image dimensions on page

- by Professor Mustard

I have a C# program that generates a large number of html pages, based on various bits of data and images that I have stored on the file system. The html itself works just fine, but the images can vary greatly in their dimensions. I need a way to ensure that a given image won't exceed a certain size on the page. The simplest way to accomplish this would be through the html itself: if there were some kind of "maxwidth" or "maxheight" property I could set in the html, or maybe a way to force the image to fit inside a table cell (if I used something like this, I'd have to be sure that the non-offending dimension would automatically scale with the one that's being reduced). The problem is, I don't know much about this "fine tuning" kind of stuff in html, and it seems to be a tough thing to Google around for (most of the solutions involve some sort of html specialization; I'm just using plain html). Alternatively, I could determine the width and height of each image at runtime by examining the image in C#, and then set width/height values in the html if the image's dimensions exceed a certain value. The problem here is that it seems incredibly inefficient to load an entire image into memory just to get its dimensions. I would need a way to "peek" at an image and just get its size (it could be bmp, jpg, gif or png). Any recommendations for either approach would be greatly appreciated.
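The "peek" idea is feasible because PNG and GIF store their dimensions at fixed offsets near the start of the file. The sketch below is in Python rather than C#, and `peek_image_size` is a hypothetical helper name; it covers only PNG and GIF, not BMP or JPEG (JPEG requires walking its segment markers).

```python
import struct

def peek_image_size(header: bytes):
    """Return (width, height) from the first bytes of a PNG or GIF file, or None."""
    # PNG: 8-byte signature, 4-byte chunk length, b"IHDR", then width and height
    # as big-endian 32-bit integers.
    if len(header) >= 24 and header[:8] == b"\x89PNG\r\n\x1a\n" and header[12:16] == b"IHDR":
        return struct.unpack(">II", header[16:24])
    # GIF: 6-byte signature, then width and height as little-endian 16-bit integers.
    if len(header) >= 10 and header[:6] in (b"GIF87a", b"GIF89a"):
        return struct.unpack("<HH", header[6:10])
    return None

# Synthetic headers standing in for real files.
png_header = b"\x89PNG\r\n\x1a\n" + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 640, 480)
gif_header = b"GIF89a" + struct.pack("<HH", 320, 200)

png_size = peek_image_size(png_header)
gif_size = peek_image_size(gif_header)
```

With a real file, only the first 24 bytes need to be read, for example `peek_image_size(open(path, "rb").read(24))`, so the image data itself is never loaded.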

Read the article

scraping text from multiple html files into a single csv file

- by Lulu

I have just over 1500 html pages (1.html to 1500.html). I have written code using Beautiful Soup that extracts most of the data I need but "misses" some of the data within the table.

My input: e.g. file 1500.html

My code:

    #!/usr/bin/env python
    import glob
    import codecs
    from BeautifulSoup import BeautifulSoup

    with codecs.open('dump2.csv', "w", encoding="utf-8") as csvfile:
        for file in glob.glob('*html*'):
            print 'Processing', file
            soup = BeautifulSoup(open(file).read())
            rows = soup.findAll('tr')
            for tr in rows:
                cols = tr.findAll('td')
                #print >> csvfile,"#".join(col.string for col in cols)
                #print >> csvfile,"#".join(td.find(text=True))
                for col in cols:
                    print >> csvfile, col.string
                print >> csvfile, "==="
            print >> csvfile, "***"

Output: one CSV file, with 1500 lines of text and columns of data. For some reason my code does not pull out all the required data but "misses" some of it, e.g. the Address1 and Address 2 data at the start of the table do not come out. I modified the code to put in * and === separators, and I then use Perl to put the result into a clean CSV file. Unfortunately, I'm not sure how to change my code to get all the data I'm looking for!
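A likely cause of the missing cells is that Beautiful Soup's `.string` returns `None` when a tag has more than one child, for example a cell with nested markup, so only cells holding a single string come through. The sketch below illustrates collecting all text inside each cell instead. It uses only Python 3's standard `html.parser` as a stand-in for Beautiful Soup, and the sample markup is invented for illustration.

```python
import csv
import io
from html.parser import HTMLParser

class TableText(HTMLParser):
    """Collect the text of every <td>/<th>, including text inside nested tags."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

def table_rows(html):
    parser = TableText()
    parser.feed(html)
    parser.close()
    return parser.rows

# Invented sample: both cells contain nested tags.
sample = "<table><tr><td><b>Address1</b></td><td>12 <i>High</i> St</td></tr></table>"
rows = table_rows(sample)

out = io.StringIO()
csv.writer(out).writerows(rows)
csv_text = out.getvalue()
```

Writing through the `csv` module also removes the need for separator markers and a second clean-up pass, since quoting is handled by the writer.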

Read the article

Why "alt" attribute for <img> tag has been considered mandatory by the HTML validator .. ?

- by infant programmer

Is there any logical or technical reason (in W3C validation) for making alt a required attribute? This is my actual problem: although my page is otherwise valid according to the W3C validation rules, the only error I am getting is:

    line XX column YY - Error: required attribute "ALT" not specified

I know the significance of the "alt" attribute, and I have omitted it where it is unnecessary. (To elaborate: I added the image to make my page look nicer, and I don't want the alt attribute to show an irrelevant message to the viewer.) Getting rid of the error is secondary; rather, I am curious to know whether this is a flaw in the validation rules. I thank Stack Overflow and all the members who responded. My doubt is clarified. :-)

Read the article

how to find the height of html content

- by ganapati hegde

Hi, I am trying to display 10 HTML pages as a single document with 10 chapters. I am using the WebKitGTK+ engine to render the HTML pages. I read the content of each HTML file, concatenate everything into a single 'char *content', and load it as one document with this function:

    webkit_web_view_load_html_string(WebKitWebView *web_view, const gchar *content, const gchar *base_uri);

Now I am trying to build a table of contents (TOC) for this document, which displays all the chapter names in order. My requirement is that when I click on a particular chapter in the TOC, the document should scroll to that point. For that I need to know the height of each chapter (the height of each HTML file's content). Note that the width of the document can change and, as HTML is reflowable, height decreases as width increases and vice versa. So each time the height changes I need to find the current height of each HTML page (or chapter) and from that calculate the distance to scroll. How can I find the height of the content of an HTML page dynamically? Thank you for the answer.

Read the article

Why is Swing Parser's handleText not handling nested tags?

- by Jim P

I need to transform some HTML text that has nested tags to decorate 'matches' with a css attribute to highlight it (like firefox search). I can't just do a simple replace (think if user searched for "img" for example), so I'm trying to just do the replace within the body text (not on tag attributes). I have a pretty straightforward HTML parser that I think should do this:

    final Pattern pat = Pattern.compile(srch, Pattern.CASE_INSENSITIVE);
    Matcher m = pat.matcher(output);
    if (m.find()) {
        final StringBuffer ret = new StringBuffer(output.length()+100);
        lastPos=0;
        try {
            new ParserDelegator().parse(new StringReader(output.toString()),
                new HTMLEditorKit.ParserCallback () {
                    public void handleText(char[] data, int pos) {
                        ret.append(output.subSequence(lastPos, pos));
                        Matcher m = pat.matcher(new String(data));
                        ret.append(m.replaceAll("<span class=\"search\">$0</span>"));
                        lastPos=pos+data.length;
                    }
                }, false);
            ret.append(output.subSequence(lastPos, output.length()));
            return ret;
        } catch (Exception e) {
            return output;
        }
    }
    return output;

My problem is, when I debug this, the handleText is getting called with text that includes tags! It's like it's only going one level deep. Anyone know why? Is there some simple thing I need to do to HTMLParser (haven't used it much) to enable 'proper' behavior of nested tags?

PS - I figured it out myself - see answer below. Short answer is, it works fine if you pass it HTML, not pre-escaped HTML. Doh! Hope this helps someone else.

    <span>example with <a href="#">nested</a> <p>more nesting</p> </span>
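The same idea, highlighting matches only in text and never inside tags or attributes, can be sketched with an event-based parser in Python. This is an illustration using the standard `html.parser` module, not the Swing parser from the question; `Highlighter` and `highlight` are hypothetical names, and this simplified version drops comments and doctype declarations.

```python
import re
from html.parser import HTMLParser

class Highlighter(HTMLParser):
    """Re-emit HTML, wrapping matches of a search term that occur in text only."""

    def __init__(self, term):
        # convert_charrefs=False keeps entities such as &amp; as separate events.
        super().__init__(convert_charrefs=False)
        self.pattern = re.compile(re.escape(term), re.IGNORECASE)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Copy the tag exactly as written, attributes untouched.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        wrap = lambda m: f'<span class="search">{m.group(0)}</span>'
        self.out.append(self.pattern.sub(wrap, data))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

def highlight(html, term):
    parser = Highlighter(term)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)

result = highlight('<a href="img.html">an img here</a>', "img")
```

In this sample the `img` inside the `href` attribute is left alone, while the occurrence in the link text is wrapped in the highlighting span.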

Read the article

Incorrectly formatted html inconsistencies between DOM and what's displayed in firefox plugin

- by deadalnix

I'm currently developing a firefox plugin. This plugin has to handle very crappy website that is really incorrectly formatted. I cannot modify these websites, so I have to handle them. I reduced the bug I'm facing to a short sample of html (if this appellation is appropriate for an horror like this) : <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html> <head> <title>Some title.</title>  <div style="visability:hidden;"> <a href="//example.com"> </a> </div>  <meta name="description" content="Homepage of Company.com, Company's corporate Web site" /> <meta name="keywords" content="Company, Company & Co., Inc., blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla" /> <meta http-equiv="Content-Language" content="en-US" /> <meta http-equiv="content-type" content="text/html; charset=utf-8"/> </head> <body class="homePage"> <div class="globalWrapper"><a href="/page.html">My gorgeous link !</a></div> </body> </html> When opening the webpage, « My gorgeous link ! » if displayed and clickable. 
However, when I'm exploring the DOM with Javascript into my plugin, everything behaves (DOM exploration and innerHTML property) like the code was this one : <html> <head> <title>Some title.</title>  </head><body><div style="visability:hidden;"> <a href="//example.com"> </a> </div>  <meta name="description" content="Homepage of Company.com, Company's corporate Web site"> <meta name="keywords" content="Company, Company & Co., Inc., blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla"> <meta http-equiv="Content-Language" content="en-US"> </body> </html> So, when exploring the DOM within the plugin, the document is somehow fixed by firefox. But this fixed DOM is inconsistent with what is in the webpage. Thus, my plugin doesn't behave as expected. I'm really puzzled with that issue. The problem exists in both firefox 3.6 and firefox 4 (didn't tested firefox 5 yet). For example, reducing the meta, will fix the issue. Where does this discrepancy come from ? How can I handle it ? EDIT: With the answer I get, I think I should be a little more precise. I do know what firefow is doing when modifying the webpage in the second code snippet. The problem is the following one : « In the fixed DOM that I get into my plugin, the gorgeous link doesn't appear anywhere, but this link is actually visible on the webpage, and works. 
So the DOM I'm manipulating and the DOM in the webpage are different - they are fixed in a different manner. » So where does the difference in the fixing behaviour come from, and how can I handle it? In other words, how can my plugin be aware of the existence of the gorgeous link?

Read the article

Replacing specific HTML tags using Regex

- by matthewpe

Alright, an easy one for you guys. We are using ActiveReport's RichTextBox to display some random bits of HTML code. The HTML tags supported by ActiveReport can be found here: http://www.datadynamics.com/Help/ARNET3/ar3conSupportedHtmlTagsInRichText.html

An example of what I want to do is replace any match of <div style="text-align:*</div> by <p style=\"text-align:*</p> in order to use a supported tag for text-alignment. I have found the following regex expression to find the correct match in my html input:

    <div style=\"text-align:(.*?)</div>

However, I can't find a way to keep the previous text contained in the tags after my replacement. Any clue? Is it me or Regex are generally a PITA? :)

    private static readonly IDictionary<string, string> _replaceMap =
        new Dictionary<string, string>
        {
            {"<div style=\"text-align:(.*?)</div>", "<p style=\"text-align:(.*?)</p>"}
        };

    public static string FormatHtml(string html)
    {
        foreach(var pair in _replaceMap)
        {
            html = Regex.Replace(html, pair.Key, pair.Value);
        }
        return html;
    }

Thanks!
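The captured text is lost because the replacement string repeats the pattern `(.*?)` literally instead of referring back to the captured group. The sketch below shows the backreference idea in Python, where the group is written `\1`; in .NET's `Regex.Replace` the corresponding reference is `$1`. The sample input is invented.

```python
import re

# The pattern from the question, with one capture group for everything
# between the attribute prefix and the closing tag.
PATTERN = r'<div style="text-align:(.*?)</div>'

# \1 re-inserts whatever the group captured.
REPLACEMENT = r'<p style="text-align:\1</p>'

def format_html(html):
    return re.sub(PATTERN, REPLACEMENT, html)

converted = format_html('<div style="text-align:center">Hi</div> <div>other</div>')
```

Only the div elements carrying the text-align style are rewritten; the second, unstyled div in the sample is left as it was.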

Read the article

Line Text with Nav Bar in HTML

- by Eric

I have a navbar right under the title of the site, and I want to line up the first and last items in the navbar with the beginning and end of the title. I don't have a live preview, but I attached an image. I can get it to line up in one browser, but when I open it in another, it's off again. Is there an easy way to line the text up so it works everywhere? Thank you.

HTML:

    <body onload="play()">
      <div class="heading">UPRISING</div>
      <div class="container_12">
        <div id="topnav" align="center">
          <ul id="list-nav">
            <li><a href="home.html">HOME</a></li>
            <li><a href="about.html">ABOUT</a></li>
            <li><a href="trailer.html">TRAILER</a></li>
            <li><a href="stills.html">STILLS</a></li>
            <li><a href="news.html">NEWS</a></li>
            <li><a href="contact.html">CONTACT</a></li>
          </ul>
        </div>
        <div class="trailer">
          <img id="imgHolder" />
        </div>
      </div>

CSS:

    #topnav li { margin-right: 110px; }
    #topnav li:nth-last-child(1) { margin-right: 0px; }

Read the article

HTML include statement

- by iMaster

I'm just trying to do a simple include statement in HTML. I have no clue why it's not working. My file setup is basically index.php in the root and then a folder called "includes" with a file "header.html" inside. So here's my code:

    <!DOCTYPE html>
    <html lang="en">
    <html>
    <head>
      <title>Title</title>
      <link type="text/css" href="style/style.css" media="screen" rel="stylesheet">
      <script src="scripts/jquery.js" type="text/javascript"></script>
      <script src="scripts/code.js" type="text/javascript"></script>
      <meta http-equiv="content-type" content="text/html; charset=UTF-8" />
    </head>
    <body>
      <div id="wrapper">
        ...blah blah blah
      </div>
    </body>
    </html>

I've verified that the file is there so I'm not sure what else the problem could be. Thanks!

Read the article

How to define cell width for 2 HTML tables with different column counts?

- by DaveDev

If I have 2 tables:

    <table id="Table1">
      <tr> <td></td><td></td><td></td> </tr>
    </table>
    <table id="Table2">
      <tr> <td></td><td></td><td></td><td></td> </tr>
    </table>

The first has 3 columns, the second has 4 columns. How can I define a style to represent both tables when I want Table1's cell width to be 1/3 the width of the full table, and Table2's cells to be 1/4 the width of the table?

Read the article

Does XML::LibXML::Reader read html?

- by sid_com

I didn't find anything about parsing HTML in the XML::LibXML::Reader documentation, and when I tried to parse an HTML site it didn't work. Is my conclusion right that XML::LibXML::Reader doesn't work with HTML?

Read the article

PHP Simple_html_dom issue

- by stef

The snippet below loops through some web pages, grabs the html and then looks for table.results and gets the plaintext out of the <td> tags contained in each <tr>. $results is ok. Now I'm trying to get the href value of an <a> tag that is found in the second <td> of each <tr>. I'd like to include this in the $results array, but I'm not sure how to do this. The third foreach statement gets them but then I need to merge $links with $results. Ideally I'd also get the links in the second foreach statement. Does anyone know how?

    $i = 0;
    foreach( $urls as $u ) {
        $html = file_get_html($u);
        foreach($html->find('.results tbody tr') as $element) {
            $result[$i] = $this->extract($element->plaintext);
            $i++;
        }
        foreach($html->find('.results tbody tr a') as $element) {
            $links[$i] = $element->href;
            $i++;
        }
    }
    print_r($result);
    print_r($links);
    die;

Read the article

Pros and Cons of Java HTML to XML cleaners

- by cjavapro

I am looking to allow HTML emails (and other HTML uploads) without letting in scripts and stuff. I plan to have a white list of safe tags and attributes as well as a whitelist of CSS tags and value regexes (to prevent automatic return receipt). I asked a question: Parse a badly formatted XML document (like an HTML file) I found there are many many ways to do this. Some systems have built in sanitizers (which I don't care so much about). I will post some answers and say Community Wiki. Please post any other options you like and say Community Wiki so they can be voted on. Also any comments or wiki edits on what part of a certain product is better and what is not would be greatly appreciated. This page is a very nice listing page but I get kinda lost http://java-source.net/open-source/html-parsers

Read the article

Parsing log files in a folder in ColdFusion

- by Simon Guo

The problem is that there is a folder ./log/ containing files like jan2010.xml, feb2010.xml, mar2010.xml, jan2009.xml, feb2009.xml, mar2009.xml ... Each XML file looks like this:

    <root><record name="bob" spend="20"></record>...(more records)</root>

I want to write a piece of ColdFusion code (log.cfm) that simply parses those XML files. For the front end, I would let the user choose a year and then click the submit button. All the content for that year will be shown in separate tables, one per month. Each table shows the total money spent by each person, like:

    person  cost
    bob     200
    mike    300
    Total   500

Thanks.
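The per-person totals for one month's file come down to summing the `spend` attribute grouped by `name`. The sketch below shows that aggregation in Python rather than ColdFusion; `totals_by_person` is a hypothetical helper, the sample records are invented, and `spend` is assumed to hold whole numbers.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def totals_by_person(xml_text):
    """Sum the spend attribute of every <record>, grouped by its name attribute."""
    totals = defaultdict(int)
    for record in ET.fromstring(xml_text).iter("record"):
        totals[record.get("name")] += int(record.get("spend"))
    return dict(totals)

# Invented sample in the format shown in the question.
sample = (
    '<root>'
    '<record name="bob" spend="20"></record>'
    '<record name="mike" spend="30"></record>'
    '<record name="bob" spend="5"></record>'
    '</root>'
)

totals = totals_by_person(sample)
grand_total = sum(totals.values())
```

For a whole year, the same function would be applied to each file whose name ends in the chosen year (for example jan2010.xml, feb2010.xml, and so on), producing one table per file.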

Read the article

python [lxml] - cleaning out html tags

- by sadhu_

    import sys
    from lxml.html.clean import clean_html, Cleaner

    def clean(text):
        try:
            cleaner = Cleaner(scripts=True, embedded=True, meta=True, page_structure=True,
                              links=True, style=True, remove_tags=['a', 'li', 'td'])
            print(len(cleaner.clean_html(text)) - len(text))
            return cleaner.clean_html(text)
        except Exception:
            print('Error in clean_html')
            print(sys.exc_info())
            return text

I put together the above (ugly) code as my initial foray into Python land. I'm trying to use the lxml cleaner to clean out a couple of HTML pages, so that in the end I am left with just the text and nothing else. But try as I might, the above doesn't appear to work as such: I'm still left with a substantial amount of markup (and it doesn't appear to be broken HTML), particularly links, which aren't getting removed despite the args I use in remove_tags and links=True. Any idea what's going on? Perhaps I'm barking up the wrong tree with lxml? I thought this was the way to go with HTML parsing in Python.

Read the article

(Python) Extracting Text from Source Code?

- by zhuyxn

I currently have a large webpage whose source code is ~200,000 lines of almost all (if not all) HTML. More specifically, it is a webpage whose content is a few thousand blocks of paragraphs separated by line breaks (though a line break does not necessarily mean there is a separation in content). My main objective is to extract text from the source code as if I were copying/pasting the webpage into a text editor. There is another parsing function I would like to use, which originally took in copied/pasted text rather than the source code. To do this, I'm currently using urllib2 and calling .get_text() in Beautiful Soup. The problem is that Beautiful Soup is leaving tremendous amounts of white space in my output, and it is difficult to pass the result into the second "text" parser. I have done quite a bit of research on parsing HTML, but I'm frankly not sure how to solve this problem easily. Furthermore, I'm a bit confused about how to use imports like lxml to extract text as if I were simply copying and pasting.
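Copy/paste output differs from raw text extraction mainly in whitespace handling: runs of whitespace in the source collapse to one space, and line breaks come from block-level elements rather than from newlines in the markup. The sketch below approximates that using Python 3's standard `html.parser`; the set of block-level tags is an assumption chosen for illustration, and real browsers apply more rules than this.

```python
import re
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    SKIP = {"script", "style"}                      # content never shown to the reader
    BLOCK = {"p", "br", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            # Source newlines and indentation count as a single space.
            self.parts.append(re.sub(r"\s+", " ", data))

def visible_text(html):
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)

text = visible_text(
    "<p>Hello   <b>world</b></p>\n\n<script>var x = 1;</script><p>Second\n line</p>"
)
```

The same whitespace normalisation (collapse runs, then drop empty lines) could also be applied directly to the string returned by Beautiful Soup's `get_text()`.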

Read the article

Removing HTML from Pidgin conversations

- by George

Hi everyone, I'm using Pidgin 2.5.5 with SIPE for talking to MS Communicator users at work. MS Communicator was just upgraded and I'm now seeing HTML markup in messages. Are there any plugins to interpret the HTML and apply the styles, or to parse it out? Thanks -G EDIT1: I'm running this on Windows EDIT2: my convos look like person@address (time) no

Read the article

wget: Turn Off Forced .html Retrieval

- by Mike B

When performing a recursive download, I specify a pattern via the -R parameter for wget to reject, but if this file is a HTML file, wget downloads the file regardless of whether or not it matches the pattern. e.g. wget -r -R "*dynamicfile*" example.com still retrieves files such as example.com/dynamicfile1.html Is there a way to prevent this?

Read the article

why isn't the standard for word processing HTML?

- by Jonathan

Why do Microsoft Office and other office suites have their own formats? Why don't all word processors use HTML, as it is extremely universal? What advantages are there to saving a document as doc/docx over HTML?

Read the article

Why isn't HTML the standard for word processing?

- by Jonathan

Why do Microsoft Office and other office suites have their own formats? Why don't all word processors use HTML? It is extremely universal. What advantages are there to saving a document as doc/docx over HTML?

Read the article

HTML and PHP simple contact form.

- by user317128

I tried to make a simple contact form via HTML and PHP, but the form doesn't seem to submit. It stays on the HTML page and doesn't post to the PHP script. I would love someone to look over the code, thanks in advance.

simple_form.html code:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
      <title>Simple Feedback Form</title>
    </head>
    <body>
      <form action="send_simpleform.php" method="post">
        <p>Your name<br /> <input name="sender_name" type="text" size="30" /></p>
        <p>Email<br /> <input name="sender_email" type="text" size="30" /></p>
        <p>Message<br /> <textarea name="message" cols="30" rows="5"></textarea></p>
        <input name="submit" type="button" value="Send This Form" />
      </form>
    </body>
    </html>

send_simpleform.php code:

    <?
    if (($_POST[sender_name] == "") || ($_POST[sender_email] == "") || ($_POST[message] == "") {
        header("Location: simple_form.php");
        exit;
    }
    $msg = "Email sent from wwwsite\n";
    $msg .= "Sender's Name:\t $_POST[senders_name]\n";
    $msg .= "Sender's E-mail:\t $_POST[senders_email]\n";
    $msg .= "Sender's Message:\t $_POST[message]\n";
    $to = "[email protected]";
    $subject = "Website feedback message";
    $mailheaders = "From: My web site <www.testwebsite.com>\n";
    $mailherders .= "Reply to: $_POST[sender_email]\n";
    $mail($to, $subject, $msg, $mailheaders);
    ?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
      <title>Simple Feedback Form Sent</title>
    </head>
    <body>
      <h1>The following email has been sent</h1>
      <p>Your Name:<br /> <? echo "$_POST[sender_name]"; ?>
      <p>Your Email Adress:<br /> <? echo "$_POST[sender_email]"; ?>
      <p>Message:<br /> <? echo "$_POST[message]"; ?> </p>
    </body>
    </html>

Read the article

Interpreting Inkscape SVG path coordinates for HTML map

- by tovare

I needed some coordinates for an HTML map and tried to use Inkscape by opening the image and just drawing a path with my polygon coordinates. My document properties are set to 256 x 256 pixels and units: px. When opening the SVG file I get coordinates whose meaning is not immediately apparent.

    <path style="fill:none;stroke:#000000;stroke-width:1px;stroke-linecap:butt; stroke-linejoin:miter;stroke-opacity:1"
      d="m 23.864407,126.91525 3.254237, 44.47458 35.79661, 44.47458 71.593216, 19.52542 71.59322, -37.9661 22.77967, -72.67797 L 218.0339, 64 192,49.898305 l -32.54237, 8.677966 -18.44068, -35.79661 1.08474, -17.3559322 -71.593215,0 L 45.559322,34.711864 35. 79661,57.491525 5.4237288, 74.847458 6.5084746,101.9661 23.864407,126.91525 z"
      id="path2840" />

How can I get coordinates I can use?

The original image
The SVG file from Inkscape
Link to SVG

Progress: I tried a tool called InkscapeMap which looks promising and simple, but unfortunately it looks like it didn't work with this particular SVG file.

Solved! Saving the file as a Plain SVG solved the problem and InkscapeMap worked perfectly. (By the way, saving as an optimized SVG caused a parsing error.)

Update 13.11: Using InkscapeMap 0.6 and Inkscape 0.48, I needed to uncheck relative coordinates in the SVG output preferences. Also, if you get a C error message, hunt down the polygon with a C in it, and redraw the polygon using the XML editor in Inkscape.

Update 25.11.2011: I modified the source to improve parsing. http://tovare.com/articles/createhtmlimagemapsusinginkscape/
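The coordinates look odd because lower-case path commands (`m`, `l`) in SVG are relative to the previous point, while upper-case ones (`M`, `L`) are absolute. The sketch below converts a path into absolute points suitable for the `coords` attribute of an image map area. It is a simplification that handles only move, line and close commands, not curves such as `C`, and `path_to_points` is a hypothetical helper name.

```python
import re

def path_to_points(d):
    """Convert an SVG path using only m/M, l/L and z/Z into absolute (x, y) points."""
    tokens = re.findall(r"[MmLlZz]|-?\d*\.?\d+(?:[eE][-+]?\d+)?", d)
    points = []
    x = y = 0.0
    cmd = None
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.isalpha():
            cmd = tok            # remember the current command letter
            i += 1
            continue
        a, b = float(tokens[i]), float(tokens[i + 1])
        i += 2
        if cmd in ("m", "l"):
            x, y = x + a, y + b  # relative: offset from the previous point
        else:
            x, y = a, b          # absolute
        points.append((x, y))
        # Extra coordinate pairs after a move are implicit line-to commands.
        if cmd == "m":
            cmd = "l"
        elif cmd == "M":
            cmd = "L"
    return points

points = path_to_points("m 10,10 5,0 0,5 L 0,0 z")
coords = ",".join(str(round(v)) for point in points for v in point)
```

The resulting string is in the comma-separated form that `<area shape="poly" coords="...">` expects.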

Search Results

Search found 32104 results on 1285 pages for 'html parsing'.
