Search Results

Search found 45245 results on 1810 pages for 'html content extraction'.

scraping text from multiple html files into a single csv file

- by Lulu

I have just over 1500 HTML pages (1.html to 1500.html). I have written code using Beautiful Soup that extracts most of the data I need but "misses" some of the data within the table.

My input: e.g. file 1500.html

My code:

    #!/usr/bin/env python
    import glob
    import codecs
    from BeautifulSoup import BeautifulSoup

    with codecs.open('dump2.csv', "w", encoding="utf-8") as csvfile:
        for file in glob.glob('*html*'):
            print 'Processing', file
            soup = BeautifulSoup(open(file).read())
            rows = soup.findAll('tr')
            for tr in rows:
                cols = tr.findAll('td')
                #print >> csvfile,"#".join(col.string for col in cols)
                #print >> csvfile,"#".join(td.find(text=True))
                for col in cols:
                    print >> csvfile, col.string
                print >> csvfile, "==="
            print >> csvfile, "***"

Output: one CSV file, with 1500 lines of text and columns of data.

For some reason my code does not pull out all the required data but "misses" some of it, e.g. the Address1 and Address 2 data at the start of the table do not come out. I modified the code to put in *** and === separators, and I then use Perl to turn the output into a clean CSV file. Unfortunately, I'm not sure how to change my code to get all the data I'm looking for!
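One possible cause of the missing cells is that a cell with several child nodes has no single string value, so the text has to be gathered from every text node inside the cell. The following is a minimal sketch of that idea using only the Python 3 standard library rather than Beautiful Soup; the class name `TableRows`, the function `table_to_csv` and the sample markup are illustrative assumptions, not part of the original question.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (<b>, <i>, ...) still arrives here.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join all fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html_text):
    """Return the table rows found in html_text as CSV text."""
    parser = TableRows()
    parser.feed(html_text)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


sample = "<table><tr><td>Address1</td><td><b>12</b> High <i>St</i></td></tr></table>"
print(table_to_csv(sample))
```

The sketch does not handle nested tables or cells whose closing tags are missing; real pages may need a more forgiving parser.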

html hyperlinks show URL in brackets in Entourage

- by Rafe

I have an email script written in .Net that sends html emails. The email uses normal html hyperlinks to insert a link in the email, like this: <a href="http://www.stackoverflow.com/">StackOverflow</a> The problem is that in Entourage, a hyperlink like this always shows up for me like this: StackOverflow < http://www.stackoverflow.com/ > How can I format the hyperlink in my email so that in Entourage the text "StackOverflow" is the actual hyperlink, and the URL is not displayed after the text? Is there an html meta tag that needs to be set? Do I have to set the content-type somewhere? Or is there a different html syntax on the hyperlink itself that I should use?

How can I convert HTML to Textile?

- by Joe Van Dyk

I'm scraping a static html site and moving the content into a database-backed CMS. I'd like to use Textile in the CMS. Is there a tool out there that converts HTML into Textile, so I can scrape the existing site, convert the HTML to Textile, and insert that data into the database?

Convert html to aspx.

- by vinod

Hi, is there any tool or code to convert HTML files to .aspx? Elaboration on my earlier question: I am looking for a tool or code that automatically converts HTML controls to .aspx server controls without my having to change each control manually, i.e. something that takes an HTML page as input, parses it and outputs the controls for an .aspx page. Thanks

How can one prevent double encoding of html entities when they are allowed in the input

- by Bob

How can I prevent double encoding of HTML entities, or fix them programmatically? I am using the encode() function from the HTML::Entities Perl module to encode HTML entities in user input. The problem here is that we also allow users to input HTML entities directly, and these entities end up being double encoded.

For example, a user may enter: Stackoverflow & Perl = Awesome&hellip;
This ends up being encoded to: Stackoverflow &amp; Perl = Awesome&amp;hellip;
This renders in the browser as: Stackoverflow & Perl = Awesome&hellip;
We want this to render as: Stackoverflow & Perl = Awesome…

Is there a way to prevent this double encoding? Or is there a module or snippet of code that can easily correct these double encoding issues? Any help is greatly appreciated!
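One way to think about the problem is to escape an ampersand only when it does not already start an entity. The following Python sketch illustrates that idea; it is not the HTML::Entities API, and the function name `encode_once` and the simplified entity pattern are assumptions made for the example.

```python
import re

# An "&" that is NOT followed by a named, decimal or hexadecimal entity body.
# The entity grammar here is deliberately simplified for illustration.
_BARE_AMP = re.compile(r"&(?!(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);)")


def encode_once(text):
    """Escape &, < and > while leaving existing entities untouched."""
    text = _BARE_AMP.sub("&amp;", text)
    return text.replace("<", "&lt;").replace(">", "&gt;")


print(encode_once("Stackoverflow & Perl = Awesome&hellip;"))
```

A trade-off of this approach is that a user who really wants the literal text `&hellip;` to appear can no longer type it directly; whether that matters depends on the application.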

Is it possible to email the contents of vim using HTML

- by brianegge

I like to view the current differences in the source files I'm working on with a command like: vim <(svn diff -dub) What I'd really like to be able to do is to email that colorized diff. I know vim can export HTML with :TOhtml, but how do I pipe this output into an HTML email? Ideally, I'd like to be able to send an HTML diff with a single shell script command.
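Once the HTML has been written to a file or a string, wrapping it in an email is mostly a matter of building a multipart message with a plain-text fallback. The sketch below uses Python's standard `email.message.EmailMessage`; the function name `build_html_email`, the addresses and the local SMTP host mentioned in the comment are placeholders, and how the HTML gets out of vim is left open.

```python
from email.message import EmailMessage


def build_html_email(html_body, subject, sender, recipient):
    """Return a multipart/alternative message carrying html_body (hypothetical helper)."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    # Plain-text fallback for clients that do not render HTML.
    msg.set_content("This message contains an HTML diff; open it in an HTML-capable client.")
    # The HTML part; adding it turns the message into multipart/alternative.
    msg.add_alternative(html_body, subtype="html")
    return msg


message = build_html_email("<pre>colorized diff here</pre>", "Current diff",
                           "me@example.com", "you@example.com")
print(message.get_content_type())

# Sending is environment-specific; assuming an SMTP server on localhost:
#   import smtplib
#   with smtplib.SMTP("localhost") as smtp:
#       smtp.send_message(message)
```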

html truncator in java

- by sammichy

Is there any utility (or sample source code) that truncates HTML (for preview) in Java? I want to do the truncation on the server and not on the client. I'm using HTMLUnit to parse HTML. UPDATE: I want to be able to preview the HTML, so the truncator would maintain the structure while stripping out the elements after the desired output length.
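The structure-preserving truncation described here can be sketched as: stream through the markup, count only text characters, stop once the limit is reached, and then close whatever elements are still open. The example below shows the idea in Python with the standard `html.parser` module rather than Java or HTMLUnit; `Truncator`, `truncate_html` and the `VOID` set are names invented for this illustration.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class Truncator(HTMLParser):
    """Copy markup until `limit` text characters have been emitted."""

    def __init__(self, limit):
        super().__init__(convert_charrefs=True)
        self.limit = limit
        self.out = []
        self.open = []      # stack of currently open element names
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self.get_starttag_text())
        if tag not in VOID:
            self.open.append(tag)

    def handle_endtag(self, tag):
        if self.done or tag not in self.open:
            return
        # Close elements back to (and including) the matching start tag.
        while self.open:
            name = self.open.pop()
            self.out.append("</%s>" % name)
            if name == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        if len(data) >= self.limit:
            data = data[:self.limit]
            self.done = True
        self.limit -= len(data)
        self.out.append(escape(data, quote=False))

    def result(self):
        # Close anything still open so the preview stays well-formed.
        return "".join(self.out) + "".join("</%s>" % t for t in reversed(self.open))


def truncate_html(markup, limit):
    parser = Truncator(limit)
    parser.feed(markup)
    parser.close()
    return parser.result()


print(truncate_html("<p>Hello <b>brave new</b> world</p>", 9))
```

Comments, doctypes and script content are not treated specially in this sketch, and the cut can land in the middle of a word.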

how to convert html to csv ?

- by wefwgeweg

Is there an HTML parser or library that automatically converts an HTML page into CSV data rows?

Is it better to have client (Javascript) processing HTML rather than C# processing HTML?

- by Raja

We are in the process of building a huge site. We are contemplating whether to process HTML on the server side (ASP.NET) or on the client side. For example, we have HTML files which act as templates for generating tabs. Is it better for the server side to take the content section (div) of the HTML, load the appropriate values and send the updated HTML to the browser, or is it better to pass a chunk of data to the client and make JavaScript do the work? Any justification for either approach will be helpful. Thanks.

Why is Swing Parser's handleText not handling nested tags?

- by Jim P

I need to transform some HTML text that has nested tags to decorate 'matches' with a CSS attribute to highlight them (like Firefox search). I can't just do a simple replace (think if the user searched for "img", for example), so I'm trying to do the replace only within the body text (not on tag attributes). I have a pretty straightforward HTML parser that I think should do this:

    final Pattern pat = Pattern.compile(srch, Pattern.CASE_INSENSITIVE);
    Matcher m = pat.matcher(output);
    if (m.find()) {
        final StringBuffer ret = new StringBuffer(output.length()+100);
        lastPos=0;
        try {
            new ParserDelegator().parse(new StringReader(output.toString()), new HTMLEditorKit.ParserCallback () {
                public void handleText(char[] data, int pos) {
                    ret.append(output.subSequence(lastPos, pos));
                    Matcher m = pat.matcher(new String(data));
                    ret.append(m.replaceAll("<span class=\"search\">$0</span>"));
                    lastPos=pos+data.length;
                }
            }, false);
            ret.append(output.subSequence(lastPos, output.length()));
            return ret;
        } catch (Exception e) {
            return output;
        }
    }
    return output;

My problem is, when I debug this, handleText is getting called with text that includes tags! It's like it's only going one level deep. Anyone know why? Is there some simple thing I need to do to HTMLParser (haven't used it much) to enable 'proper' behavior of nested tags?

PS - I figured it out myself - see answer below. Short answer is, it works fine if you pass it HTML, not pre-escaped HTML. Doh! Hope this helps someone else.

    <span>example with <a href="#">nested</a> <p>more nesting</p> </span>
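The underlying technique, wrapping matches only when they occur in text nodes and copying tags through unchanged, can be sketched independently of Swing. The Python example below uses the standard `html.parser` module; the names `Highlighter` and `highlight` are illustrative, and the `search` class name is taken from the question's code.

```python
import re
from html import escape
from html.parser import HTMLParser


class Highlighter(HTMLParser):
    """Re-emit markup, wrapping matches of `term` that occur in text only."""

    def __init__(self, term):
        super().__init__(convert_charrefs=True)
        self.pattern = re.compile(re.escape(term), re.IGNORECASE)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Copy the tag exactly as written, attributes included.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        pieces, last = [], 0
        for match in self.pattern.finditer(data):
            pieces.append(escape(data[last:match.start()], quote=False))
            pieces.append('<span class="search">%s</span>'
                          % escape(match.group(0), quote=False))
            last = match.end()
        pieces.append(escape(data[last:], quote=False))
        self.out.append("".join(pieces))


def highlight(markup, term):
    parser = Highlighter(term)
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(highlight('<p>An img tag: <img src="img.png"> img</p>', "img"))
```

Note that this sketch drops comments and doctypes and would also highlight matches inside `<script>` or `<style>` content, so it is a starting point rather than a finished filter.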

How to send HTML email with styles in rails

- by Salil

Hi all, I want to know how to send a web page (HTML page) as an email. I want to add styling to the HTML page, such as images or table formatting. Should I add the styles in my "conta.text.html.erb", or can I add CSS to it?

Html.BeginForm() not rendering properly

- by Taskos George

While searching on Stack Overflow, the other questions didn't exactly help in my situation. How can I debug an error like this one, where Html.BeginForm is not rendered properly on the page? I use this code:

    @model ExtremeProduction.Models.SelectUserGroupsViewModel
    @{
        ViewBag.Title = "User Groups";
    }
    <h2>Groups for user @Html.DisplayFor(model => model.UserName)</h2>
    <hr />
    @using (Html.BeginForm("UserGroups", "Account", FormMethod.Post, new { encType = "multipart/form-data", id = "userGroupsForm" }))
    {
        @Html.AntiForgeryToken()
        <div class="form-horizontal">
            @Html.ValidationSummary(true)
            <div class="form-group">
                <div class="col-md-10">
                    @Html.HiddenFor(model => model.UserName)
                </div>
            </div>
            <h4>Select Group Assignments</h4>
            <br />
            <hr />
            <table>
                <tr>
                    <th> Select </th>
                    <th> Group </th>
                </tr>
                @Html.EditorFor(model => model.Groups)
            </table>
            <br />
            <hr />
            <div class="form-group">
                <div class="col-md-offset-2 col-md-10">
                    <input type="submit" value="Save" class="btn btn-default" />
                </div>
            </div>
        </div>
    }
    <div>
        @Html.ActionLink("Back to List", "Index")
    </div>

EDIT: Added the model

    // Wrapper for SelectGroupEditorViewModel to select user group membership:
    public class SelectUserGroupsViewModel
    {
        public string UserName { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public List<SelectGroupEditorViewModel> Groups { get; set; }

        public SelectUserGroupsViewModel()
        {
            this.Groups = new List<SelectGroupEditorViewModel>();
        }

        public SelectUserGroupsViewModel(ApplicationUser user) : this()
        {
            this.UserName = user.UserName;
            this.FirstName = user.FirstName;
            this.LastName = user.LastName;
            var Db = new ApplicationDbContext();

            // Add all available groups to the public list:
            var allGroups = Db.Groups;
            foreach (var role in allGroups)
            {
                // An EditorViewModel will be used by Editor Template:
                var rvm = new SelectGroupEditorViewModel(role);
                this.Groups.Add(rvm);
            }

            // Set the Selected property to true where user is already a member:
            foreach (var group in user.Groups)
            {
                var checkUserRole = this.Groups.Find(r => r.GroupName == group.Group.Name);
                checkUserRole.Selected = true;
            }
        }
    }

    // Used to display a single role group with a checkbox, within a list structure:
    public class SelectGroupEditorViewModel
    {
        public SelectGroupEditorViewModel() { }

        public SelectGroupEditorViewModel(Group group)
        {
            this.GroupName = group.Name;
            this.GroupId = group.Id;
        }

        public bool Selected { get; set; }
        [Required]
        public int GroupId { get; set; }
        public string GroupName { get; set; }
    }

    public class Group
    {
        public Group() { }

        public Group(string name) : this()
        {
            Roles = new List<ApplicationRoleGroup>();
            Name = name;
        }

        [Key]
        [Required]
        public virtual int Id { get; set; }
        public virtual string Name { get; set; }
        public virtual ICollection<ApplicationRoleGroup> Roles { get; set; }
    }

** EDIT ** I get this form: http://i834.photobucket.com/albums/zz268/gtas/formmine_zpsf6470e02.png I should get a form like the one I copied the code from, which looks like this: http://i834.photobucket.com/albums/zz268/gtas/formcopied_zpsdb2f129e.png Any ideas where or how to look for the source of evil that has been making my life hard for some time now?

Templates vs. coded HTML

- by Alan Harris-Reid

I have a web app consisting of some HTML forms for maintaining some tables (SQLite, with CherryPy for web-server stuff). First I did it entirely 'the Python way', and generated HTML strings via code, with common headers, footers, etc. defined as functions in a separate module. I also like the idea of templates, so I tried Jinja2, which I find quite developer-friendly. In the beginning I thought templates were the way to go, but that was when pages were simple. Once .css and .js files were introduced (not necessarily in the same folder as the .html files), and an ever-increasing number of {{...}} variables and {%...%} commands were introduced, things started getting messy at design-time, even though they looked great at run-time. Things got even more difficult when I needed additional JavaScript in the <head> or <body> sections.

As far as I can see, the main advantages of using templates are:
• Non-dynamic elements of the page can easily be viewed in a browser during design.
• Except for {} placeholders, HTML is kept separate from Python code.
• If your company has a web-page designer, they can still design without knowing Python.

while some disadvantages are:
• {{}} delimiters are visible when viewed at design-time in a browser.
• Associated .css and .js files have to be in the same folder to see their effects in a browser at design-time.
• Data, variables, lists, etc., must be prepared in advance and either declared globally or passed as parameters to the render() function.

So - when to use 'hard-coded' HTML, and when to use templates? I am not sure of the best way to go, so I would be interested to hear other developers' views. TIA, Alan

Flash receives mouse events under an HTML element when opacity set

- by Török Gábor

I have an HTML document with a Flash object and an absolutely positioned HTML element above it. If I set the HTML element's opacity CSS property to any value less than 1, the Flash object (which is actually covered) receives mouse events. This problem cannot be reproduced with pure HTML elements. Furthermore, Flash only receives hover events, so I cannot click below the layer. I put a demonstration of the problem online. I get this behavior in Firefox 3.6, Safari 4.0 and Chrome 5.0 on both Mac and Windows. Flash plugin version 10 is installed. Is it a bug or the normal and expected behavior? If the latter, how can I prevent Flash from receiving events when it is covered with a translucent layer?

Converting a HTML string to normal text using iphone dev

- by user315252

How do I convert an HTML string to plain text in iPhone development? Are there any built-in functions which do it? I am downloading an HTML string from the web and showing it in a UIWebView, and I am giving the user an edit option, so the HTML needs to be converted to text so that the user can edit it. Thanks...

Edit Html.ActionLink output string

- by Aaron Salazar

I'm trying to output the following HTML using Html.ActionLink: <a href="/About" class="read-more">Read More<span class="arrow">?</span></a> I'm getting it done by doing an ActionLink, which outputs an <a> tag, and then manipulating the string. <%= Html.ActionLink("[[replace]]", "Index", "About", null, new { @class = "read-more" }).ToHtmlString().Replace("[[replace]]", "Read More" + "<span class='arrow'>?</span>")%></p> It'd be good if I could put HTML directly into the ActionLink, but there doesn't seem to be a way, based on my internet searches. Sure, it works, but it seems like a hack. Is there a better way to accomplish this?

Can one prevent Genshi from parsing HTML entities?

- by DNS

I have the following Python code using Genshi (simplified):

    with open(pathToHTMLFile, 'r') as f:
        template = MarkupTemplate(f.read())
    finalPage = template.generate().render('html', doctype = 'html')

The source HTML file contains entities such as &copy;, &trade; and &reg;. Genshi replaces these with their UTF-8 character, which causes problems with the viewer (the output is used as a stand-alone file, not a response to a web request) that eventually sees the resulting HTML. Is there any way to prevent Genshi from parsing these entities? The more common ones like &amp; are passed through just fine.
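If the viewer only needs ASCII-safe output, one workaround that does not depend on Genshi at all is to post-process the rendered string and turn every non-ASCII character back into a numeric character reference. The sketch below uses Python's built-in `xmlcharrefreplace` error handler; the function name `to_ascii_entities` is an assumption, and whether this suits the viewer in question is untested.

```python
def to_ascii_entities(markup):
    """Replace each non-ASCII character with a numeric character reference."""
    return markup.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_ascii_entities("<p>\u00a9 2010 Acme\u2122</p>"))
# <p>&#169; 2010 Acme&#8482;</p>
```

This yields numeric references (`&#169;`) rather than the original named ones (`&copy;`), which browsers treat identically.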

Displaying html in a table view cell

- by Surya

I am working on an RSS reader app for iPhone. What are my options for displaying the entry summary of an RSS feed (which could be HTML) in a table view cell without compromising scroll performance? I don't control the feed, so the HTML in the summary section is out of my control. UIWebView would be my last option (some RSS feeds have images and stuff in there, unfortunately). I was wondering if there is a way to extract the summary text from the HTML.

extract part of html using tidy in php

- by hunt

Hi, I want to extract a particular part of an HTML page using Tidy in PHP. The page has a table in it and I just want to fetch that table. Please help and post a solution. Thanks

Converting HTML to plain text in PHP for e-mail

- by jstayton

I use TinyMCE to allow minimal formatting of text within my site. From the HTML that's produced, I'd like to convert it to plain text for e-mail. I've been using a class called html2text, but it's really lacking in UTF-8 support, among other things. I do, however, like that it maps certain HTML tags to plain text formatting — like putting underscores around text that previously had <i> tags in the HTML. Does anyone use a similar approach to converting HTML to plain text in PHP? And if so: Do you recommend any third-party classes that I can use? Or how do you best tackle this issue? Thanks!
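The tag-to-marker mapping mentioned above (underscores around italic text, for instance) can be sketched with a small streaming parser. The example is in Python rather than PHP and uses only the standard library, which handles Unicode text natively; the `MARKERS` and `BLOCK` tables, the class `TextFormatter` and the function `html_to_text` are all illustrative choices, not an existing library's API.

```python
from html.parser import HTMLParser

# Inline tags mapped to plain-text markers (illustrative choices).
MARKERS = {"i": "_", "em": "_", "b": "*", "strong": "*"}
# Tags that should start a new line in the text version.
BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextFormatter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities become characters
        self.parts = []

    def _tag(self, tag):
        if tag in MARKERS:
            self.parts.append(MARKERS[tag])
        elif tag in BLOCK:
            self.parts.append("\n")

    def handle_starttag(self, tag, attrs):
        self._tag(tag)

    def handle_endtag(self, tag):
        self._tag(tag)

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(markup):
    parser = TextFormatter()
    parser.feed(markup)
    parser.close()
    # Collapse whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


print(html_to_text("<p>Caf\u00e9 <i>menu</i> &amp; <b>prices</b></p><p>Open daily</p>"))
```

Links, lists and tables would need their own rules; this sketch only covers emphasis and line breaks.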

MySQL table export to HTML

- by countnazgul

Hi all, I've got a little problem with exporting MySQL data to HTML. The problem is that in one field I have values like this: <a href="http://google.com">Google</a> and when I export the table in HTML format, the generated HTML table contains this for those fields: &lt;a href="http://google.com"&gt;Google&lt;/a&gt; which is not a valid HTML link. Is there a way to export the table without MySQL converting the < and > chars? Thanks!
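If the export itself cannot be changed, one fallback is to reverse the escaping afterwards for the columns that are known to hold trusted markup. A minimal Python sketch, assuming the exported cell value has already been read into a string (the variable names are illustrative):

```python
import html

# A cell value as it might appear in the exported table (assumed example).
exported_cell = '&lt;a href="http://google.com"&gt;Google&lt;/a&gt;'

# html.unescape turns the character references back into literal characters.
restored = html.unescape(exported_cell)
print(restored)
```

Unescaping should only be applied to fields whose content is trusted, since it turns stored text back into live markup.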

Convert doc/docx to semantic HTML

- by sandstrom

I would like to convert doc/docx documents to semantic HTML. Some wishes/requirements:
• Semantic HTML, such that headers in the document are <h1>, <h2> etc., tables are <table> and so forth. Should preferably be able to handle headings, lists, tables and images. Graphs and math formulas are a nice extra.
• Doesn't have to be converted straight from doc/docx to HTML; it could use an intermediary format, such as XML or DocBook.
• Should work programmatically, and with a large number of documents.

The closest thing to a solution I've found so far is http://holloway.co.nz/docvert/index.html, but unfortunately it has quite a few bugs and a small user base, and it can't handle a lot of documents. It is more of a proof of concept.

How to output formatted HTML from PHP?

- by Tim

I like to format all my HTML with tabs for neatness and readability. Recently I started using PHP and now I have a lot of HTML output that comes from in between PHP tags. Those output lines all line up on the left side of the screen. I have to use \n to make a line break. Is there anything like that for forcing tabs, or any way to have neat HTML output coming from PHP?

JQuery .html() method and external scripts

- by Marco

Hi, I'm loading, using the jQuery ajax() method, an external page with both HTML and JavaScript code:

    <script type="text/javascript" src="myfile.js"></script>
    <p>This is some HTML</p>
    <script type="text/javascript">
        alert("This is inline JS");
    </script>

and setting the results into a div element, using the html() method. While the html() method properly evaluates the inline JS code, it doesn't download and evaluate the external JS file "myfile.js". Any tip for this issue?

Cleaning all inline events from HTML tags

- by Itay Moav

For HTML input, I want to neutralize all HTML elements that have inline JS (onclick="..", onmouseout=".." etc). I am thinking, isn't it enough to encode the following chars: = ( ) ? So onclick="location.href='ggg.com'" will become onclick%3D"location.href%3D'ggg.com'" What am I missing here? Edit: I do need to accept active HTML (I can't escape it all or turn it all into entities).
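An alternative to encoding individual characters is to parse the input and re-serialize it while dropping every attribute whose name starts with `on`. The Python sketch below illustrates this with the standard `html.parser` module; `EventStripper` and `strip_events` are hypothetical names, and this is not a complete sanitizer: it leaves `<script>` elements and other vectors in place, so a vetted sanitizing library is the safer choice for untrusted input.

```python
from html import escape
from html.parser import HTMLParser


class EventStripper(HTMLParser):
    """Re-serialize markup without on* attributes or javascript: URLs."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _emit(self, tag, attrs, closing=""):
        kept = []
        for name, value in attrs:          # names arrive lower-cased
            if name.startswith("on"):
                continue                   # onclick, onmouseout, ...
            if value is not None and value.strip().lower().startswith("javascript:"):
                continue                   # href="javascript:..."
            if value is None:
                kept.append(name)
            else:
                kept.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s%s>" % (tag, "".join(" " + k for k in kept), closing))

    def handle_starttag(self, tag, attrs):
        self._emit(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit(tag, attrs, " /")

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def strip_events(markup):
    parser = EventStripper()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(strip_events('<a href="x.html" onclick="location.href=\'ggg.com\'">go</a>'))
```

Working on parsed attributes rather than raw characters avoids having to anticipate every spelling of an event handler, such as mixed case or unusual whitespace around the equals sign.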
