htmldocument - Page 2 - Developer IT

Is there an object in C# that allows for easy management of HTML DOM?

- by Matt

Hi, if I have a string that contains the HTML from a page I just got back from an HTTP POST, how can I turn it into something that lets me easily traverse the DOM? I figured the HtmlDocument object would make sense, but it has no constructor. Are there any types that allow for easy management of the HTML DOM? Thanks, Matt

How to fill out a form field without a name in webbrowser control?

- by ajl

In the past, I used the code below to fill out a form field using the WebBrowser control in VB.Net. The page I am working with now doesn't have a name attribute on the input box, so my code doesn't work. How would I fill out the input box defined at the bottom of this post in bold?

    Dim iPage As HtmlDocument
    iPage = wb1.Document
    iPage.All.Item("case_num").InnerText() = caseNum
    iPage.All.Item("button1").InvokeMember("click")

**<input type="text" id="tbSymbolLookupMain" mode="mixed" autocomplete="off" defaulttxt="Enter Name or Symbol(s)" value="Enter Name or Symbol(s)" class="SymbolLookup fhHandleFocus fhDefault">**

VB6: assign javascript function to a dom element

- by Fuxi

Hi, I'm using the MSHTML library to parse HTML via MSHTML.HTMLDocument. My question: is there a way to assign a JavaScript function to a DOM element? I've tried something like:

    div.onmouseover = "function(){alert('mouseover')}"

and

    div.setattribute "onmouseover" , "function(){alert('mouseover')}"

without success (no error, but no effect either). Does anyone know if it's possible? Thanks.

Can this be improved? Scrubbing of dangerous html tags.

- by chobo2

I've been finding that, for something I consider pretty important, there is very little information and there are very few libraries for dealing with this problem. I found this while searching. I really don't know all the million ways a hacker could try to insert dangerous tags. I have a rich HTML editor, so I need to keep non-dangerous tags but strip out bad ones. So is this script missing anything? It uses Html Agility Pack.

    public string ScrubHTML(string html)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        //Remove potentially harmful elements
        HtmlNodeCollection nc = doc.DocumentNode.SelectNodes("//script|//link|//iframe|//frameset|//frame|//applet|//object|//embed");
        if (nc != null)
        {
            foreach (HtmlNode node in nc)
            {
                node.ParentNode.RemoveChild(node, false);
            }
        }

        //remove hrefs to java/j/vbscript URLs
        nc = doc.DocumentNode.SelectNodes("//a[starts-with(translate(@href, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'javascript')]|//a[starts-with(translate(@href, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'jscript')]|//a[starts-with(translate(@href, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'vbscript')]");
        if (nc != null)
        {
            foreach (HtmlNode node in nc)
            {
                node.SetAttributeValue("href", "#");
            }
        }

        //remove img with refs to java/j/vbscript URLs
        nc = doc.DocumentNode.SelectNodes("//img[starts-with(translate(@src, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'javascript')]|//img[starts-with(translate(@src, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'jscript')]|//img[starts-with(translate(@src, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'vbscript')]");
        if (nc != null)
        {
            foreach (HtmlNode node in nc)
            {
                node.SetAttributeValue("src", "#");
            }
        }

        //remove on<Event> handlers from all tags
        nc = doc.DocumentNode.SelectNodes("//*[@onclick or @onmouseover or @onfocus or @onblur or @onmouseout or @ondoubleclick or @onload or @onunload]");
        if (nc != null)
        {
            foreach (HtmlNode node in nc)
            {
                node.Attributes.Remove("onFocus");
                node.Attributes.Remove("onBlur");
                node.Attributes.Remove("onClick");
                node.Attributes.Remove("onMouseOver");
                node.Attributes.Remove("onMouseOut");
                node.Attributes.Remove("onDoubleClick");
                node.Attributes.Remove("onLoad");
                node.Attributes.Remove("onUnload");
            }
        }

        // remove any style attributes that contain the word expression (IE evaluates this as script)
        nc = doc.DocumentNode.SelectNodes("//*[contains(translate(@style, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'expression')]");
        if (nc != null)
        {
            foreach (HtmlNode node in nc)
            {
                node.Attributes.Remove("stYle");
            }
        }

        return doc.DocumentNode.WriteTo();
    }

Edit: Two people have suggested whitelisting. I actually like the idea of whitelisting, but I never did it because no one could tell me how to do it in C#, and I couldn't find tutorials for it in C# the last time I looked (I will check again). How do you make a whitelist? Is it just a list collection? How do you actually parse out all HTML tags, script tags and every other tag? Once you have the tags, how do you determine which ones are allowed? Compare them to your list collection? But what happens if the incoming content has, say, 100 tags and you have 50 allowed? You have to compare each of those 100 tags against the 50 allowed tags. That's quite a bit to go through and could be slow. Once you have found an invalid tag, how do you remove it? I don't really want to reject a whole block of text because one tag was found to be invalid; I'd rather remove it and insert the rest. Should I be using Html Agility Pack?

page posting issue when working in Screen Scraping

- by Muhammad Akhtar

Hi, I am working on screen scraping and have done it successfully for 3 websites, but I have an issue with the last website. Here is my url. When I submit my parameter there, it simply posts to another page and the result shows up fine on that page. Here is My Test. However, when I request it from my application, I don't have an option to post, so it only fetches the HTML of the requested page, which is the HTML test link mentioned above; that link has the parameter in its URL to get the result. How can I handle this situation? Please give me a hint. Thanks.

Here is my C# code; I am using HtmlAgilityPack:

    String url;
    HtmlWeb hw = new HtmlWeb();
    HtmlDocument doc;
    url = "http://mysampleURL";
    doc = hw.Load(url);

A way around XSS?

- by rushonerok

In my page I have a script reference to the autoHeight.js file below. I also have an iframe that I want to resize using this code. In Firebug I get this error:

    Error: Permission denied for <http://www.siena.edu> to get property HTMLDocument.body from <https://siteframework.siena.edu>.
    Source File: https://siteframework.siena.edu/FormManager/action/v2/autoHeight.js
    Line: 4

I am assuming it won't execute because the two are on different domains. This is to prevent XSS, right? Is there a way around it?

Web browser.navigate("www.somesite.com") Load page in window but Webbrowser.Document returns Null

- by Waseem

Hi, I am using the WebBrowser control in a Windows form. I navigate to a site with one parameter. The page loads in the web browser, but when I look at webbrowser.Document to find some HTML tags, it is NULL. I want to find all anchor tags in the loaded page. Following is my code:

    webChatPage.Navigate(ConfigurationManager.AppSettings["ServerURL"].ToString() + "/somepage.php?someparameter=" + sessionId);
    HtmlDocument hDoc = webChatPage.Document; //hDoc = NULL in debugging
    HtmlElementCollection aTag = hDoc.Links;
    MessageBox.Show(aTag.Count.ToString());

If there is any solution, please help me out.

Parsing tables, cells with Html agility in C#

- by Kaeso

I need to parse HTML code. More specifically, I need to parse each cell of every row in all tables. Each row represents a single object and each cell represents a different property. I want to parse these so I can write an XML file with all the data inside (without the useless HTML code). This is the way I thought it out initially, but I ran out of ideas:

HTML:

    <tr>
      <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF"> 1 </td>
      <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="left"> <a href="/ice/player.htm?id=8471675">Sidney Crosby</a> </td>
      <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="center"> PIT </td>
      <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="center"> C </td>
      <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 39 </td>
      <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 32 </td>
      <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 33 </td>
      <td class="statBox sorted" style="border-width:0px 1px 1px 0px; background-color: #E0E0E0" align="right"> <font color="#000000"> 65 </font> </td>
      <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 20 </td>
      <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 29 </td>
      <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 10 </td>
      <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 1 </td>
      <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 3 </td>
      <td class="statBox" style="border-width:0px 0px 1px 0px; background-color: #FFFFFF" align="right"> </td>
      <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 0 </td>
      <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 154 </td>
      <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 20.8 </td>
      <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 21:54 </td>
      <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 22.6 </td>
      <td class="statBox" style="border-width:0px 0px 1px 0px; background-color: #FFFFFF" align="right"> 55.7 </td>
    </tr>

C#:

    using HtmlAgilityPack;
    using System.Data;

    namespace Stats
    {
        class StatsParser
        {
            private string htmlCode;
            private static string fileName = "[" + DateTime.Now.ToShortDateString() + " NHL Stats].xml";

            public StatsParser(string htmlCode)
            {
                this.htmlCode = htmlCode;
                this.ParseHtml();
            }

            public DataTable ParseHtml()
            {
                var result = new DataTable();
                HtmlDocument doc = new HtmlDocument();
                doc.LoadHtml(htmlCode);
                HtmlNode row = doc.DocumentNode.SelectNodes("//tr");
                foreach (var statBox in row.SelectNodes("//td[@class='statBox']"))
                {
                    System.Windows.MessageBox.Show(statBox.InnerText);
                }
            }
        }
    }

Select only items in a specific DIV using HtmlAgilityPack

- by Adam Haile

I'm trying to use the HtmlAgilityPack to pull all of the links from a page that are contained within a div declared as <div class='content'>. However, when I use the code below I simply get ALL links on the entire page. This doesn't really make sense to me, since I am calling SelectNodes from the sub-node I selected earlier (which, when viewed in the debugger, only shows the HTML from that specific div). So it's like it's going back to the very root node every time I call SelectNodes. The code I use is below:

    HtmlWeb hw = new HtmlWeb();
    HtmlDocument doc = hw.Load(@"http://example.com");
    HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='content']");
    foreach(HtmlNode link in node.SelectNodes("//a[@href]"))
    {
        Console.WriteLine(link.Value);
    }

Is this the expected behavior? And if so, how do I get it to do what I'm expecting?

JQuery - Form Submission

- by user70192

Hello, I have a web page that has a DIV and an IFRAME. When a user clicks a button in my DIV, I need to submit the web form that is hosted in my IFRAME. Both pages are hosted on the same server in the same directory. My code that attempts to submit the web form looks like this:

    $("#myIFrame").contents().find("form").submit();

When this code is called, I receive an error in IE that says: "Internet Explorer cannot display the webpage". In Firefox, when I execute the code, I receive an error that says: "The connection was reset". In Firefox, when I look at the error console, I see:

    "Error: Permission denied for http://localhost:2995 to get property HTMLDocument.nodeType from ."

What am I doing wrong?

How to convert an HTML table to an array in python

- by user345660

I have an HTML document, and I want to pull the tables out of this document and return them as arrays. I'm picturing 2 functions: one that finds all the HTML tables in a document, and a second one that turns HTML tables into 2-dimensional arrays. Something like this:

    htmltables = get_tables(htmldocument)
    for table in htmltables:
        array=make_array(table)

There are 2 catches:

1. The number of tables varies day to day
2. The tables have all kinds of weird extra formatting, like bold and blink tags, randomly thrown in.

Thanks!
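One possible way to approach this, sketched with only the standard library's html.parser module. The class name TableExtractor is made up for illustration, and get_tables here returns the 2-dimensional lists directly instead of going through a separate make_array step. It assumes tables are not nested inside each other; formatting tags such as bold or blink are ignored and only the cell text is kept.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects each <table> as a list of rows, each row a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []    # finished tables
        self._table = None  # rows of the table currently open, or None
        self._row = None    # cells of the row currently open, or None
        self._cell = None   # text fragments of the cell currently open, or None

    def _close_cell(self):
        if self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self._table.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._table = []
        elif tag == "tr" and self._table is not None:
            self._close_row()   # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []
        # Any other tag (b, blink, font, ...) is skipped; its text still
        # reaches handle_data below.

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr":
            self._close_row()
        elif tag == "table" and self._table is not None:
            self._close_row()
            self.tables.append(self._table)
            self._table = None


def get_tables(html):
    """Return every table in the HTML string as a list of lists of strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Because get_tables returns one entry per table it finds, a document with a different number of tables each day just produces a longer or shorter list. Real-world pages with nested tables or heavily broken markup would probably need a dedicated HTML parsing library instead.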

Extraction Event

- by Anicho

So I have the following code:

    public override void Extract(object sender, ExtractionEventArgs e)
    {
        if (e.Response.HtmlDocument != null)
        {
            var myParam = e.Request.QueryStringParameters.Where(parameter => parameter.Name == QueryName).Select(parameter => parameter.Value).Distinct();
            myParam.
            // add the extracted value to the web performance test context
            e.WebTest.Context.Add(this.ContextParameterName, myParam.ToString());
            e.Success = true;
            return;
        }

        // If the extraction fails, set the error text that the user sees
        e.Success = false;
        e.Message = String.Format(CultureInfo.CurrentCulture, "Not Found: {0}", QueryName);
    }

It's returning:

    System.Linq.Enumerable+<DistinctItem>d_81`1[system.string]

I am expecting something along the lines of:

    0152-1231-1231-123d

My question is: how do I extract the query string's actual value from ExtractionEventArgs? They say it's possible, but I have no idea.

How to return radio checked object with jQuery ?

- by Kim

HTML

    <input type="radio" name="rdName" id="uniqueID1" value="1" checked="checked">
    <input type="radio" name="rdName" id="uniqueID2" value="2">
    <input type="radio" name="rdName" id="uniqueID3" value="3">

jQuery #1

    $('input:radio[name=rdName]:checked').val();

jQuery #2

    $('input[name=rdName]:checked');

jQuery #1 gets the value of the checked radio, but I need to get the whole object to get the ID. jQuery #2 gets this (from the Chrome Dev console). Object "0" is the actual object I need, but I am unable to get just that.

    0: HTMLInputElement
    constructor: function Object()
    context: HTMLDocument
    length: 1
    prevObject: Object
    selector: input[name=rdName]:checked
    __proto__: Object

Any ideas how to isolate the needed object?

Get entire text of document as a string using javascript

- by Tom Dignan

I am developing a Firefox extension and ideally would be able to get the whole darn DOM as a string... forget any data structure. I just want what I see in "view source" in a buffer. I have been checking out JavaScript references, HTMLDocument, etc. to no avail. Ideally I would be able to write to this buffer as well (seems possible, i.e. document.writeLn()). I wish there was a document.read(). Am I just a JS noob?

Reading cookies across different hosts

- by Thinker

I have two sites - both are my projects. On site two, I need to check if the user is logged in on site one. I suppose that to do this I should create a script that puts a cookie into the body of an iframe and then read the iframe contents on site two. But I can't. Here is the code I made for testing purposes: http://jsbin.com/oqaza/edit I get an error that says: "Permission denied for <http://jsbin.com> to get property HTMLDocument.nodeType from <http://www.google.com>."

SQL Server 2008 full-text search doesn't find word in words?

- by Martijn

In the database I have a field with a .mht file. I want to use FTS to search in this document. I got this working, but I'm not satisfied with the result. For example (sorry, it's in Dutch, but I think you get my point) I will use 2 words: zieken and ziekenhuis. As you can see, the phrase 'zieken' is contained in the word 'ziekenhuis'. When I search on 'ziekenhuis' I get about 20 results. When I search on 'zieken' I get 7 results. How is this possible? I mean, why doesn't the FTS return at least the results I get for 'ziekenhuis'? Here's the query I use:

    SELECT DISTINCT d.DocID 'Id', d.Titel,
        (SELECT afbeeldinglokatie FROM tbl_Afbeelding WHERE soort = 'beleid') as Pic,
        'belDoc' as DocType
    FROM docs d
    JOIN kpl_Document_Lokatie dl ON d.DocID = dl.DocID
    JOIN HandboekLokaties hb ON dl.LokatieID = hb.LokatieID
    WHERE hb.InstellingID = @instellingId
    AND (
        FREETEXT(d.Doel, @searchstring)
        OR FREETEXT(d.Toepassingsgebied, @searchstring)
        OR FREETEXT(d.HtmlDocument, @searchstring)
        OR FREETEXT (d.extraTabblad, @searchstring)
    )
    AND d.StatusID NOT IN( 1, 5)

Modify HTML in a Internet Explorer window using external.menuArguments

- by Axeman

Hi all... I have a VB.NET class that is invoked from a context menu extension in Internet Explorer. The code has access to the object model of the page, and reading data is not a problem. This is the code of a test function. It changes the status bar text (OK), prints the page HTML (OK), changes the HTML by adding some text, and prints the page HTML again (OK, in the second popup my added text is in the HTML). But the Internet Explorer window doesn't show it. What am I doing wrong?

    Public Sub CallingTest(ByRef Source As Object)
        Dim D As mshtml.HTMLDocument = Source.document
        Source.status = "Working..."
        Dim H As String = D.documentElement.innerHTML()
        MsgBox(H)
        D.documentElement.insertAdjacentText("beforeEnd", "ThisIsATest")
        H = D.documentElement.outerHTML()
        MsgBox(H)
        Source.status = ""
    End Sub

The function is called by this JavaScript:

    <SCRIPT>
    var EB = new ActiveXObject("MyObject.MyClass");
    EB.CallingTest(external.menuArguments);
    </SCRIPT>

How can i return List of directories instead of url's?

- by user1741587

I have this function:

    private List<string> getLinks(HtmlAgilityPack.HtmlDocument document)
    {
        List<string> mainLinks = new List<string>();
        var linkNodes = document.DocumentNode.SelectNodes("//a[@href]");
        if (linkNodes != null)
        {
            foreach (HtmlNode link in linkNodes)
            {
                var href = link.Attributes["href"].Value;
                if (href.StartsWith("http://") == true || href.StartsWith("https://") == true || href.StartsWith("www") == true) // filter for http
                {
                    mainLinks.Add(href);
                }
            }
        }
        return mainLinks;
    }

It takes one URL and returns a list of URLs. Instead, I want the function to take a directory, for example c:\, and return a List of all the directories in c:\. Not subdirectories, just the directories directly in c:\; in my case it should be a List of 14 directories, one directory per index. How can I do it? I tried Directory and DirectoryInfo, but I just got messed up.

Difference between the Document classes

- by takoi

I've been reading the javadocs trying to get a grasp of the Swing Document API, but I can't get anything sensible out of it because there are so many classes: Document, StyledDocument, AbstractDocument, DefaultStyledDocument, PlainDocument, HTMLDocument, and someone mentioned DocumentFilter. This question is more general, so can someone give an overview of the differences between the implementations and what the different interfaces and abstract classes are for? For my specific case, what I want is a data structure that will hold three lines of text only. And attributes must not be per line or document. I will have a couple of thousand of these in some other structure, so overhead is important. Is there anything I can use for this, or is it better to extend something? If so, what?

.NET C#: WebBrowser control Navigate() does not load targeted URL

- by Dave

Hey guys, I'm trying to programmatically load a web page via the WebBrowser control with the intent of testing the page and its JavaScript functions. Basically, I want to compare the HTML and JavaScript run through this control against a known output to ascertain whether there is a problem. However, I'm having trouble simply creating and navigating the WebBrowser control. The code below is intended to load the HtmlDocument into the WebBrowser.Document property:

    WebBrowser wb = new WebBrowser();
    wb.AllowNavigation = true;
    wb.Navigate("http://www.google.com/");

When examining the web browser's state via Intellisense after Navigate() runs, the WebBrowser.ReadyState is 'Uninitialized', WebBrowser.Document = null, and overall it appears completely unaffected by my call.

On a contextual note, I'm running this control outside of a Windows form object: I do not need to load a window or actually look at the page. Requirements dictate the need to simply execute the page's JavaScript and examine the resultant HTML. Any suggestions are greatly appreciated, thanks!

HTML Agility Pack Screen Scraping XPATH isn't returning data

- by Matthias Welsh

I'm attempting to write a screen scraper for Digikey that will allow our company to keep accurate track of pricing, part availability and product replacements when a part is discontinued. There seems to be a discrepancy between the XPATH that I'm seeing in Chrome Devtools as well as Firebug on Firefox and what my C# program is seeing. The code I'm currently using is pretty quick and dirty...

    //This function retrieves data from the digikey
    private static List<string> ExtractProductInfo(HtmlDocument doc)
    {
        List<HtmlNode> m_unparsedProductInfoNodes = new List<HtmlNode>();
        List<string> m_unparsedProductInfo = new List<string>();

        //Base Node for part info
        string m_baseNode = @"//html[1]/body[1]/div[2]";

        //Write part info to list
        m_unparsedProductInfoNodes.Add(doc.DocumentNode.SelectSingleNode(m_baseNode + @"/table[1]/tr[1]/td[1]/table[1]/tr[1]/td[1]"));
        //More lines of similar form will go here for more info
        //this retrieves digikey PN
        foreach(HtmlNode node in m_unparsedProductInfoNodes)
        {
            m_unparsedProductInfo.Add(node.InnerText);
        }
        return m_unparsedProductInfo;
    }

Although the path I'm using appears to be "correct", I keep getting NULL when I look at the list "m_unparsedProductInfoNodes". Any idea what's going on here? I'll also add that if I do a "SelectNodes" on the baseNode it only returns a div... not sure what that indicates, but it doesn't seem right.

C#: WebBrowser.Navigated Only Fires when I MessageBox.Show();

- by tsilb

I have a WebBrowser control which is being instantiated dynamically from a background STA thread, because the parent thread is a BackgroundWorker and has lots of other things to do. The problem is that the Navigated event never fires unless I pop a MessageBox.Show() in the method that told it to .Navigate(). I shall explain:

    ThreadStart ts = new ThreadStart(GetLandingPageContent_ChildThread);
    Thread t = new Thread(ts);
    t.SetApartmentState(ApartmentState.STA);
    t.Name = "Mailbox Processor";
    t.Start();

    protected void GetLandingPageContent_ChildThread()
    {
        WebBrowser wb = new WebBrowser();
        wb.Navigated += new WebBrowserNavigatedEventHandler(wb_Navigated);
        wb.Navigate(_url);
        MessageBox.Show("W00t");
    }

    protected void wb_Navigated(object sender, WebBrowserNavigatedEventArgs e)
    {
        WebBrowser wb = (WebBrowser)sender; // Breakpoint
        HtmlDocument hDoc = wb.Document;
    }

This works fine, but the message box will get in the way since this is an automation app. When I remove the MessageBox.Show(), the WebBrowser.Navigated event never fires. I've tried replacing this line with a Thread.Sleep(), and suspending the parent thread.

Once I get this out of the way, I intend to suspend the parent thread while the WebBrowser is doing its job and find some way of passing the resulting HTML back to the parent thread so it can continue with further logic.

Why does it do this? How can I fix it?

If someone can provide me with a way to fetch the content of a web page, fill out some data, and return the content of the page on the other side of the submit button, all against a web server that supports neither POST verbs nor passing data via the query string, I'll also accept that answer, as this whole exercise will have been unnecessary.

HtmlAgilityPack - Vs 2010 - c# ASP - File Not found

- by Janosch Geigowskoskilu

First, I've already searched the web and StackOverflow for hours. I found a lot about troubleshooting HtmlAgilityPack and tried most of it, but nothing worked.

The Situation: I'm developing a C# ASP.NET WebPart in SharePoint Foundation. Everything works fine; now I want to parse an HTML page to get all image paths and save the images to HD/Temp. To do that I downloaded the current version of HtmlAgilityPack and added a reference to the project. Everything looks OK, and IntelliSense works fine.

The Exception: When I run the section where HtmlAgilityPack should be used, my browser shows me a FileNotFoundException: the file or assembly could not be found.

What I tried: After my first searches I tried to include v1.4.0 of HtmlAgilityPack, because I read that the current version is not really stable in some cases. This works fine too, until the point where I want to use HtmlAgilityPack; then I get the same exception. I also tried moving HtmlAgilityPack directly into the solution directory; nothing changed. I tried to include HtmlAgilityPack via using, and I tried a direct call, e.g. HtmlAgilityPack.HtmlDocument.

Conclusion: When I compile, no error occurs and the reference is set correctly. When I trace HtmlAgilityPack.dll with ProcMon, the path is shown correctly, but sometimes the result is 'File Locked with only Readers'. I don't know enough about ProcMon to know what this means or whether it is critical. It shouldn't have anything to do with file permissions, because when I check the DLL all permissions are granted.

Search Results

Search found 58 results on 3 pages for 'htmldocument'.
