Search Results

Search found 45245 results on 1810 pages for 'html content extraction'.


  • 'Content' is NOT 'Text' in XAML

    - by psheriff
    One of the key concepts in XAML is that the Content property of a XAML control like a Button or ComboBoxItem does not have to contain just textual data. In fact, Content can be almost any other XAML that you want. To illustrate, here is a simple example of how to spruce up your Button controls in Silverlight.

    Here is some very simple XAML that consists of two Button controls within a StackPanel on a Silverlight User Control.

        <StackPanel>
          <Button Name="btnHome"
                  HorizontalAlignment="Left"
                  Content="Home" />
          <Button Name="btnLog"
                  HorizontalAlignment="Left"
                  Content="Logs" />
        </StackPanel>

    The XAML listed above will produce a Silverlight control within a browser that looks like Figure 1.

    Figure 1: Normal button controls are quite boring.

    With just a little bit of refactoring to move the button attributes into Styles, we can make the buttons look a little better. I am a big believer in Styles, so I typically create a Resources section within my user control where I can factor out the common attribute settings for a particular set of controls. Here is a Resources section that I added to my Silverlight user control.

        <UserControl.Resources>
          <Style TargetType="Button"
                 x:Key="NormalButton">
            <Setter Property="HorizontalAlignment"
                    Value="Left" />
            <Setter Property="MinWidth"
                    Value="50" />
            <Setter Property="Margin"
                    Value="10" />
          </Style>
        </UserControl.Resources>

    Now back in the XAML within the Grid control I update the Button controls to use the Style attribute and have each button use the Static Resource called NormalButton.

        <StackPanel>
          <Button Name="btnHome"
                  Style="{StaticResource NormalButton}"
                  Content="Home" />
          <Button Name="btnLog"
                  Style="{StaticResource NormalButton}"
                  Content="Logs" />
        </StackPanel>

    With the additional attributes set in the Resources section on the Button, the above XAML will now display the two buttons as shown in Figure 2.

    Figure 2: Use Resources to Make Buttons More Consistent

    Now let’s re-design these buttons even more. Instead of using words for each button, let’s replace the Content property with a picture. As they say… a picture is worth a thousand words, so let’s take advantage of that. Modify each of the Button controls: eliminate the Content attribute and instead insert an <Image> control between the <Button> and </Button> tags. Add a ToolTip to still display the words you had before in the Content and you will now have better looking buttons, as shown in Figure 3.

    Figure 3: Using pictures instead of words can be an effective method of communication

    The XAML to produce Figure 3 is shown in the following listing:

        <StackPanel>
          <Button Name="btnHome"
                  ToolTipService.ToolTip="Home"
                  Style="{StaticResource NormalButton}">
            <Image Style="{StaticResource NormalImage}"
                   Source="Images/Home.jpg" />
          </Button>
          <Button Name="btnLog"
                  ToolTipService.ToolTip="Logs"
                  Style="{StaticResource NormalButton}">
            <Image Style="{StaticResource NormalImage}"
                   Source="Images/Log.jpg" />
          </Button>
        </StackPanel>

    You will also need to add the following XAML to the User Control’s Resources section.

        <Style TargetType="Image"
               x:Key="NormalImage">
          <Setter Property="Width"
                  Value="50" />
        </Style>

    Add Multiple Controls to Content

    Now, since the Content can be whatever we want, you could also modify the Content of each button to be a StackPanel control. Then you can have an image and text within the button.

        <StackPanel>
          <Button Name="btnHome"
                  ToolTipService.ToolTip="Home"
                  Style="{StaticResource NormalButton}">
            <StackPanel>
              <Image Style="{StaticResource NormalImage}"
                     Source="Images/Home.jpg" />
              <TextBlock Text="Home"
                         Style="{StaticResource NormalTextBlock}" />
            </StackPanel>
          </Button>
          <Button Name="btnLog"
                  ToolTipService.ToolTip="Logs"
                  Style="{StaticResource NormalButton}">
            <StackPanel>
              <Image Style="{StaticResource NormalImage}"
                     Source="Images/Log.jpg" />
              <TextBlock Text="Logs"
                         Style="{StaticResource NormalTextBlock}" />
            </StackPanel>
          </Button>
        </StackPanel>

    You will need to add one more resource for the TextBlock control too.

        <Style TargetType="TextBlock"
               x:Key="NormalTextBlock">
          <Setter Property="HorizontalAlignment"
                  Value="Center" />
        </Style>

    All of the above will now produce the following:

    Figure 4: Add multiple controls to the content to make your buttons even more interesting.

    Summary

    While this is a simple example, you can see how XAML Content has great flexibility. You could add a MediaElement control as the content of a Button and play a video within the Button. Not that you would necessarily do this, but it does work. What is nice about adding different content within the Button control is that you still get the Click event and other attributes of a button, but it does not necessarily look like a normal button.

    Good Luck with your Coding,
    Paul Sheriff


  • A Semantic Model For Html: TagBuilder and HtmlTags

    - by Ryan Ohs
    In this post I look into the code smell that is HTML literals and show how we can refactor these pesky strings into a friendlier and more maintainable model.

    The Problem

    When I started writing MVC applications, I quickly realized that I built a lot of my HTML inside HtmlHelpers. As I did this, I ended up moving quite a bit of HTML into string literals inside my helper classes. As I wanted to add more attributes (such as classes) to my tags, I needed to keep adding overloads to my helpers. A good example of this end result is the default html helpers that come with the MVC framework. Too many overloads make me crazy! The problem with all these overloads is that they quickly muck up the API and nobody can remember exactly what order the parameters go in. I've seen many presenters (including members of the ASP.NET MVC team!) stumble before realizing that their view wasn't compiling because they needed one more null parameter in the call to Html.ActionLink(). What if instead of writing Html.ActionLink("Edit", "Edit", null, new { @class = "navigation" }) we could do Html.LinkToAction("Edit").Text("Edit").AddClass("navigation")? Wouldn't that be much easier to remember and understand? We can do this if we introduce a semantic model for building our HTML.

    What is a Semantic Model?

    According to Martin Fowler, "a semantic model is an in-memory representation, usually an object model, of the same subject that the domain specific language describes." In our case, the model would be a set of classes that know how to render HTML. By using a semantic model we can free ourselves from dealing with strings and instead output the HTML (typically via ToString()) once we've added all the elements and attributes we desire to the model. There are two primary semantic models available in ASP.NET MVC: MVC 2.0's TagBuilder and FubuMVC's HtmlTags.

    TagBuilder

    TagBuilder is the html builder that is available in ASP.NET MVC 2.0. I'm not a huge fan but it gets the job done -- for simple jobs. Here's an overview of how to use TagBuilder. See my Tips section below for a few comments on that example. The disadvantage of TagBuilder is that unless you wrap it up with your own classes, you still have to write the actual tag name over and over in your code, e.g. new TagBuilder("div") instead of new DivTag(). I also think its method names are a little too long. Why not have AddClass() instead of AddCssClass(), or Text() instead of SetInnerText()? What those methods are doing should be pretty obvious even in the short form. I also don't like that it wants to generate an id attribute from your input instead of letting you set it yourself using external conventions (see GenerateId() and IdAttributeDotReplacement). Obviously these come from Microsoft's default approach to MVC but may not be optimal for all programmers.

    HtmlTags

    HtmlTags is in my opinion the much better option for generating html in ASP.NET MVC. It was actually written as a part of FubuMVC but is available as a stand-alone library. HtmlTags provides a much cleaner syntax for writing HTML. There are classes for most of the major tags and it's trivial to create additional ones by inheriting from HtmlTag. There are also methods on each tag for the common attributes. For instance, FormTag has an Action() method. The SelectTag class allows you to set the default option or first option independently of adding other options. With TagBuilder there isn't even an abstraction for building selects! The project is open source and always improving. I'll hopefully find time to submit some of my own enhancements soon.

    Tips

    1) It's best not to have insanely overloaded html helpers. Use fluent builders.
    2) In html helpers, return the TagBuilder/tag itself (not a string!) so that you can continue to add attributes outside the helper; see my first sample above.
    3) Create a static entry point into your builders. I created a static Tags class that gives me access to all the HtmlTag classes I need. This way I don't clutter my code with "new" keywords, e.g. Tags.Div returns a new DivTag instance.
    4) If you find yourself doing something a lot, create an extension method for it. I created a Nest() extension method that reads much more fluently than the AddChildren() method. It also accepts a params array of tags so I can very easily nest many children.

    I hope you have found this post helpful. Join me in my war against HTML literals! I’ll have some more samples of how I use HtmlTags in future posts.
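The fluent, chainable style argued for above can be illustrated with a small sketch. It is written in Python rather than C#, and it is not the API of TagBuilder or HtmlTags: the `Tag` class, its method names and the `link_to` helper are all hypothetical, chosen only to show how an in-memory model defers rendering until the tag is converted to a string.

```python
from html import escape


class Tag:
    """Hypothetical in-memory model of one HTML element (not the HtmlTags API)."""

    def __init__(self, name):
        self.name = name
        self.attrs = {}
        self.classes = []
        self.children = []

    def attr(self, key, value):
        self.attrs[key] = value
        return self  # returning self is what makes the calls chainable

    def add_class(self, css_class):
        if css_class not in self.classes:
            self.classes.append(css_class)
        return self

    def text(self, value):
        # Text is escaped once, when it enters the model.
        self.children.append(escape(value))
        return self

    def nest(self, *tags):
        # Accepts any number of child tags, like the Nest() helper in the tips.
        self.children.extend(tags)
        return self

    def __str__(self):
        attrs = dict(self.attrs)
        if self.classes:
            attrs["class"] = " ".join(self.classes)
        rendered = "".join(
            f' {key}="{escape(str(value), quote=True)}"' for key, value in attrs.items()
        )
        inner = "".join(str(child) for child in self.children)
        return f"<{self.name}{rendered}>{inner}</{self.name}>"


def link_to(url):
    """Hypothetical helper: returns the Tag itself, not a string (tip 2)."""
    return Tag("a").attr("href", url)


link = link_to("/items/edit").text("Edit").add_class("navigation")
print(link)  # <a href="/items/edit" class="navigation">Edit</a>
```

Because the helper returns the model rather than a string, a caller can keep adding attributes or nest the link inside another tag before anything is rendered.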


  • Configure WebCenter PS5 with WebCenter Content - Bad Example

    - by Vikram Kurma
    I opened JDeveloper, created a content repository connection with all the required fields. While testing the connection, the connection became successful. But while navigating to the Webcenter Content Connection, I am ( Always use past tense ) getting the following error. Notice the inconsistency in the image resolutions SEVERE: Could not list contents of folder with ID = dCollectionID:-1oracle.stellent.ridc.protocol.ServiceException: No service defined for COLLECTION_DISPLAY. To solve the issue, please find the following the steps in Webcenter Content. 1. Login into webcenter content : https://<hostname>:<port number>/cs 2. Click on 'Administration' and select 'Admin Server' 3. This will open a new window in browser, please select 'Component Manager'. 4. In the right side window, please click on 'Advanced component manager' 5. We can see all the enabled and disabled features. The main problem for this error is folders_g is not enabled and Framework folders might have enabled. But for creating a connection with webcenter portal framework or with webcenter spaces, we need folders_g, then only we will get Contribution folder. 6. come to the enabled feature session, select Framework folders and disable it. 7. Come to the disabled feature session, select folders_g in the list and enable it. 8. Restart the Webcenter content node. 9. Login into webcenter content system, go to 'Browse Content' menu. If we are able to see 'Contribution Folder' the problem is solved. ( Avoid dubious sentences ) We can configure webcenter content with Webcenter portal framework or with webcenter spaces.


  • How to remove duplicate content, which is still indexed, but not linked to anymore?

    - by David
    A bug in the tool we use to create search-engine-friendly URLs changed our whole URL structure overnight, and we only noticed after Google had already indexed the pages. Now we have a massive duplicate content issue, causing a harsh drop in rankings. Webmaster Tools shows over 1,000 duplicate title tags, so I don't think Google understands what is going on.

    Right URL: abc.com/price/sharp-ah-l13-12000-btu.html
    Wrong URL: abc.com/item/sharp-l-series-ahl13-12000-btu.html (created by mistake)

    After that, we ...
    1. Changed back all URLs to the "Right URLs"
    2. Set up a 301-redirect for all "Wrong URLs" a few days later

    Now, a massive number of pages is still in the index twice. As we do not link internally to the "Wrong URLs" anymore, I am not sure if Google will re-crawl them very soon. What can we do to solve this issue and tell Google that all the "Wrong URLs" now redirect to the "Right URLs"?

    Best, David


  • Portal And Content – Introduction (1 of 7)

    - by Stefan Krantz
    The posts coming over the next two months will form a new series. The idea is to help the reader understand how to enable a versatile and manageable portal. Each post will go through a specific use case or lifecycle group of events that a Content Driven Portal requires the development team to consider. The current plan is to cover the following subjects, each in a separate blog post:

    1. Introduction – Introduction to the series of posts and what to expect at the end of the series
    2. Components, part 1 – UCM, Site Studio and high level introduction to content templates
    3. Components, part 2 – Page Templates and Navigation model
    4. Components, part 3 – Applied Customization Framework for Content Presenter Taskflows
    5. Scenario 1 – Enable a Portal for runtime administration
    6. Scenario 2 – Enable a Portal for Internationalization
    7. Scenario 3 – Enable a Portal for Content Workflows

    Background

    This post series is intended to help customers, partners and consultants understand the concept of a WebCenter Portal project where the main focus, or a majority of the portal, is content interaction. Today, most of the portal installations that Oracle WebCenter Portal is involved in consist largely of content-based pages. Many portal projects have run or will run into challenges; to mitigate them, the portal and content lifecycle has to be well designed. The coming posts will address the main components that should be involved when creating such scenarios; they will also go into detail on the process by describing three solution scenarios. The aim of the scenarios is to give the reader a more hands-on understanding of building and architecting a Content Driven Portal. The scenarios were selected based on the most common use cases that we have identified so far.


  • How much time does Google Webmaster Tools need to generate content keywords if URL masking is enabled? [closed]

    - by user1439968
    Possible Duplicate: What is domain “masking” or “cloaking”? Why should it be avoided for a new web site?

    My real domain is domain.in, but URL masking has been enabled and the masked URL is domain2.in. I added the URL bputdoubts.21backlogs.in to Google Webmaster Tools a week ago, but the content keywords haven't been generated. When can I expect the content keywords to be generated? And is there a problem getting visitors from Google search if URL masking is enabled?


  • Is there a way to save MS Word document as HTML w/o the ms proprietary stuff?

    - by sequoia mcdowell
    So normally I wouldn't use this feature ("Save as Web Page") but I have large documents from clients they just want put on their site as HTML, and formatting it all by hand seems like a waste of time. I have tried "save as webpage" in Word 2007, but it produces all sorts of bad stuff. To wit: <b style='mso-bidi-font-weight:normal'> <span style="mso-spacerun: yes"> as well as a large block of XML formatting info: <!--[if gte mso 9]><xml> <o:DocumentProperties> <o:Subject> </o:Subject> <o:Author> </o:Author> <o:Keywords> </o:Keywords> ... As I said, formatting it all by hand seems like a waste of time, but the way MS exports currently just has too much cruft. Is there a way to export MS Word doc as html without all this?
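One possible way to deal with the cruft quoted above is to post-process the exported file. The following is only a rough sketch in Python, not a feature of Word: the function name `strip_word_markup` is hypothetical, and the regular expressions only target the two patterns shown in the question (conditional `[if ... mso ...]` comment blocks and `mso-*` style declarations). Regex-based HTML cleanup is fragile, and the second pattern would also remove text that merely looks like an `mso-*` declaration, so treat it as a starting point.

```python
import re

# Conditional comment blocks such as <!--[if gte mso 9]> ... <![endif]-->
CONDITIONAL_COMMENT = re.compile(
    r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.DOTALL | re.IGNORECASE
)
# Individual mso-* declarations, e.g. mso-bidi-font-weight:normal
MSO_DECLARATION = re.compile(r"""\s*mso-[a-z-]+\s*:[^;"']*;?""", re.IGNORECASE)
# style attributes that are empty once the mso-* declarations are gone
EMPTY_STYLE = re.compile(r"""\s+style=(?:"\s*"|'\s*')""", re.IGNORECASE)


def strip_word_markup(html):
    """Remove the Word-specific patterns listed above from an HTML string."""
    html = CONDITIONAL_COMMENT.sub("", html)
    html = MSO_DECLARATION.sub("", html)
    html = EMPTY_STYLE.sub("", html)
    return html


sample = (
    "<b style='mso-bidi-font-weight:normal'>Title</b>"
    "<!--[if gte mso 9]><xml><o:Author>x</o:Author></xml><![endif]-->"
    '<span style="mso-spacerun: yes"> </span>'
)
print(strip_word_markup(sample))  # <b>Title</b><span> </span>
```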


  • I have domain.com and domain.org to the same site, should I use redirects to avoid duplicate content

    - by bunzip
    I have both the .com and the .org for a domain name, and using Apache I point them to the same site content. I think this might be causing problems with the Search Engines because of duplicate content. I want the .org to be the essential website. How do others handle this situation? Should I be using 301 redirects to point all the .com requests to the .org? Should I just use the link rel="canonical" on each page to point to the .org?
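The 301 option mentioned above boils down to rewriting the host and keeping the rest of the URL. The sketch below shows that mapping in Python; it is not an Apache configuration, and the host names and the `redirect_target` function are placeholders taken from the wording of the question.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "domain.org"  # assumed preferred host
ALIAS_HOSTS = {"domain.com", "www.domain.com", "www.domain.org"}  # assumed aliases


def redirect_target(url):
    """Return the URL a 301 should point to, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in ALIAS_HOSTS:
        return None
    # Keep scheme, path, query and fragment; only the host changes.
    # Any explicit port in the original URL is dropped in this sketch.
    return urlunsplit(
        (parts.scheme, CANONICAL_HOST, parts.path, parts.query, parts.fragment)
    )


print(redirect_target("http://domain.com/about?x=1"))  # http://domain.org/about?x=1
print(redirect_target("http://domain.org/about"))      # None
```

Keeping the path and query intact means each .com page redirects to its .org counterpart rather than to the home page.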


  • Is it legal to charge extra fees for copyrighted content on mobile platforms?

    - by Macrow Willson
    This question just came up as we recently bought content from stock image portals. Many of them have altered their license agreements to charge more for use in mobile apps. So instead of using their standard license, you need to pay for an "extended" license, which easily multiplies the fee by 5-10. That doesn't make sense to me, as a mobile device is just a smaller browser and protects the content even better than a desktop computer. Are those stock agencies allowed to do that, and is it legal at all? I am not a lawyer, but I would even risk going on with the standard license and waiting to be sued over it.


  • Calling Html.ActionLink in a custom HTML helper

    - by Sylvain
    I am designing a custom HTML helper and I would like to execute Html.ActionLink to provide dynamic URL generation.

        namespace MagieMVC.Helpers
        {
            public static class HtmlHelperExtension
            {
                public static string LinkTable(this HtmlHelper helper, List<Method> items)
                {
                    string result = String.Empty;
                    foreach (Method m in items)
                    {
                        result += String.Format(
                            "<label class=\"label2\">{0}</label>" +
                            System.Web.Mvc.Html.ActionLink(...) +
                            "<br />",
                            m.Category.Name, m.ID, m.Name);
                    }
                    return result;
                }
            }
        }

    Unfortunately, Html.ActionLink is not recognized in this context, whatever namespace I have tried to declare. As a generic question, I would like to know if it is possible to use any existing standard/custom Html helper method when designing a new custom helper. Thanks.


  • What is the difference (if any) between Html.Partial(view, model) and Html.RenderPartial(view,model)

    - by Stephane
    Other than the type it returns and the fact that you call it differently of course

        <% Html.RenderPartial(...); %>
        <%= Html.Partial(...) %>

    If they are different, why would you call one rather than the other one? The definitions:

        // Type: System.Web.Mvc.Html.RenderPartialExtensions
        // Assembly: System.Web.Mvc, Version=2.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35
        // Assembly location: C:\Program Files (x86)\Microsoft ASP.NET\ASP.NET MVC 2\Assemblies\System.Web.Mvc.dll
        using System.Web.Mvc;

        namespace System.Web.Mvc.Html
        {
            public static class RenderPartialExtensions
            {
                public static void RenderPartial(this HtmlHelper htmlHelper, string partialViewName);
                public static void RenderPartial(this HtmlHelper htmlHelper, string partialViewName, ViewDataDictionary viewData);
                public static void RenderPartial(this HtmlHelper htmlHelper, string partialViewName, object model);
                public static void RenderPartial(this HtmlHelper htmlHelper, string partialViewName, object model, ViewDataDictionary viewData);
            }
        }

        // Type: System.Web.Mvc.Html.PartialExtensions
        // Assembly: System.Web.Mvc, Version=2.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35
        // Assembly location: C:\Program Files (x86)\Microsoft ASP.NET\ASP.NET MVC 2\Assemblies\System.Web.Mvc.dll
        using System.Web.Mvc;

        namespace System.Web.Mvc.Html
        {
            public static class PartialExtensions
            {
                public static MvcHtmlString Partial(this HtmlHelper htmlHelper, string partialViewName);
                public static MvcHtmlString Partial(this HtmlHelper htmlHelper, string partialViewName, ViewDataDictionary viewData);
                public static MvcHtmlString Partial(this HtmlHelper htmlHelper, string partialViewName, object model);
                public static MvcHtmlString Partial(this HtmlHelper htmlHelper, string partialViewName, object model, ViewDataDictionary viewData);
            }
        }


  • Use Html.RadioButtonFor and Html.LabelFor for the same Model but different values

    - by Marc
    I have this Razor template:

        <table>
          <tr>
            <td>@Html.RadioButtonFor(i => i.Value, "1")</td>
            <td>@Html.LabelFor(i => i.Value, "true")</td>
          </tr>
          <tr>
            <td>@Html.RadioButtonFor(i => i.Value, "0")</td>
            <td>@Html.LabelFor(i => i.Value, "false")</td>
          </tr>
        </table>

    That gives me this HTML:

        <table>
          <tr>
            <td><input id="Items_1__Value" name="Items[1].Value" type="radio" value="1" /></td>
            <td><label for="Items_1__Value">true</label></td>
          </tr>
          <tr>
            <td><input checked="checked" id="Items_1__Value" name="Items[1].Value" type="radio" value="0" /></td>
            <td><label for="Items_1__Value">false</label></td>
          </tr>
        </table>

    So I have the ID Items_1__Value twice, which is of course not good and does not work in a browser: when I click on the second label, "false", the first radio button is activated. I know I could set my own id on RadioButtonFor and refer to that from my label, but that's not very elegant, is it? Especially because I'm in a loop and cannot just use the name "value" with an added number; that would end up as duplicate DOM ids in my final HTML markup as well. Shouldn't there be a good solution for this?


  • Asynchronous Html.ImageGetter for setting multiple images in a TextView

    - by thedude19
    I'm writing an application that takes HTML pages and parses them to display on the screen. Specifically, this application pulls HTML from a message board and lists posts made by users. The problem is that a lot of the content in posts are pictures in <img> tags, so I need to write a Html.ImageGetter to handle the downloading of the images. My textView.setText() method will look like this:

        myTextView.setText(Html.fromHtml(myText, new ImageGetter() {
            @Override
            public Drawable getDrawable(String source) {
                Drawable d;
                // Need to async download image here
                return d;
            }
        }, null));

    Doing this synchronously is trivial, but is there a suggested way to do this asynchronously so that it doesn't lock up my UI thread? I would also like to eventually build in caching of these images, but I imagine that would be pretty simple once the async downloading was there.


  • HTML to RTF Converter for .NET

    - by nickyt
    I've already seen lots of posts on the site for RTF to HTML and some other posts talking about some HTML to RTF converters, but I'm really trying to get a full breakdown of what is considered the most widely used commercial product or open source product, or whether people recommend going home grown. Apologies if you consider this a duplicate question, but I'm trying to create a product matrix to see what is the most viable for our application. I also think this would be helpful for others.

    The converter would be used in an ASP.NET 2.0 application (we're upgrading to 3.5 shortly but still sticking with WebForms) using SQL Server 2005 (soon 2008) as the DB.

    From reading a few posts, SautinSoft appears to be popular as a commercial component. Are there other commercial components that you'd recommend for converting HTML to RTF? Price does matter, but even if it's a little on the expensive side, please list it.

    For open source, I read that OpenOffice.org can be run as a service so that it can convert files. However, this appears to be only Java based. I imagine I'd need some kind of interop to use this? What .NET open source components, if any, are out there for converting HTML to RTF?

    For home grown, is an XSLT the way to go with XHTML? If so, what component do you recommend for generating XHTML? Otherwise, what other home grown avenues do you recommend?

    Also, please note that I currently don't care so much about RTF to HTML. If a commercial component offers this and the price is still the same, fine; otherwise please don't mention it.


  • how to display HTML in a UITextView

    - by Mark
    Essentially, I just want formatted HTML rendered in the UITextView. Should I be using the undocumented setContentToHTMLString? I feel that I should not be using that. I have tried it, but the text (after being rendered as HTML) does not scroll properly, which is why I suspect it's not documented... Should I just use a UIWebView? Can I just pass it arbitrary HTML and expect it to render it?


  • How to enter text in AJAX HTML Editor using watin

    - by Shaki
    Hi, I could not figure out how to enter text into the HTML Editor using WatiN. I tried //ie.TextField(Find.ById("htmlDetail_ctl06_ctl04")).TypeText("ABCD"); but got the error: Can't move focus to the control because it is invisible, not enabled, or of a type that does not accept the focus. Can you please give an example of how to enter text into the AJAX HTML Editor using WatiN? I am not sure what to plug in for frameSrc and the JavaScript from this solution - http://stackoverflow.com/questions/939448/unit-testing-the-ms-ajax-toolkit-html-editor Here is the HTML from the Developer Tools when I click the text box: Thanks in advance


  • Render html code in sql server client report (rdlc)

    - by masoud ramezani
    I am using an ASP.NET web application with the Microsoft Visual Studio ReportViewer control and an RDLC file to create a report (not using SQL Server Reporting Services). I used the Product table to view the result. It has five fields and I display all the items in the report. One field is Description, and it stores HTML code as its value (e.g. <div><ul><li>a</li><li>b</li></ul><b>aaaa</b></div>). I want to display the rendered output of this HTML in my report's Description field, but the report shows the raw HTML value stored in my table (<div><ul><li>a</li><li>b</li></ul><b>aaaa</b></div>). How can I render the HTML in my report? Please give me a solution.


  • How to set entire HTML in MSHTML?

    - by douglaslise
    How do I set the entire HTML in MSHTML? I am trying this assignment: (Document as IHTMLDocument3).documentElement.innerHTML := 'abc'; but I get the error: "Target element invalid for this operation". I also tried using (Document as IHTMLDocument2).write, but that only adds HTML to the body section, and I need to replace the whole HTML source. Does anybody have an idea how to do this? Thanks in advance.


  • c# Truncate HTML safely for article summary

    - by WickedW
    Hi all, does anyone have a C# variation of this? I want to take some HTML and display it, without breaking it, as a summary lead-in to an article. http://stackoverflow.com/questions/1193500/php-truncate-html-ignoring-tags Save me from reinventing the wheel! Thank you very much.

    ---------- edit ------------------
    Sorry, new here, and you're right, I should have phrased the question better. Here's a bit more info. I wish to take an HTML string and truncate it to a set number of words (or even a character length) so I can show the start of it as a summary (which then leads to the main article). I wish to preserve the HTML so I can show the links etc. in the preview. The main issue I have to solve is that we may well end up with unclosed HTML tags if we truncate in the middle of one or more tags! The idea I have for a solution is to:

    a) truncate the HTML to N words (words better, but chars OK) first (being sure not to stop in the middle of a tag and truncate a required attribute)
    b) work through the opened HTML tags in this truncated string (maybe stick them on a stack as I go?)
    c) then work through the closing tags and ensure they match the ones on the stack as I pop them off?
    d) if any open tags are left on the stack after this, write their closing tags to the end of the truncated string, and the HTML should be good to go!

    -- edit 12112009
    Here is what I have bumbled together so far as a unit test file in VS2008; this 'may' help someone in future. My hack attempts based on Jan's code are at the top, for a char version + word version (DISCLAIMER: this is dirty rough code!! on my part). I assume working with 'well-formed' HTML in all cases (but not necessarily a full document with a root node as per the XML version). Abel's XML version is at the bottom, but I have not yet got round to fully getting the tests to run on this (plus I need to understand the code) ... I will update when I get a chance to refine. Having trouble with posting code? Is there no upload facility on Stack?
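For comparison, steps a) to d) above can be sketched with an event-based parser instead of regular expressions. This is a Python illustration, not a port of the C# code in this post: the names `truncate_html` and `_Truncator` are hypothetical, it counts visible characters rather than words, and like the post it assumes well-formed input.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (partial list, enough for a sketch).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class _Truncator(HTMLParser):
    def __init__(self, limit):
        super().__init__(convert_charrefs=False)
        self.remaining = limit  # visible characters still allowed
        self.out = []           # pieces of the truncated markup
        self.open_tags = []     # stack of currently open elements
        self.done = False       # set once the limit has been reached

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self.get_starttag_text())  # keeps attributes verbatim
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        if not self.done:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or tag in VOID_TAGS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.done:
            return
        if len(data) >= self.remaining:
            self.out.append(data[: self.remaining])
            self.remaining = 0
            self.done = True
        else:
            self.out.append(data)
            self.remaining -= len(data)

    def handle_entityref(self, name):
        self._take_reference(f"&{name};")

    def handle_charref(self, name):
        self._take_reference(f"&#{name};")

    def _take_reference(self, raw):
        # An entity such as &amp; counts as one visible character.
        if self.done:
            return
        self.out.append(raw)
        self.remaining -= 1
        if self.remaining == 0:
            self.done = True


def truncate_html(html, limit):
    """Keep at most `limit` visible characters and close any tags left open."""
    if limit <= 0:
        return ""
    parser = _Truncator(limit)
    parser.feed(html)
    parser.close()
    closing = "".join(f"</{tag}>" for tag in reversed(parser.open_tags))
    return "".join(parser.out) + closing


print(truncate_html("<h1>1234</h1><b><i>56789</i>012345</b>", 12))
# <h1>1234</h1><b><i>56789</i>012</b>
```

Because the parser only ever sees whole tags, the cut can never land inside a tag or an attribute, which is the concern raised in step a).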
Thanks for all comments :) using System; using System.Collections.Generic; using System.Text.RegularExpressions; using System.Xml; using System.Xml.XPath; using Microsoft.VisualStudio.TestTools.UnitTesting; namespace PINET40TestProject { [TestClass] public class UtilityUnitTest { public static string TruncateHTMLSafeishChar(string text, int charCount) { bool inTag = false; int cntr = 0; int cntrContent = 0; // loop through html, counting only viewable content foreach (Char c in text) { if (cntrContent == charCount) break; cntr++; if (c == '<') { inTag = true; continue; } if (c == '>') { inTag = false; continue; } if (!inTag) cntrContent++; } string substr = text.Substring(0, cntr); //search for nonclosed tags MatchCollection openedTags = new Regex("<[^/](.|\n)*?>").Matches(substr); MatchCollection closedTags = new Regex("<[/](.|\n)*?>").Matches(substr); // create stack Stack<string> opentagsStack = new Stack<string>(); Stack<string> closedtagsStack = new Stack<string>(); // to be honest, this seemed like a good idea then I got lost along the way // so logic is probably hanging by a thread!! foreach (Match tag in openedTags) { string openedtag = tag.Value.Substring(1, tag.Value.Length - 2); // strip any attributes, sure we can use regex for this! 
if (openedtag.IndexOf(" ") >= 0) { openedtag = openedtag.Substring(0, openedtag.IndexOf(" ")); } // ignore brs as self-closed if (openedtag.Trim() != "br") { opentagsStack.Push(openedtag); } } foreach (Match tag in closedTags) { string closedtag = tag.Value.Substring(2, tag.Value.Length - 3); closedtagsStack.Push(closedtag); } if (closedtagsStack.Count < opentagsStack.Count) { while (opentagsStack.Count > 0) { string tagstr = opentagsStack.Pop(); if (closedtagsStack.Count == 0 || tagstr != closedtagsStack.Peek()) { substr += "</" + tagstr + ">"; } else { closedtagsStack.Pop(); } } } return substr; } public static string TruncateHTMLSafeishWord(string text, int wordCount) { bool inTag = false; int cntr = 0; int cntrWords = 0; Char lastc = ' '; // loop through html, counting only viewable content foreach (Char c in text) { if (cntrWords == wordCount) break; cntr++; if (c == '<') { inTag = true; continue; } if (c == '>') { inTag = false; continue; } if (!inTag) { // do not count double spaces, and a space not in a tag counts as a word if (c == 32 && lastc != 32) cntrWords++; } } string substr = text.Substring(0, cntr) + " ..."; //search for nonclosed tags MatchCollection openedTags = new Regex("<[^/](.|\n)*?>").Matches(substr); MatchCollection closedTags = new Regex("<[/](.|\n)*?>").Matches(substr); // create stack Stack<string> opentagsStack = new Stack<string>(); Stack<string> closedtagsStack = new Stack<string>(); foreach (Match tag in openedTags) { string openedtag = tag.Value.Substring(1, tag.Value.Length - 2); // strip any attributes, sure we can use regex for this! 
                if (openedtag.IndexOf(" ") >= 0)
                {
                    openedtag = openedtag.Substring(0, openedtag.IndexOf(" "));
                }
                // ignore brs as self-closed
                if (openedtag.Trim() != "br")
                {
                    opentagsStack.Push(openedtag);
                }
            }
            foreach (Match tag in closedTags)
            {
                string closedtag = tag.Value.Substring(2, tag.Value.Length - 3);
                closedtagsStack.Push(closedtag);
            }
            if (closedtagsStack.Count < opentagsStack.Count)
            {
                while (opentagsStack.Count > 0)
                {
                    string tagstr = opentagsStack.Pop();
                    if (closedtagsStack.Count == 0 || tagstr != closedtagsStack.Peek())
                    {
                        substr += "</" + tagstr + ">";
                    }
                    else
                    {
                        closedtagsStack.Pop();
                    }
                }
            }
            return substr;
        }

        public static string TruncateHTMLSafeishCharXML(string text, int charCount)
        {
            // your data, probably comes from somewhere, or as params to a method
            XmlDocument xml = new XmlDocument();
            xml.LoadXml(text);
            // create a navigator, this is our primary tool
            XPathNavigator navigator = xml.CreateNavigator();
            XPathNavigator breakPoint = null;
            // find the text node we need:
            while (navigator.MoveToFollowing(XPathNodeType.Text))
            {
                string lastText = navigator.Value.Substring(0, Math.Min(charCount, navigator.Value.Length));
                charCount -= navigator.Value.Length;
                if (charCount <= 0)
                {
                    // truncate the last text. Here goes your "search word boundary" code:
                    navigator.SetValue(lastText);
                    breakPoint = navigator.Clone();
                    break;
                }
            }
            // first remove text nodes, because Microsoft unfortunately merges them without asking
            while (navigator.MoveToFollowing(XPathNodeType.Text))
            {
                if (navigator.ComparePosition(breakPoint) == XmlNodeOrder.After)
                {
                    navigator.DeleteSelf(); // moves to parent
                }
            }
            // then move the rest
            navigator.MoveTo(breakPoint);
            while (navigator.MoveToFollowing(XPathNodeType.Element))
            {
                if (navigator.ComparePosition(breakPoint) == XmlNodeOrder.After)
                {
                    navigator.DeleteSelf(); // moves to parent
                }
            }
            // then remove *all* empty nodes to clean up (not necessary):
            // TODO, add empty elements like <br />, <img /> as exclusion
            navigator.MoveToRoot();
            while (navigator.MoveToFollowing(XPathNodeType.Element))
            {
                while (!navigator.HasChildren && (navigator.Value ?? "").Trim() == "")
                {
                    navigator.DeleteSelf(); // moves to parent
                }
            }
            navigator.MoveToRoot();
            return navigator.InnerXml;
        }

        [TestMethod]
        public void TestTruncateHTMLSafeish()
        {
            // Case where we just make it to start of HREF (so effectively an empty link)
            // 'simple' nested none attributed tags
            Assert.AreEqual(
                @"<h1>1234</h1><b><i>56789</i>012</b>",
                TruncateHTMLSafeishChar(
                    @"<h1>1234</h1><b><i>56789</i>012345</b>", 12));
            // In middle of a!
            Assert.AreEqual(
                @"<h1>1234</h1><a href=""testurl""><b>567</b></a>",
                TruncateHTMLSafeishChar(
                    @"<h1>1234</h1><a href=""testurl""><b>5678</b></a><i><strong>some italic nested in string</strong></i>", 7));
            // more
            Assert.AreEqual(
                @"<div><b><i><strong>1</strong></i></b></div>",
                TruncateHTMLSafeishChar(
                    @"<div><b><i><strong>12</strong></i></b></div>", 1));
            // br
            Assert.AreEqual(
                @"<h1>1 3 5</h1><br />6",
                TruncateHTMLSafeishChar(
                    @"<h1>1 3 5</h1><br />678<br />", 6));
        }

        [TestMethod]
        public void TestTruncateHTMLSafeishWord()
        {
            // zero case
            Assert.AreEqual(@" ...", TruncateHTMLSafeishWord(@"", 5));
            // 'simple' nested none attributed tags
            Assert.AreEqual(
                @"<h1>one two <br /></h1><b><i>three ...</i></b>",
                TruncateHTMLSafeishWord(
                    @"<h1>one two <br /></h1><b><i>three </i>four</b>", 3),
                "we have added ' ...' to end of summary");
            // In middle of a!
            Assert.AreEqual(
                @"<h1>one two three </h1><a href=""testurl""><b class=""mrclass"">four ...</b></a>",
                TruncateHTMLSafeishWord(
                    @"<h1>one two three </h1><a href=""testurl""><b class=""mrclass"">four five </b></a><i><strong>some italic nested in string</strong></i>", 4));
            // start of h1
            Assert.AreEqual(
                @"<h1>one two three ...</h1>",
                TruncateHTMLSafeishWord(
                    @"<h1>one two three </h1><a href=""testurl""><b>four five </b></a><i><strong>some italic nested in string</strong></i>", 3));
            // more than words available
            Assert.AreEqual(
                @"<h1>one two three </h1><a href=""testurl""><b>four five </b></a><i><strong>some italic nested in string</strong></i> ...",
                TruncateHTMLSafeishWord(
                    @"<h1>one two three </h1><a href=""testurl""><b>four five </b></a><i><strong>some italic nested in string</strong></i>", 99));
        }

        [TestMethod]
        public void TestTruncateHTMLSafeishWordXML()
        {
            // zero case
            Assert.AreEqual(@" ...", TruncateHTMLSafeishWord(@"", 5));
            // 'simple' nested none attributed tags
            string output = TruncateHTMLSafeishCharXML(
                @"<body><h1>one two </h1><b><i>three </i>four</b></body>", 13);
            Assert.AreEqual(
                @"<body>\r\n <h1>one two </h1>\r\n <b>\r\n <i>three</i>\r\n </b>\r\n</body>",
                output,
                "XML version, no ... yet and adds '\r\n + spaces?' to format document");
            // In middle of a!
            Assert.AreEqual(
                @"<h1>one two three </h1><a href=""testurl""><b class=""mrclass"">four ...</b></a>",
                TruncateHTMLSafeishCharXML(
                    @"<body><h1>one two three </h1><a href=""testurl""><b class=""mrclass"">four five </b></a><i><strong>some italic nested in string</strong></i></body>", 4));
            // start of h1
            Assert.AreEqual(
                @"<h1>one two three ...</h1>",
                TruncateHTMLSafeishCharXML(
                    @"<h1>one two three </h1><a href=""testurl""><b>four five </b></a><i><strong>some italic nested in string</strong></i>", 3));
            // more than words available
            Assert.AreEqual(
                @"<h1>one two three </h1><a href=""testurl""><b>four five </b></a><i><strong>some italic nested in string</strong></i> ...",
                TruncateHTMLSafeishCharXML(
                    @"<h1>one two three </h1><a href=""testurl""><b>four five </b></a><i><strong>some italic nested in string</strong></i>", 99));
        }
    }
}

    Read the article

  • HTML tidy/cleaning in Ruby 1.9

    - by Christian
    I'm currently using the RubyTidy Ruby bindings for HTML tidy to make sure HTML I receive is well-formed. Currently this library is the only thing holding me back from getting a Rails application on Ruby 1.9. Are there any alternative libraries out there that will tidy up chunks of HTML on Ruby 1.9?

    Read the article

  • Repairing malformatted html attributes using c#

    - by jhoefnagels
    I have a web application with upload functionality for HTML files generated by chess software, so that I can include a JavaScript player that replays a chess game. I do not want to load the uploaded files in a frame, so I reconstruct the HTML and JavaScript generated by the software by parsing the dynamic parts of the file. The problem with the HTML is that all attribute values are surrounded by apostrophes instead of quotation marks. I am looking for a way to fix this using a library or a regex replace in C#. The HTML looks like this: <DIV class='pgb'><TABLE class='pgbb' CELLSPACING='0' CELLPADDING='0'><TR><TD> and I would like to transform it into: <DIV class="pgb"><TABLE class="pgbb" CELLSPACING="0" CELLPADDING="0"><TR><TD>
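
    One possible shape for the regex replace described above, as a minimal sketch rather than a definitive answer. The class and method names (`AttributeQuoteFixer`, `Fix`) are hypothetical. The sketch assumes that no attribute value contains a `<` or `>` character and that no double-quoted value itself contains a `name='value'` pattern; input that breaks those assumptions is probably better handled by a real HTML parser.

```csharp
using System.Text.RegularExpressions;

public static class AttributeQuoteFixer
{
    // Matches name='value' pairs; the value may not contain an apostrophe.
    private static readonly Regex SingleQuotedAttribute =
        new Regex(@"(?<name>[\w:-]+)\s*=\s*'(?<value>[^']*)'");

    // Rewrites single-quoted attribute values to double-quoted ones.
    // Only text between '<' and '>' is touched, so apostrophes in the
    // visible text (for example "it's") are left alone.
    public static string Fix(string html)
    {
        return Regex.Replace(html, @"<[^<>]+>", tag =>
            SingleQuotedAttribute.Replace(tag.Value, m =>
                m.Groups["name"].Value
                + "=\""
                // escape any double quote that was legal inside '...'
                + m.Groups["value"].Value.Replace("\"", "&quot;")
                + "\""));
    }
}
```

    Calling `AttributeQuoteFixer.Fix` on the sample markup from the question should yield the double-quoted form shown there, with element and attribute names left in their original case.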

    Read the article

  • Why use a whitelist for HTML sanitizing?

    - by Carson Myers
    I've often wondered -- why use a whitelist as opposed to a blacklist when sanitizing HTML input? How many sneaky HTML tricks are there to open XSS vulnerabilities? Obviously script tags and frames are not allowed, and a whitelist would be used on the fields in HTML elements, but why disallow most of everything?

    Read the article
