regex for html - Page 5

/regexp?/ on HTML, but not in form

- by takeshin

I need to do some regex replacement on HTML input, but I need to exclude some parts from filtering by other regexp. (e.g. remove all <a> tags with specific href="example.com…, except the ones that are inside the <form> tag) Is there any smart regex technique for this? Or do I have to find all forms using $regex1, then split the input to the smaller chunks, excluding the matched text blocks, and then run the $regex2 on all the chunks?

Read the article

Replace HTML entities in a string avoiding <img> tags

- by Xeos

I have the following input: Hi! How are you? <script>//NOT EVIL!</script> Wassup? :P LOOOL!!! :D :D :D Which is then run through emoticon library and it become this: Hi! How are you? <script>//NOT EVIL!</script> Wassup? <img class="smiley" alt="" title="tongue, :P" src="ui/emoticons/15.gif"> LOOOL!!! <img class="smiley" alt="" title="big grin, :D" src="ui/emoticons/5.gif"> <img class="smiley" alt="" title="big grin, :P" src="ui/emoticons/5.gif"> <img class="smiley" alt="" title="big grin, :P" src="ui/emoticons/5.gif"> I have a function that escapes HTML entites to prevent XSS. So running it on raw input for the first line would produce: Hi! How are you? <script>//NOT EVIL!</script> Now I need to escape all the input, but at the same time I need to preserve emoticons in their initial state. So when there is <:-P emoticon, it stays like that and does not become <:-P. I was thinking of running a regex split on the emotified text. Then processing each part on its own and then concatenating the string together, but I am not sure how easily can Regex be bypassed? I know the format will always be this: [<img class="smiley" alt="] [empty string] [" title="] [one of the values from a big list] [, ] [another value from the list (may be matching original emoticon)] [" src="ui/emoticons/] [integer from Y to X] [.gif">] Using the list MAY be slow, since I need to run that regex on text that may have 20-30-40 emoticons. Plus there may be 5-10-15 text messages to process. What could be an elegant solution to this? I am ready to use third-party library or jQuery for this. PHP preprocessing is possible as well.

Read the article

RegEx to replace html entities

- by DeltaFox

Hi, all. I'm looking for a way to replace the bullet character in Greasemonkey. I assume a Regular Expression will do the trick, but I'm not as well-versed in it as many of you. For example, "SampleSite.com • Page Title" becoming "SampleSite.com Page Title". The issue is that the character has already been parsed by the time Greasemonkey has gotten to it, and I don't know how to make it recognize the symbol. I've tried these so far, but they haven't worked: newTitle = document.title.replace(/•/g, ""); newTitle = document.title.replace("•", ""); //just for grins, but didn't work anyway

Read the article

Regex to match all HTML tags except <p> and </p>

- by Xetius

I need to match and remove all tags using a regular expression in Perl. I have the following: <\\??(?!p).+?> But this still matches with the closing </p> tag. Any hint on how to match with the closing tag as well? Note, this is being performed on xhtml.

Read the article

RegEx expression or jQuery selector to NOT match "external" links in href

- by TrueBlueAussie

I have a jQuery plugin that overrides link behavior, to allow Ajax loading of page content. Simple enough with a delegated event like $(document).on('click','a', function(){});. but I only want it to apply to links that are not like these ones (Ajax loading is not applicable to them, so links like these need to behave normally): target="_blank" // New browser window href="#..." // Bookmark link (page is already loaded). href="afs://..." // AFS file access. href="cid://..." // Content identifiers for MIME body part. href="file://..." // Specifies the address of a file from the locally accessible drive. href="ftp://..." // Uses Internet File Transfer Protocol (FTP) to retrieve a file. href="http://..." // The most commonly used access method. href="https://..." // Provide some level of security of transmission href="mailto://..." // Opens an email program. href="mid://..." // The message identifier for email. href="news://..." // Usenet newsgroup. href="x-exec://..." // Executable program. href="http://AnythingNotHere.com" // External links Sample code: $(document).on('click', 'a:not([target="_blank"])', function(){ var $this = $(this); if ('some additional check of href'){ // Do ajax load and stop default behaviour return false; } // allow link to work normally }); Q: Is there a way to easily detect all "local links" that would only navigate within the current website? excluding all the variations mentioned above. Note: This is for an MVC 5 Razor website, so absolute site URLs are unlikely to occur.

Read the article

Regex for Date DD-MM-YYYY

- by Josh M

I have the following expression which will be used as date validation in the HTML5 "pattern" attribute. ?:0[1-9]|1[0-9]|2[0-9])|(?:(?!02)(?:0[1-9]|1[0-2])-(?:30))|(?:(?:0[13578]|1[02])-31))-(?:(?:0[1-9]|1[0-2])-(?:19|20)[0-9]{2} I want it to allow only valid dates, using "-" as a separator. This means up to 29th in February if it's a leap year, and 30/31 for other months respectively. Currently, it only allows years starting with 2 (2012) and months up to 12 (December). But it limits the day to 29 regardless of which month. Can anybody help me fix it?

Read the article

Remove all html tags and content except for a div class

- by Crazy

I want to remove all html content from a string except for a div class : <div class="toto">blablabla</div> Should I use a Regex or DOM Parser? To answer drachenstern : It's a comment content with bbcode. And the html in this div is generated with Geshi (code highlighter) so i don't want to delete this. For example a visitor can enter <script></script> in a [code][/code] bbcode tag. All HTML outside the [code][/code] bbcode tag must be delete no?

Read the article

Parsing HTML with XPath and PHP

- by Peter

Is there a way (using XPath and PHP) to do the following (WITHOUT external XSLT files)? Remove all tables and their contents Remove everything after the first h1 tag Keep only paragraphs (INCLUDING their inner HTML (links, lists, etc)) I received an XSLT answer here, but I'm looking for XPATH queries that don't require external files. Currently, I've got the HTML in question loaded into a SimpleXmlElement via: $doc = @DOMDocument::loadHTML($xml); $data = simplexml_import_dom($doc); Now I need help with: $data = $data->xpath('??????'); Been working with this one for several days to no avail. I really appreciate the help. Edit: I don't particularly care what's inside the paragraphs, as I can use strip_tags to eliminate what I don't want. All I need to do is to isolate the paragraphs from the rest of the source. I suppose a more specific, accurate requirement would be this: Return only paragraphs (and their html contents) that aren't contained in tables, and only before the first h1 tag

Read the article

Regular Expression to isolate an html tag

- by orit cohen

I'm looking for a regular expression to isolate an html tag. This includes the TAG the ATTRIBUTES and the CONTNET inside. Let's say I have this: <html> <body> aajsdfkjaskd <TAGNAME name="bla" context="non">hfdfhdj </TAGNAME> </body> </html> I need a regular expression that would return: <TAGNAME name="bla" context="non">hfdfhdj </TAGNAME> Thank, Joe

Read the article

When should JavaScript generate HTML?

- by VirtuosiMedia

I try to generate as little HTML from JavaScript as possible. Instead, I prefer to manipulate existing markup whenever I can and only generate HTML when I need to dynamically insert an element that isn't a good candidate for using Ajax. This, I believe, makes it far easier to maintain the code and quickly make changes to it because the markup is easier to read and trace. My rule of thumb is: HTML is for document structure, CSS is for presentation, JavaScript is for behavior. However, I've seen a lot of JS code that generates mounds of HTML, including entire forms and content-heavy modal dialogs. In general, which method is considered best practice? In what circumstances should JavaScript be used to generate HTML and when should it not?

Read the article

Regex to match . (periods marking end of sentences) but not Mr. (as in Mr. Hopkins)

- by Josh Crews

I'm trying to parse a text file into sentences ending in periods, but names like Mr. Hopkins are throwing false alarms on matching for periods. What regex indentfy's "." but not "Mr." For bonus, I'm also using ! to find end of sentences, so my current Regex is /(!/./ and I'd love an answer that incorporates my !'s too.

Read the article

Convert tags to html entities

- by Sarfraz

Hello, Is it possible to convert html tags to html entities using javascript/jquery using any way possible such as regex or some other way. If yes, then how? Example: <div> should be converted to <div>

Read the article

Split string by HTML entities?

- by user366124

My string contain a lot of HTML entities, like this "Hello <everybody> there" And I want to split it by HTML entities into this : Hello everybody there Can anybody suggest me a way to do this please? May be using Regex?

Read the article

[Javascript] Split string by HTML entities?

- by user366124

Hello, My string contain a lot of HTML entities, like this "Hello <everybody>" there And I want to split it by HTML entities into this : Hello everybody there Can anybody suggest me a way to do this please? May be using Regex? Thanks so much.

Read the article

simscan's regex

- by alexus

-bash-3.2# cat /var/qmail/control/simcontrol :clam=yes,spam=yes,spam_hits=7.0,regex=^Subject\072.*(7.|8.)\%.*:(?m)\.ru\/\n{21} -bash-3.2# cat ./cur/msg.1268526916.764928.8759:2,S | pcregrep -M '(?m)\.ru\/\n{21}' Party's over for Clinton http://260.noonwife.ru/ of because Abraham is large Confessional murdered the for -bash-3.2# grep -c REGEX /var/log/qmail/smtpd/@* /var/log/qmail/smtpd/@400000004b9c134f0095ecdc.s:25 /var/log/qmail/smtpd/@400000004b9c144c2748a9dc.s:6 /var/log/qmail/smtpd/@400000004b9c16eb2ac491fc.s:12 /var/log/qmail/smtpd/@400000004b9c1c61239185ac.s:28 /var/log/qmail/smtpd/@400000004b9c216a3013fdb4.s:29 /var/log/qmail/smtpd/@400000004b9c26b11fb5263c.s:22 /var/log/qmail/smtpd/@400000004b9c2b2505d2035c.s:25 /var/log/qmail/smtpd/@400000004b9c2ec3139530f4.s:12 /var/log/qmail/smtpd/@400000004b9c312c160d7454.s:4 -bash-3.2# first regex works, yet i can't get it to match second, even though pcregrep matches it using same regex just fine any ideas?

Read the article

Java library for HTML analysis

- by Raj

Hi, (I've seen similar questions, but I think none of them cater to my specific needs, hence...) I would like to know if there is a Java library for analysis of real-world (read: incomplete, ill-formed) HTML. By analysis, I mean things like: figuring out the most prominent color in an HTML chunk changing that color to some other color (hence, has to support modification of the HTML as well) pruning out unwanted tags fixing up the HTML to result in a well formed HTML snippet Parts of the last two are done by libraries such as Jericho, and jTidy. 'Plugins' on top of these would be great. Thanks in advance!

Read the article

Extracting email addresses in an html block in ruby/rails

- by corroded

I am creating a parser that wards off against spamming and harvesting of emails from a block of text that comes from tinyMCE (so it may or may not have html tags in it) I've tried regexes and so far this has been successful: /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i problem is, i need to ignore all email addresses with mailto hrefs. for example: <a href="mailto:[email protected]">[email protected]</a> should only return the second email add. To get a background of what im doing, im reversing the email addresses in a block so the above example would look like this: <a href="mailto:[email protected]">moc.liam@tset</a> problem with my current regex is that it also replaces the one in href. Is there a way for me to do this with a single regex? Or do i have to check for one then the other? Is there a way for me to do this just by using gsub or do I have to use some nokogiri/hpricot magicks and whatnot to parse the mailtos? Thanks in advance! Here were my references btw: so.com/questions/504860/extract-email-addresses-from-a-block-of-text so.com/questions/1376149/regexp-for-extracting-a-mailto-address im also testing using this: http://rubular.com/ edit here's my current helper code: def email_obfuscator(text) text.gsub(/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i) { |m| m = "<span class='anti-spam'>#{m.reverse}</span>" } end which results in this: <a target="_self" href="mailto:<span class='anti-spam'>moc.liamg@tset</span>"><span class="anti-spam">moc.liamg@tset</span></a>

Read the article

Pure JSP without mixing HTML, by writing html as Java-like code

- by ADTC

Please read before answering. This is a fantasy programming technique I'm dreaming up. I want to know if there's anything close in real life. The following JSP page: <% html { head { title {"Pure fantasy";} } body { h1 {"A heading with double quote (\") character";} p {"a paragraph";} String s = "a paragraph in string. the date is "; p { s; new Date().toString(); } table (Border.ZERO, new Padding(27)) { tr { for (int i = 0; i < 10; i++) { td {i;} } } } } } %> could generate the following HTML page: <html> <head> <title>Pure fantasy</title> </head> <body> <h1>A heading with double quote (") character</h1> <p>a paragraph</p> <p>a paragraph in string. the date is 11 December 2012</p> <table border="0" padding="27"> <tr> <td>0</td> <td>1</td> <td>2</td> <td>3</td> <td>4</td> <td>5</td> <td>6</td> <td>7</td> <td>8</td> <td>9</td> </tr> </table> </body> </html> The thing about this fantasy is it reuses the same old Java programming language technique that enable customized keywords used in a way similar to if-else-then, while, try-catch etc to represent html tags in a non-html way that can easily be checked for syntactic correctness, and most importantly can easily be mixed up with regular Java code without being lost in a sea of <%, %>, <%=, out.write(), etc. An added feature is that strings can directly be placed as commands to print out into generated HTML, something Java doesn't support (where pure strings have to be assigned to variables before use). Is there anything in real life that comes close? If not, is it possible to define customized keywords in Java or JSP? Or do I have to create an entirely new programming language for that? What problems do you see with this kind of setup? PS: I know you can use HTML libraries to construct HTML using Java code, but the problem with such libraries is, the source code itself doesn't have a readable HTML representation like the code above does - if you get what I mean.

Read the article

Sending HTML to Gmail always lands in Spam

- by cartaysm

I am having an issue with sending HTML emails to Gmail. I can send them to Yahoo, Hotmail, RR, AOL, etc. with no problem at all, but when I send them to Gmail I get kicked to spam. I have checked my IP with a lot of different list to make sure it is not listed anywhere, which it is not. spamhaus = is not listed in the DBL abuse.net = is not listed in the SBL abuse.net = is not listed in the PBL abuse.net = is not listed in the XBL spamcop = not listed in bl.spamcop.net host 24.172.204.xxx xxx.204.172.24.in-addr.arpa domain name pointer xxxevents.com. host xxxevents.com xxxevents.com has address 24.172.204.xxx xxxevents.com mail is handled by 10 mail.xxxevents.com. I am just trying to send a very VERY basic HTML message (listed below). I use an Ubuntu server, swiftmailer, multipart/alternative (HTML & plain), SPF = pass, and I am going to setup DKIM today to see if that fixes it (but I doubt it will)... For now I will only post the message I sent that gets kicked to spam and can provide any details needed. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head><title>Triathlon</title></head> <body> <table cellpadding="0" cellspacing="0"> <tr> <td> <p>Thank you for attending our 4th annual Triathlon/Duathlon/5k at Hueston Woods State Park on August 12th. This event is held annually to raise research funding for Crohn's Disease, Ulcerative Colitis, and Muscular Dystrophy diseases.</p> </td> </tr> <tr> <td> <p>As you know the results and pictures have been posted on our home page at since Sunday 8/13/2012. Now we also have updated our Facebook page with those photos and you can start tagging yourself or downloading the pictures now! <br /> our page and tag yourself at </p> <p> test test </p> <p>Race day events is professionally managed by Speedy-Feet</p> </td> </tr> </table> </body> </html> Just plain text works great, I thought maybe wording was messing me up but not the case... I am almost done install opendkim so I will be able to rule that out very soon. Edit: Okay installed opendkim and I am getting passing results so I sent the html I posted above it went through just fine. So now when I start to add a few more lines I am getting kicked back to spam again. Here is updated html code: ` <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head><title>Triathlon</title></head> <body> <table cellpadding="0" cellspacing="0"> <tr> <td> <center><a href='http://xxxevents.com' target="_blank"> <font face="Verdana, Arial, Helvetica, sans-serif" color="#666666" size="2"> <img src="http://xxxevents.com/marketemailimages/xxxlogo.png" alt="xxx It Events | Raising funds for Crohns, Colitis, and Muscular Dystrophy" border="0" /> </font></a></center> </td> <tr> <td> <p>Thank you for attending our 4th annual Triathlon/Duathlon/5k at Hueston Woods State Park on August 12th. This event is held annually to raise research funding for Crohn's Disease, Ulcerative Colitis, and Muscular Dystrophy diseases.</p> </td> </tr> <tr> <td> <p>As you know the results and pictures have been posted on our home page at since Sunday 8/13/2012. Now we also have updated our Facebook page with those photos and you can start tagging yourself or downloading the pictures now! <br /> our page and tag yourself at </p> <p> test test </p> <p>Race day events is professionally managed by Speedy-Feet</p> </td> </tr> </table> <table width="100%" border="0" cellspacing="0" cellpadding="0"> <tr> <td valign="top"> <div align="center" style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:10px;"><br />PO Box xxx Maineville, OH 45039<br /> <a href="mailto:[email protected]">[email protected]</a> | <a href='http://xxxevents.com' target="_blank">xxxevents.com</a><br /> <br /> </div> </td> </tr> </table> </body> </html>`

Read the article

Html Agility Pack for Reading “Real World” HTML

- by WeigeltRo

In an ideal world, all data you need from the web would be available via well-designed services. In the real world you sometimes have to scrape the data off a web page. Ugly, dirty – but if you really want that data, you have no choice. Just don’t write (yet another) HTML parser. I stumbled across the Html Agility Pack (HAP) a long time ago, but just now had the need for a robust way to read HTML. A quote from the website: This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you actually don't HAVE to understand XPATH nor XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant with "real world" malformed HTML. The object model is very similar to what proposes System.Xml, but for HTML documents (or streams). Using the HAP was a simple matter of getting the Nuget package, taking a look at the example and dusting off some of my XPath knowledge from years ago. The documentation on the Codeplex site is non-existing, but if you’ve queried a DOM or used XPath or XSLT before you shouldn’t have problems finding your way around using Intellisense (ReSharper tip: Press Ctrl+Shift+F1 on class members for reading the full doc comments).

Read the article

Efficiently Combine MatchCollections in .Net Regex

- by Laramie

In the simplified example, there are 2 Regular Expressions, one case sensitive, the other not. The idea would be to efficiently create an IEnumerable collection (see "combined" below) combining the results. string test = "abcABC"; string regex = "(?<grpa>a)|(?<grpb>b)|(?<grpc>c)]"; Regex regNoCase = new Regex(regex, RegexOptions.IgnoreCase); Regex regCase = new Regex(regex); MatchCollection matchNoCase = regNoCase.Matches(test); MatchCollection matchCase = regCase.Matches(test); //Combine matchNoCase and matchCase into an IEnumerable IEnumerable<Match> combined= null; foreach (Match match in combined) { //Use the Index and (successful) Groups properties //of the match in another operation } In practice, the MatchCollections might contain thousands of results and be run frequently using long dynamically created REGEXes, so I'd like to shy away from copying the results to arrays, etc. I am still learning LINQ and am fuzzy on how to go about combining these or what the performance hits to an already sluggish process will be.

Read the article

php regex expression to get title

- by 55skidoo

I'm trying to strip content titles out of the middle of text strings. Could I use regex to strip everything out of this string except for the title (in italics) in these strings? Or is there a better way? Joe User wrote a blog post called The 10 Best Regex Expressions in the category Regex. Jane User wrote a blog post called Regex is Hard! in the category TechProblems. I've tried to come up with a regex expression to cover this, but I think it might need two. The trick is that the text in bold is always the same, so you could search for that, like this: regex: delete everything before and including wrote a blog post called regex: delete in the category and everything after it.

Read the article

Scala regex Named Capturing Groups

- by Brent

In scala.util.matching.Regex trait MatchData I see that there support for groupnames (Named Capturing Groups) But since Java does not support groupnames until version 7 as I understand it, Scala version 2.8.0.RC4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6. gives me this exception: scala> val pattern = """(?<login>\w+) (?<id>\d+)""".r java.util.regex.PatternSyntaxException: Look-behind group does not have an obvio us maximum length near index 11 (?<login>\w+) (?<id>\d+) ^ at java.util.regex.Pattern.error(Pattern.java:1713) at java.util.regex.Pattern.group0(Pattern.java:2488) at java.util.regex.Pattern.sequence(Pattern.java:1806) at java.util.regex.Pattern.expr(Pattern.java:1752) at java.util.regex.Pattern.compile(Pattern.java:1460) So the question is Named Capturing Groups supported in Scala? If so any examples out there? If not I might look into the Named-Regexp lib from clement.denis.

Read the article

Is regex too slow? Real life examples where simple non-regex alternative is better

- by polygenelubricants

I've seen people here made comments like "regex is too slow!", or "why would you do something so simple using regex!" (and then present a 10+ lines alternative instead), etc. I haven't really used regex in industrial setting, so I'm curious if there are applications where regex is demonstratably just too slow, AND where a simple non-regex alternative exists that performs significantly (maybe even asymptotically!) better. Obviously many highly-specialized string manipulations with sophisticated string algorithms will outperform regex easily, but I'm talking about cases where a simple solution exists and significantly outperforms regex. What counts as simple is subjective, of course, but I think a reasonable standard is that if it uses only String, StringBuilder, etc, then it's probably simple.

Read the article

Can the csv format be defined by a regex?

- by Spencer Rathbun

A colleague and I have recently argued over whether a pure regex is capable of fully encapsulating the csv format, such that it is capable of parsing all files with any given escape char, quote char, and separator char. The regex need not be capable of changing these chars after creation, but it must not fail on any other edge case. I have argued that this is impossible for just a tokenizer. The only regex that might be able to do this is a very complex PCRE style that moves beyond just tokenizing. I am looking for something along the lines of: ... the csv format is a context free grammar and as such, it is impossible to parse with regex alone ... Or am I wrong? Is it possible to parse csv with just a POSIX regex? For example, if both the escape char and the quote char are ", then these two lines are valid csv: """this is a test.""","" "and he said,""What will be, will be."", to which I replied, ""Surely not!""","moving on to the next field here..."

Search Results

Search found 32731 results on 1310 pages for 'regex for html'.

Page 5/1310 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

- by takeshin

- by Xeos

- by DeltaFox

- by Xetius

- by TrueBlueAussie

- by Josh M

- by Crazy

- by Peter

- by orit cohen

- by VirtuosiMedia

- by Josh Crews

- by Sarfraz

- by user366124

- by user366124

- by alexus

- by Raj

- by corroded

- by ADTC

- by cartaysm

- by WeigeltRo

- by Laramie

- by 55skidoo

- by Brent

- by polygenelubricants

- by Spencer Rathbun

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >