Search Results

Search found 9016 results on 361 pages for 'regex libraries'.

  • Sanitize Content: removing markup from Amazon's content

    - by StackOverflowNewbie
    I'm using Amazon Web Services to get product descriptions of various items. The problem is that Amazon's content contains markup that sometimes breaks the layout of my web page (e.g. unclosed DIVs). I want to sanitize the content I get from Amazon. My initial plan is to: (1) remove unnecessary tags such as div and span while keeping tags like p, ul, and ol; (2) remove all attributes from all tags (some of the tags seem to carry style attributes); (3) remove excess whitespace (multiple spaces, carriage returns, newlines, tabs). Before I go off and build my own solution, I'm wondering if anyone has a better idea or knows of an existing solution. Thanks.
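
    The three steps listed in the question can be sketched with Python's standard html.parser module. This is a minimal sketch, not a vetted sanitizer: the whitelist, the decision to drop script/style content entirely, and the names Sanitizer and sanitize are all assumptions made for illustration.

```python
import re
from html import escape
from html.parser import HTMLParser

# Assumed whitelist; extend it to suit the page layout.
ALLOWED_TAGS = {"p", "ul", "ol", "li", "br", "b", "i", "em", "strong"}
VOID_TAGS = {"br"}                      # tags that never get a closing tag
DROP_WITH_CONTENT = {"script", "style"}  # assumed: discard these and their text


class Sanitizer(HTMLParser):
    """Keep whitelisted tags (without attributes) and the visible text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.open_tags = []   # stack of whitelisted tags still open
        self.skip_depth = 0   # > 0 while inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS:
            self.parts.append(f"<{tag}>")  # attributes are dropped here
            if tag not in VOID_TAGS:
                self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.open_tags:
            # Close everything opened after `tag`, then `tag` itself.
            while self.open_tags:
                opened = self.open_tags.pop()
                self.parts.append(f"</{opened}>")
                if opened == tag:
                    break

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(escape(data, quote=False))

    def result(self):
        # Close anything the source left open, then collapse whitespace.
        while self.open_tags:
            self.parts.append(f"</{self.open_tags.pop()}>")
        return re.sub(r"\s+", " ", "".join(self.parts)).strip()


def sanitize(html_text):
    parser = Sanitizer()
    parser.feed(html_text)
    parser.close()
    return parser.result()
```

    Because the parser tracks which whitelisted tags are open, an unclosed tag in the input is closed in the output rather than leaking into the surrounding page.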

  • Perl: Edit hyperlinks in nested tags that aren't on separate lines

    - by user305801
    I have an interesting problem. I wrote the following Perl script to recursively loop through a directory and, for the img/script/a tags in all HTML files, convert the entire URL to lowercase and replace spaces and %20 with underscores. The script works great except when an image tag is wrapped in an anchor tag. Is there a way to modify the current script so that it can also manipulate the links in nested tags that are not on separate lines? Basically, if I have <a href="..."><img src="..."></a>, the script will only change the link in the anchor tag and skip the img tag.
        #!/usr/bin/perl
        use File::Find;
        $input="/var/www/tecnew/";
        sub process {
            if (-T and m/.+\.(htm|html)/i) {
                #print "htm/html: $_\n";
                open(FILE,"+<$_") or die "couldn't open file $!\n";
                $out = '';
                while(<FILE>) {
                    $cur_line = $_;
                    if($cur_line =~ m/<a.*>/i) {
                        print "cur_line (unaltered) $cur_line\n";
                        $cur_line =~ /(^.* href=\")(.+?)(\".*$)/i;
                        $beg = $1;
                        $link = html_clean($2);
                        $end = $3;
                        $cur_line = $beg.$link.$end;
                        print "cur_line (altered) $cur_line\n";
                    }
                    if($cur_line =~ m/(<img.*>|<script.*>)/i) {
                        print "cur_line (unaltered) $cur_line\n";
                        $cur_line =~ /(^.* src=\")(.+?)(\".*$)/i;
                        $beg = $1;
                        $link = html_clean($2);
                        $end = $3;
                        $cur_line = $beg.$link.$end;
                        print "cur_line (altered) $cur_line\n";
                    }
                    $out .= $cur_line;
                }
                seek(FILE, 0, 0) or die "can't seek to start of file: $!";
                print FILE $out or die "can't print to file: $!";
                truncate(FILE, tell(FILE)) or die "can't truncate file: $!";
                close(FILE) or die "can't close file: $!";
            }
        }
        find(\&process, $input);
        sub html_clean {
            my($input_string) = @_;
            $input_string = lc($input_string);
            $input_string =~ s/%20|\s/_/g;
            return $input_string;
        }
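
    One way around the nesting problem is to stop matching whole lines and instead substitute every href/src attribute wherever it appears in the text. The sketch below shows that idea in Python rather than Perl; ATTR_RE and clean_links are hypothetical names, it assumes double-quoted attribute values, and it rewrites only the first href/src in any single tag.

```python
import re

# href="..." or src="..." inside an <a>, <img>, or <script> tag.
# [^>]*? keeps the match inside one tag, so nested tags on the same
# line are each found by a separate match.
ATTR_RE = re.compile(
    r'(<(?:a|img|script)\b[^>]*?\b(?:href|src)=")([^"]*)(")',
    re.IGNORECASE,
)


def html_clean(url):
    """Lowercase the URL and replace spaces and %20 with underscores."""
    return re.sub(r"%20|\s", "_", url.lower())


def clean_links(html_text):
    """Rewrite every matching href/src value in the given HTML text."""
    return ATTR_RE.sub(
        lambda m: m.group(1) + html_clean(m.group(2)) + m.group(3),
        html_text,
    )
```

    Applied to the whole file contents at once, this treats <a href="..."><img src="..."></a> as two independent matches, so both links are rewritten.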

  • Regular Expression to break row with comma separated values into distinct rows

    - by Nick
    I have a file with many rows. Each row has a column which may contain comma-separated values. I need each row to be distinct (i.e. no comma-separated values). Here is an example row:
        AB AB10,AB11,AB12,AB15,AB16,AB21,AB22,AB23,AB24,AB25,AB99 ABERDEEN Aberdeenshire
    The columns are comma separated (Postcode area, Postcode districts, Post town, Former postal county). So the above row would get turned into:
        AB AB10 ABERDEEN Aberdeenshire
        AB AB11 ABERDEEN Aberdeenshire
        AB AB12 ABERDEEN Aberdeenshire
        ... ...
    I tried the following, but it didn't work:
        (.+)\t(([0-9A-Z]+),)+\t(.+)\t(.+)
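
    A single regular expression cannot emit a variable number of output rows, so a short script may be simpler. The sketch below assumes the four columns are tab-separated (as the \t in the attempted pattern suggests) and that only the second column holds a comma-separated list; expand_rows is a hypothetical name.

```python
def expand_rows(lines):
    """Yield one output row per postcode district in each input row."""
    for line in lines:
        # Assumed layout: area <TAB> districts <TAB> town <TAB> county
        area, districts, town, county = line.rstrip("\n").split("\t")
        for district in districts.split(","):
            yield "\t".join((area, district, town, county))
```

    Passing an open file object as lines processes the file row by row without loading it all into memory.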

  • PHP regular expression

    - by Ferol
    Given this text:
        $text = ' href="http://yahoo.com" target="_blank"> link text </a> text... text... <br> text...';
        // $text = ' text... <a href="http://yahoo.com" target="_blank"> link text </a> text... text... <br> text...';
    and this regular expression:
        preg_match_all('/^(.*)(<.+>)(.*)(<\/.+>)(.*)$/',$text,$matches);
    What I want is to check whether the text matches the regular expression. If it does, $matches should contain the parts of the string above; if not, (as I guess) it should contain four zero-length arrays. Something is wrong, but I can't work out what.

  • How can I use Perl's s/// in an expression?

    - by mikeY
    I got a headache looking for this: how do you use s/// in an expression, as opposed to an assignment? To clarify what I mean, I'm looking for a Perl equivalent of Python's re.sub(...) when used in the following context:
        newstring = re.sub('ab', 'cd', oldstring)
    The only way I know how to do this in Perl so far is:
        $oldstring =~ s/ab/cd/;
        $newstring = $oldstring;
    Note the extra assignment.

  • Javascript substrings multiline replace by RegExp

    - by Radek Šimko
    Hi, I'm having some trouble matching a regular expression in a multi-line string.
        <script>
        var str="Welcome to Google!\n";
        str = str + "We are proud to announce that Microsoft has \n";
        str = str + "one of the worst Web Developers sites in the world.";
        document.write(str.replace(/.*(microsoft).*/gmi, "$1"));
        </script>
    http://jsbin.com/osoli3/3/edit
    As you can see at the link above, the output of the code looks like this:
        Welcome to Google! Microsoft one of the worst Web Developers sites in the world.
    This means that the replace() method goes line by line, and if there's no match in a line, it just returns the whole line, even though the pattern has the "m" (multiline) modifier.
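
    The behavior described above follows from what the flags mean: "m" only changes where ^ and $ match, while "." still stops at a newline, so each line is matched (or left alone) separately. The sketch below reproduces the case in Python, where the DOTALL flag lets "." cross line breaks; in JavaScript a character class such as [\s\S] is a common substitute for a newline-matching dot. The function name keep_only_match is hypothetical.

```python
import re

TEXT = (
    "Welcome to Google!\n"
    "We are proud to announce that Microsoft has \n"
    "one of the worst Web Developers sites in the world."
)


def keep_only_match(text):
    """Replace the whole multi-line string with the captured word."""
    # re.DOTALL makes '.' match newlines too, so '.*' spans every line.
    return re.sub(r".*(microsoft).*", r"\1", text,
                  flags=re.IGNORECASE | re.DOTALL)


def per_line(text):
    """Without DOTALL, only the line containing the word is replaced."""
    return re.sub(r".*(microsoft).*", r"\1", text, flags=re.IGNORECASE)
```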

  • Regular Expression for finding phone numbers

    - by Rocky
    Hello everyone, I am new to Stack Overflow and I have a quick question. Let's assume we are given a large number of HTML files (large as in theoretically infinite). How can I use regular expressions to extract the list of phone numbers from all those files? An explanation of the expression would be really appreciated. The phone numbers can be in any of the following formats:
        (123) 456 7899
        (123).456.7899
        (123)-456-7899
        123-456-7899
        123 456 7899
        1234567899
    Thanks a lot for all your help and have a good one!
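
    A pattern covering exactly the six formats listed above can be sketched as follows. It assumes ten-digit numbers only, with an optional space, dot, or hyphen between groups; PHONE_RE and find_phone_numbers are hypothetical names.

```python
import re

PHONE_RE = re.compile(
    r"(?<!\d)"                # not in the middle of a longer digit run
    r"(?:\(\d{3}\)|\d{3})"    # area code, with or without parentheses
    r"[ .-]?"                 # optional separator
    r"\d{3}"
    r"[ .-]?"
    r"\d{4}"
    r"(?!\d)"                 # no digit may follow
)


def find_phone_numbers(text):
    """Return every phone-number-like substring in the text."""
    return PHONE_RE.findall(text)
```

    The lookarounds at both ends keep the pattern from picking ten digits out of a longer number such as an order ID.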

  • How can I replace a plus sign in JavaScript?

    - by William Calleja
    I need to replace a plus sign in a JavaScript string. There might be multiple occurrences of the plus sign, so this is what I have so far:
        myString = myString.replace(/+/g, "");
    This, however, breaks my JavaScript and causes glitches. How do you escape a '+' sign in a regular expression?
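
    The plus sign is a quantifier in regular expressions, so it has to be escaped with a backslash to be matched literally (in the JavaScript above, /\+/g). The same rule is shown below in Python as a minimal sketch; strip_plus is a hypothetical name.

```python
import re


def strip_plus(text):
    """Remove every literal '+' from the text."""
    # '\+' matches a literal plus; a bare '+' would be a quantifier
    # with nothing to repeat and is rejected by the regex compiler.
    return re.sub(r"\+", "", text)


# re.escape produces the escaped form for any literal text.
ESCAPED_PLUS = re.escape("+")
```

    When no pattern features are needed, a plain string replacement such as text.replace("+", "") avoids the escaping question altogether.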

  • Regular expression to remove all text except...

    - by Barryman9000
    There may be an easier way, and if there is, I'm all for it. My ASP.NET page has a TON of controls on it, and I've given them all IDs that start with an underscore. I copied all the markup into Notepad++ and I'm trying to find a regular expression that will match everything but the controls and replace it with whitespace. That way I'll have a text file with all my control names, which I'll probably throw into Excel and do some string manipulation on to add ".Text = " etc. Any suggestions?
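
    Rather than deleting everything that is not a control name, it is usually easier to extract the names directly. The sketch below assumes the IDs appear as ID="_name" attributes in the markup; ID_RE and control_ids are hypothetical names, and the approach is shown in Python rather than in Notepad++.

```python
import re

# An ID attribute whose value starts with an underscore.
# \b keeps attributes such as AssociatedControlID from matching.
ID_RE = re.compile(r'\bID="(_\w+)"', re.IGNORECASE)


def control_ids(markup):
    """Return the underscore-prefixed control IDs in document order."""
    return ID_RE.findall(markup)
```

    Joining the result with newlines gives a one-name-per-line text file that can be pasted into a spreadsheet.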

  • Parsing two-dimensional text

    - by alexbw
    I need to parse text files where relevant information is often spread across multiple lines in a nonlinear way. An example: 1234 1 IN THE SUPERIOR COURT OF THE STATE OF SOME STATE 2 IN AND FOR THE COUNTY OF SOME COUNTY 3 UNLIMITED JURISDICTION 4 --o0o-- 5 6 JOHN SMITH and JILL SMITH, ) ) 7 Plaintiffs, ) ) 8 vs. ) No. 12345 ) 9 ACME CO, et al., ) ) 10 Defendants. ) ___________________________________) I need to pull out Plaintiff and Defendant identities. These transcripts have a very wide variety of formattings, so I can't always count on those nice parentheses being there, or the plaintiff and defendant information being neatly boxed off, e.g.: 1 SUPREME COURT OF THE STATE OF SOME OTHER STATE COUNTY OF COUNTYVILLE 2 First Judicial District Important Litigation 3 --------------------------------------------------X THIS DOCUMENT APPLIES TO: 4 JOHN SMITH, 5 Plaintiff, Index No. 2000-123 6 DEPOSITION 7 - against - UNDER ORAL EXAMINATION 8 OF JOHN SMITH, 9 Volume I 10 ACME CO, et al, 11 Defendants. 12 --------------------------------------------------X The two constants are: "Plaintiff" will occur after the name of the plaintiff(s), but not necessarily on the same line. Plaintiffs and defendants' names will be in upper case. Any ideas?

  • Capturing the contents of <select>

    - by joey mueller
    I'm trying to use a regular expression to capture the contents of all option values inside an HTML select element. For example, in:
        <select name="test">
          <option value="blah">one</option>
          <option value="mehh">two</option>
          <option value="rawr">three</option>
        </select>
    I'd like to capture one, two, and three into an array. My current code is:
        var pages = responseDetails.responseText.match(/<select name="page" .+?>(?:\s*<option .+?>([^<]+)<\/option>)+\s*<\/select>/);
        for (var c = 0; c<pages.length; c++) {
            alert(pages[c]);
        }
    But it only captures the last value, in this case "three". How can I modify this to capture all of them? Thanks!
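
    Only "three" is captured because a capture group inside a repeated group keeps just its last iteration. Collecting the options in a second pass, or with a parser, avoids that. The sketch below uses Python's standard html.parser; OptionCollector and option_texts are hypothetical names.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collect the text of each <option> inside one named <select>."""

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.in_select = False
        self.in_option = False
        self.options = []

    def handle_starttag(self, tag, attrs):
        if tag == "select":
            self.in_select = dict(attrs).get("name") == self.select_name
        elif tag == "option" and self.in_select:
            self.in_option = True
            self.options.append("")

    def handle_endtag(self, tag):
        if tag == "select":
            self.in_select = False
            self.in_option = False
        elif tag == "option":
            self.in_option = False

    def handle_data(self, data):
        if self.in_option:
            self.options[-1] += data


def option_texts(html_text, select_name):
    collector = OptionCollector(select_name)
    collector.feed(html_text)
    collector.close()
    return [text.strip() for text in collector.options]
```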

  • Fastcgi 500 error on preg_match_all in PHP

    - by Bertvan
    Hi, I'm trying to set up some exotic PHP code (I'm no expert), and I get a FastCGI Error 500 on a PHP line containing 'preg_match_all'. When I comment out the line, the page is returned with a 200 (but not as it was meant to look). The code parses PHP, HTML and JavaScript content loaded from the database and composes it into the finished page. By placing some error_log entries around the code, I could determine that the line with the preg_match_all is the cause of the 500. However, the line is hit multiple times while the page loads, and on the other occasions it does not cause an error. Here is exactly what it looks like:
        preg_match_all ("/(<([\w]+)[^>]*>)((?:.|\n)*)(<\/\\2>)/", $part['data'], $tags, PREG_PATTERN_ORDER|PREG_OFFSET_CAPTURE);
    The subject string is a piece of text that looks like: <script> ... some javascript functions ... </script>
    [Edit:] This code is up and running correctly elsewhere, so this could very well be a PHP setting or environment difference. I'm using PHP 5.2.13 on IIS6 with FastCGI.
    [Edit:] Nothing is mentioned in the log files, at least not in the ones I checked: IIS logs, event logs, PHP log.
    Any thoughts or direction would be welcome.

  • Regular Expression to select Hyperlink

    - by Veejay
    I am using the following expression to select all hyperlinks:
        //a[@href]
    How can I write an expression to select all hyperlinks that match the format http://abc.com/articles/1? Here http://abc.com/articles/ is constant and the article number increases.
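
    XPath 1.0 has no regular-expression support; its starts-with() function, as in //a[starts-with(@href, 'http://abc.com/articles/')], is the closest built-in test. Where the numeric suffix must be checked as well, one option is to select the links and filter them in code. The sketch below does that in Python with the standard library; it assumes well-formed XHTML input, and ARTICLE_RE and article_links are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# The constant prefix followed by one or more digits, and nothing else.
ARTICLE_RE = re.compile(r"http://abc\.com/articles/\d+")


def article_links(xhtml):
    """Return the href of every <a> that points at a numbered article."""
    root = ET.fromstring(xhtml)
    return [
        a.get("href")
        for a in root.iter("a")
        if ARTICLE_RE.fullmatch(a.get("href", ""))
    ]
```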

  • Python: using a regular expression to match one line of HTML

    - by skylarking
    This simple Python method I put together just checks to see if Tomcat is running on one of our servers.
        import urllib2
        import re
        import sys
        def tomcat_check():
            tomcat_status = urllib2.urlopen('http://10.1.1.20:7880')
            results = tomcat_status.read()
            pattern = re.compile('<body>Tomcat is running...</body>',re.M|re.DOTALL)
            q = pattern.search(results)
            if q == []:
                notify_us()
            else:
                print ("Tomcat appears to be running")
            sys.exit()
    If the line <body>Tomcat is running...</body> is not found, it calls notify_us(), which uses SMTP to send an email message to myself and another admin saying that Tomcat is no longer running on the server. I have not used the re module in Python before, so I am assuming there is a better way to do this. I am also open to a more graceful solution with Beautiful Soup, but I haven't used that either. Just trying to keep this as simple as possible.
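
    Two details in the code above are worth noting: pattern.search() returns None (never an empty list) when nothing matches, so the q == [] test is never true, and the unescaped dots in the pattern match any character. For a fixed marker string, a plain substring test is enough. The sketch below is a Python 3 variant under those observations; the notify callback stands in for the question's notify_us(), and the function names and timeout are assumptions.

```python
import urllib.request

MARKER = "<body>Tomcat is running...</body>"


def page_reports_tomcat(body):
    """True if the status page contains the expected marker line."""
    return MARKER in body


def tomcat_check(url, notify, timeout=10):
    """Fetch the status page and call notify() if Tomcat looks down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except OSError:
        # Connection failures (URLError is an OSError) count as "down".
        body = ""
    if page_reports_tomcat(body):
        print("Tomcat appears to be running")
        return True
    notify()
    return False
```

    Treating a failed connection the same as a missing marker means the notification is also sent when the server is unreachable, which the original code would have reported as an uncaught exception.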

  • Regexp for handling recursive arguments

    - by Matt
    Hi all, I'm a regexp novice, so I'm wondering what the regexp would be for the following:
        function {function arg1, arg2}, arg3
    I'd like to be able to select just the top-level arguments: {function arg1, arg2} and arg3. Ideally the answer would use preg_match in PHP, but almost any regexp would work fine. Thanks! Matt
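
    Arbitrarily nested braces are awkward for a plain regular expression, so a small depth-counting scan is a common alternative. The sketch below, in Python, splits an argument string on commas that sit outside any braces; it assumes the leading function name has already been removed, and split_top_level is a hypothetical name.

```python
def split_top_level(args, sep=",", opener="{", closer="}"):
    """Split on separators that are not nested inside braces."""
    parts, current, depth = [], [], 0
    for ch in args:
        if ch == opener:
            depth += 1
        elif ch == closer:
            depth -= 1
        if ch == sep and depth == 0:
            # A top-level separator ends the current argument.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts
```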

  • bash grep finding java declarations

    - by Amarsh
    I have a huge .java file and I want to find all declared objects of a given className. I think the declaration will always have one of the following signatures:
        className objName;
        className objName =
        className objName=
    Can someone suggest a grep pattern that will find these signatures? I have the following (incomplete):
        cat $rootFile | grep "$className "
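
    The three signatures above share one shape: the class name, whitespace, an identifier, then either ";" or "=". The sketch below expresses that shape as a Python regular expression; find_declarations is a hypothetical name, and the pattern deliberately ignores generics, arrays, and declarations split across lines.

```python
import re


def find_declarations(source, class_name):
    """Return the variable names declared with the given class name."""
    pattern = re.compile(
        rf"\b{re.escape(class_name)}"  # the class name as a whole word
        r"\s+(\w+)"                     # the declared object name
        r"\s*(?:;|=(?!=))"              # ';' or a single '='
    )
    return pattern.findall(source)
```

    The \b before the class name keeps longer names that merely end in it (MyFoo for Foo) from matching.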

  • Issue with my regular expression?

    - by Rubans
    I'm trying to count the matches of directory-up references ("..\") in a relative path. I have the following pattern: "(..\)", which works as expected for the path "....\a\b", where it gives me 2 successful groups ("..\"), but when I try the path "..\a\b" it also returns 2 when it should return 1. I tried this in a regex tool such as Expresso and it seems to work as expected there, but not in .NET. Any ideas?
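
    In a regular expression an unescaped dot matches any character, so a pattern meant to find a literal "..\" needs both dots and the backslash escaped. The sketch below shows the escaped pattern in Python rather than .NET; the sample paths in the usage note are assumed for illustration, and count_parent_refs is a hypothetical name.

```python
import re

# Two literal dots followed by a literal backslash.
PARENT_REF = re.compile(r"\.\.\\")

# The unescaped form: any two characters followed by a backslash.
LOOSE = re.compile(r"..\\")


def count_parent_refs(path):
    """Count the '..\\' references in a Windows-style relative path."""
    return len(PARENT_REF.findall(path))
```

    With the escaped pattern, r"..\..\a\b" counts as 2 and r"..\a\b" as 1, whereas the loose pattern also matches ordinary directory names such as "ab\".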

  • Regular expression only for website

    - by Katie
    Hi, I'm new to regular expressions. I need to find only websites in some text, and I'm looking for a regular expression that can find strings like www.my.home and http://my.site.it. It should not find strings like [email protected], or a website that is already inside an HTML tag:
        <a href="http://www.my.site.com/"><span style="font-style: normal;">www.mambo-test.org</span></a>
    I tried this one:
        \b((https?://[^ ])|(www.[^ ]))
    but it also finds the website in the href and between the tags in the example above, and I don't know how to exclude this case.
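
    One way to exclude links that are already marked up is to work in two passes: first blank out complete <a>...</a> elements, then search what remains. The sketch below shows that idea in Python; the patterns are assumptions tuned to the examples in the question, not a general URL grammar, and find_bare_urls is a hypothetical name.

```python
import re

# A whole anchor element, including its href and inner text.
ANCHOR_RE = re.compile(r"<a\b.*?</a>", re.IGNORECASE | re.DOTALL)

URL_RE = re.compile(
    r"(?<![\w@.])"              # not part of an email address or a longer word
    r"(?:https?://|www\.)"      # must start with a scheme or 'www.'
    r"[^\s<>\"']*"              # body of the URL
    r"[^\s<>\"'.,;:!?)]"        # last character is not trailing punctuation
)


def find_bare_urls(text):
    """Return URLs that are not already inside an <a> element."""
    without_anchors = ANCHOR_RE.sub(" ", text)
    return URL_RE.findall(without_anchors)
```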

