Search Results

Search found 5919 results on 237 pages for 'regex matching'.

Sanitize Content: removing markup from Amazon's content

- by StackOverflowNewbie

I'm using Amazon Web Services to get product descriptions of various items. The problem is that Amazon's content contains markup that is sometimes destructive to the layout of my web page (e.g. unclosed DIVs). I want to sanitize the content I get from Amazon. My solution would be to do the following (my initial list so far):

1. Remove unnecessary tags such as div and span, while keeping tags like p, ul, and ol.
2. Remove all attributes from all the tags (e.g. there seem to be style attributes in some of the tags).
3. Remove excess white space (e.g. multiple spaces, carriage returns, new lines, tabs).
4. Etc.

Before I go off trying to build my solution, I'm wondering if anyone has a better idea (or an already existing solution). Thanks.

Regular Expression for finding phone numbers

- by Rocky

Hello Everyone, I am new to Stackoverflow and I have a quick question. Let's assume we are given a large number of HTML files (large as in theoretically infinite). How can I use Regular Expressions to extract the list of Phone Numbers from all those files? Explanation/expression will be really appreciated. The Phone numbers can be any of the following formats:

    (123) 456 7899
    (123).456.7899
    (123)-456-7899
    123-456-7899
    123 456 7899
    1234567899

Thanks a lot for all your help and have a good one!
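
A minimal Python sketch of one possible pattern for the formats listed above. The names `PHONE_RE` and `extract_phone_numbers` are hypothetical, and the pattern assumes only these six layouts occur: an area code with or without parentheses, then at most one separator (space, dot, or hyphen) between digit groups.

```python
import re

# Assumed pattern covering only the six formats listed in the question.
# The lookarounds keep the pattern from matching inside a longer digit run.
PHONE_RE = re.compile(r"(?<!\d)(?:\(\d{3}\)|\d{3})[ .-]?\d{3}[ .-]?\d{4}(?!\d)")


def extract_phone_numbers(text):
    """Return every substring of text that looks like one of the listed formats."""
    return PHONE_RE.findall(text)


if __name__ == "__main__":
    sample = "Call (123) 456 7899 or 123-456-7899, fax 1234567899."
    print(extract_phone_numbers(sample))
```

For many files, the same function could be applied to the text of each file in turn; real-world phone formats vary far more widely than this sketch allows for.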

PHP regular expression

- by Ferol

I have this text:

    $text = ' href="http://yahoo.com" target="_blank"> link text </a> text... text... <br> text...';
    // $text = ' text... <a href="http://yahoo.com" target="_blank"> link text </a> text... text... <br> text...';

and this regular expression:

    preg_match_all('/^(.*)(<.+>)(.*)(<\/.+>)(.*)$/',$text,$matches);

What I want is to check whether the text matches the regular expression. If it does, $matches should contain the parts of the string above; if not, it should (I guess) contain four zero-length arrays. Something is wrong, but I can't find what it actually is.

Parsing two-dimensional text

- by alexbw

I need to parse text files where relevant information is often spread across multiple lines in a nonlinear way. An example: 1234 1 IN THE SUPERIOR COURT OF THE STATE OF SOME STATE 2 IN AND FOR THE COUNTY OF SOME COUNTY 3 UNLIMITED JURISDICTION 4 --o0o-- 5 6 JOHN SMITH and JILL SMITH, ) ) 7 Plaintiffs, ) ) 8 vs. ) No. 12345 ) 9 ACME CO, et al., ) ) 10 Defendants. ) ___________________________________) I need to pull out Plaintiff and Defendant identities. These transcripts have a very wide variety of formattings, so I can't always count on those nice parentheses being there, or the plaintiff and defendant information being neatly boxed off, e.g.: 1 SUPREME COURT OF THE STATE OF SOME OTHER STATE COUNTY OF COUNTYVILLE 2 First Judicial District Important Litigation 3 --------------------------------------------------X THIS DOCUMENT APPLIES TO: 4 JOHN SMITH, 5 Plaintiff, Index No. 2000-123 6 DEPOSITION 7 - against - UNDER ORAL EXAMINATION 8 OF JOHN SMITH, 9 Volume I 10 ACME CO, et al, 11 Defendants. 12 --------------------------------------------------X The two constants are: "Plaintiff" will occur after the name of the plaintiff(s), but not necessarily on the same line. Plaintiffs and defendants' names will be in upper case. Any ideas?

Convert all first letter to upper case, rest lower for each word

- by mrblah

I have a string of text (about 5-6 words mostly) that I need to convert. Currently the text looks like: THIS IS MY TEXT RIGHT NOW. I want to convert it to: This Is My Text Right Now. I can loop through my collection of strings, but I'm not sure how to go about performing this text modification.
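
The question does not name a language, so the following is only a Python sketch of the transformation described above. The function name `title_case_words` is hypothetical, and it assumes words are separated by whitespace.

```python
def title_case_words(text):
    """Upper-case the first letter of each word and lower-case the rest."""
    # split() with no arguments splits on runs of whitespace;
    # capitalize() upper-cases the first character and lower-cases the others.
    return " ".join(word.capitalize() for word in text.split())


if __name__ == "__main__":
    print(title_case_words("THIS IS MY TEXT RIGHT NOW"))
```

Note that this collapses repeated whitespace into single spaces, which may or may not be wanted.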

Fastcgi 500 error on preg_match_all in PHP

- by Bertvan

Hi, I'm trying to set up some exotic PHP code (I'm no expert), and I get a FastCGI Error 500 on a PHP line containing 'preg_match_all'. When I comment out the line, the page is returned with a 200 (but not how it was meant to be). The code parses PHP, HTML and JavaScript content loaded from the database and composes it to return the finished page. By placing some error_log entries around the code, I could determine that the line with the preg_match_all is the cause of the 500. However, the line is hit multiple times during the loading of the page, and on other occasions it does not cause an error. Here's exactly what it looks like:

    preg_match_all ("/(<([\w]+)[^>]*>)((?:.|\n)*)(<\/\\2>)/", $part['data'], $tags, PREG_PATTERN_ORDER|PREG_OFFSET_CAPTURE);

The subject string is a piece of text that looks like: <script> ... some javascript functions ... </script>

[Edit:] This is code that is up and running correctly elsewhere, so this very well could be a PHP setting or environment difference. I'm using PHP 5.2.13 on IIS6 with FastCGI.

[Edit:] Nothing is mentioned in the log files. At least not in the ones I checked: IIS logs, Event logs, PHP log. Any thoughts or direction would be welcome.

How can I replace a plus sign in JavaScript?

- by William Calleja

I need to replace a plus sign in a JavaScript string. There might be multiple occurrences of the plus sign, so this is what I have done so far: myString= myString.replace(/+/g, ""); This is, however, breaking my JavaScript and causing glitches. How do you escape a '+' sign in a regular expression?
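
The question is about JavaScript; the sketch below shows the same idea in Python's `re` module, where a bare `+` is also a quantifier with nothing to repeat. The helper names are hypothetical. Escaping with a backslash or wrapping the character in a class are the two usual ways to match it literally.

```python
import re


def bare_plus_compiles():
    """Return True if the unescaped pattern '+' compiles (it should not)."""
    try:
        re.compile("+")
        return True
    except re.error:
        return False


def strip_plus(text):
    """Remove every literal plus sign using an escaped pattern."""
    return re.sub(r"\+", "", text)


def strip_plus_charclass(text):
    """Same result, using a character class instead of a backslash."""
    return re.sub(r"[+]", "", text)


if __name__ == "__main__":
    print(bare_plus_compiles())
    print(strip_plus("1+2+3"))
```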

Python: using a regular expression to match one line of HTML

- by skylarking

This simple Python method I put together just checks to see if Tomcat is running on one of our servers. import urllib2 import re import sys def tomcat_check(): tomcat_status = urllib2.urlopen('http://10.1.1.20:7880') results = tomcat_status.read() pattern = re.compile('<body>Tomcat is running...</body>',re.M|re.DOTALL) q = pattern.search(results) if q == []: notify_us() else: print ("Tomcat appears to be running") sys.exit() If this line is not found: <body>Tomcat is running...</body> it calls notify_us(), which uses SMTP to send an email message to myself and another admin that Tomcat is no longer running on the server. I have not used the re module in Python before, so I am assuming there is a better way to do this. I am also open to a more graceful solution with Beautiful Soup, but I haven't used that either. Just trying to keep this as simple as possible.
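
One detail worth illustrating: `pattern.search()` returns a match object or `None`, never a list, so a comparison with `[]` is always false. The sketch below isolates just the matching step (no network call); `STATUS_RE` and `tomcat_is_running` are hypothetical names, and the dots are escaped on the assumption that the page contains three literal dots.

```python
import re

# Assumed status line, with the dots escaped so they match literally.
STATUS_RE = re.compile(r"<body>Tomcat is running\.\.\.</body>")


def tomcat_is_running(page_text):
    """Return True when the expected status line appears in the page text."""
    # search() yields a match object or None, so compare against None.
    return STATUS_RE.search(page_text) is not None


if __name__ == "__main__":
    print(tomcat_is_running("<html><body>Tomcat is running...</body></html>"))
    print(tomcat_is_running("<html><body>Service unavailable</body></html>"))
```

For a fixed string like this one, a plain substring test (`"..." in page_text`) would work as well and avoids regular expressions entirely.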

Regular expression to remove all text except...

- by Barryman9000

There may be an easier way, and if there is I'm all for it. However, my ASP.NET page has a TON of controls on it, and I've given them all IDs that start with an underscore. I copied all the markup into Notepad++ and I'm trying to find a regular expression that will find everything but the controls and replace it with whitespace. That way I'll have a text file that has all my control names, which I'll probably throw into Excel and do some string manipulation on to add ".Text = " etc. Any suggestions?

perl parentheses in regular expression

- by iamrohitbanga

I have been trying several regular expressions:

    $str =~ s/^0+(.)/$1/;

converts 0000 to 0 and 0001 to 1.

    $str =~ s/^0+./$1/;

converts 0000 to the empty string, 000100 to 00, and 0001100 to 100. What difference do the parentheses make?
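
A Python sketch that reproduces the two behaviours described above. In both patterns the character after the leading zeros is consumed by the match; only the capturing group makes it available to the replacement. Python's `re.sub` rejects a `\1` reference when the pattern has no group, so the second function uses an empty replacement to mimic Perl's empty `$1`; that substitution is an assumption made for illustration.

```python
import re


def strip_with_group(text):
    """Leading zeros plus one captured character, replaced by that character."""
    return re.sub(r"^0+(.)", r"\1", text)


def strip_without_group(text):
    """Same match, but nothing is captured, so the character is simply dropped."""
    return re.sub(r"^0+.", "", text)


if __name__ == "__main__":
    for value in ("0000", "0001", "000100", "0001100"):
        print(value, strip_with_group(value), repr(strip_without_group(value)))
```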

Is it worth using Python's re.compile?

- by Mat

Is there any benefit in using compile for regular expressions in Python? h = re.compile('hello') h.match('hello world') vs re.match('hello', 'hello world')
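
For reference, a small Python sketch showing that the two forms in the question give the same match. The module-level functions compile the pattern and keep recently used patterns in an internal cache, so the practical difference is mostly about reuse and readability when one pattern is applied many times.

```python
import re

# Compile once and reuse the pattern object.
h = re.compile("hello")
compiled_match = h.match("hello world")

# Module-level call: the pattern is compiled (and cached) behind the scenes.
direct_match = re.match("hello", "hello world")

if __name__ == "__main__":
    print(compiled_match.group(), direct_match.group())
```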

Capturing the contents of <select>

- by joey mueller

I'm trying to use a regular expression to capture the contents of all option values inside an HTML select element. For example, in: <select name="test"> <option value="blah">one</option> <option value="mehh">two</option> <option value="rawr">three</option> </select> I'd like to capture one, two and three into an array. My current code is:

    var pages = responseDetails.responseText.match(/<select name="page" .+?>(?:\s*<option .+?>([^<]+)<\/option>)+\s*<\/select>/);
    for (var c = 0; c<pages.length; c++) { alert(pages[c]); }

But it only captures the last value, in this case, "three". How can I modify this to capture all of them? Thanks!
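
The question's code is JavaScript; this Python sketch illustrates the same behaviour. A capturing group inside a repeated group keeps only its last iteration, whereas matching each option separately returns all of them. `OPTION_RE`, `option_labels`, and `last_repeated_capture` are hypothetical names, and the sketch assumes simple markup with no nested tags inside an option.

```python
import re

OPTION_RE = re.compile(r"<option\b[^>]*>([^<]+)</option>")


def option_labels(html):
    """Return the text of every option by matching them one at a time."""
    return OPTION_RE.findall(html)


def last_repeated_capture(html):
    """Show that a group inside a repeated group only keeps its final value."""
    match = re.search(r"(?:\s*<option\b[^>]*>([^<]+)</option>)+", html)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = ('<select name="test"> <option value="blah">one</option> '
              '<option value="mehh">two</option> '
              '<option value="rawr">three</option> </select>')
    print(option_labels(sample))
    print(last_repeated_capture(sample))
```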

Regular Expression to select Hyperlink

- by Veejay

I am using the following expression to select all hyperlinks: //a[@href] How can I write an expression to select all hyperlinks which match this format: http://abc.com/articles/1 Here http://abc.com/articles/ is constant and the article number increases.

What regular expression would strip out all attributes from a BR tag?

- by Edward Tanguay

What C# regular expression would replace all of these: <BR style=color:#93c47d> <BR style=color:#fefefe> <BR style="color:#93c47d"> <BR style="color:#93c47d ..."> <BR> <BR/> <br style=color:#93c47d> <br style=color:#fefefe> <br style="color:#93c47d"> <br style="color:#93c47d ..."> <br> <br/> with: <br/> basically "remove all attributes from any BR element and lowercase it".
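
The question asks for C#; the following is only a Python sketch of the same pattern idea, with hypothetical names `BR_RE` and `normalize_br`. It assumes no attribute value contains a literal `>`.

```python
import re

# "<br", a word boundary (so "<break>" is not matched), then anything up to ">".
BR_RE = re.compile(r"<br\b[^>]*>", re.IGNORECASE)


def normalize_br(html):
    """Replace any BR tag, with or without attributes, by a bare <br/>."""
    return BR_RE.sub("<br/>", html)


if __name__ == "__main__":
    print(normalize_br('a<BR style="color:#93c47d ...">b<br>c<BR/>d'))
```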

Regexp for handling recursive arguments

- by Matt

Hi all, I'm a regexp novice, so I'm wondering what the regexp would be for the following: function {function arg1, arg2}, arg3 I'm looking to be able to select just the top-level arguments: {function arg1, arg2} & arg3 Ideally the response would use preg_match in PHP, but almost any regexp would work fine. Thanks! Matt
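
Arbitrarily nested braces are awkward for a plain regular expression, so this Python sketch swaps in a different technique: a small scanner that tracks brace depth and splits on commas only at depth zero. The function name `split_top_level_args` is hypothetical, and the input is assumed to be the argument list with the leading function name already removed.

```python
def split_top_level_args(arg_text):
    """Split on commas that are not enclosed in curly braces."""
    args = []
    current = []
    depth = 0
    for ch in arg_text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == "," and depth == 0:
            # A top-level comma ends the current argument.
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        args.append(tail)
    return args


if __name__ == "__main__":
    print(split_top_level_args("{function arg1, arg2}, arg3"))
```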

any excellent Python 're' tutorial

- by Tshepang

I read through the official regular expression howto which wasn't gentle enough for me. Is there anything better/easier out there?

Regular expression not working after debugging

- by Jaison

I have an ASP.NET website with a text box that uses a regular expression validator. I have changed the expression in the validator's "validation expression" property, and after compiling (rebuild) and running, the validation changes are not reflected. The previous validation works fine, but the changed validation does not. Please help me! Edit: First code: ([a-zA-Z0-9_-.]+)\@((base.co.uk)|(base.com)|(group.com)) Second code: @"([a-zA-Z0-9_\-.]+)@((base\.co\.uk)|(base\.com)|(group\.com)|(arg\.co\.uk)|(arggroup\.com))"

bash grep finding java declarations

- by Amarsh

I have a huge .java file and I want to find all declared objects given the className. I think the declaration will always have one of the following signatures: className objName; or className objName = or className objName= Can someone suggest a grep pattern which will find these signatures? I have the following (incomplete): cat $rootFile | grep "$className "
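
The question asks for a grep pattern; the sketch below expresses one candidate pattern with Python's `re` module instead. The names `declaration_pattern` and `declared_names` are hypothetical, and the pattern assumes only the three signatures listed above (no generics, arrays, or multiple declarators on one line).

```python
import re


def declaration_pattern(class_name):
    """Build a pattern for 'ClassName objName;' or 'ClassName objName ='."""
    # \b stops the class name from matching inside a longer identifier.
    return re.compile(r"\b" + re.escape(class_name) + r"\s+(\w+)\s*(?:;|=)")


def declared_names(source, class_name):
    """Return the object names declared with the given class name."""
    return declaration_pattern(class_name).findall(source)


if __name__ == "__main__":
    java_source = "Foo a;\nFoo b = new Foo();\nFoo c= null;\nBarFoo d;\n"
    print(declared_names(java_source, "Foo"))
```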

problem in defining regular expression.

- by Akhilesh thakur

How do I define a regular expression for a mathematical expression? Please define a common regular expression for: 5+4, 5-3, 5*6, 6/2
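
A minimal Python sketch for the four examples above, assuming each expression is exactly two non-negative integers joined by one of `+`, `-`, `*`, `/`. The names `BINARY_EXPR_RE` and `parse_binary_expr` are hypothetical; longer or nested expressions would need a real parser rather than a single pattern.

```python
import re

# The hyphen comes first inside the class so it is taken literally.
BINARY_EXPR_RE = re.compile(r"^\s*(\d+)\s*([-+*/])\s*(\d+)\s*$")


def parse_binary_expr(text):
    """Return (left, operator, right) or None when the text does not match."""
    match = BINARY_EXPR_RE.match(text)
    if match is None:
        return None
    left, operator, right = match.groups()
    return int(left), operator, int(right)


if __name__ == "__main__":
    for expr in ("5+4", "5-3", "5*6", "6/2"):
        print(expr, parse_binary_expr(expr))
```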

How to Redirect Subdomains to Other Domain

- by Codex73

What I'm trying to accomplish with htaccess mod-rewrite: redirect all sub-domains to a new domain name with a rewrite rule, e.g.

    test1.olddomain.com === test1.newdomain.com
    test2.olddomain.com === test2.newdomain.com
    test3.olddomain.com === test3.newdomain.com

This is what I have so far, which of course is wrong:

    Options +FollowSymLinks
    RewriteEngine on
    RewriteCond %{HTTP_HOST} ^olddomain\.com$ [NC]
    RewriteRule ^(.*)$ http://www.newdomain.com/$1 [R=301,L]
    RewriteCond %{HTTP_HOST} ^www\.olddomain\.com$ [NC]
    RewriteRule ^(.*) http://www.newdomain.com/$1 [R=301,L]
    RewriteRule [a-zA-Z]+\.olddomain.com$ http://$1.newdomain.com/ [R=301,L]

Since I'm not a Regular Expression junkie just yet, I need your help... Thanks for any help you can give here. I also know we can combine these first two conditions into one. Note: The reason I don't redirect all domains using DNS is that a lot of directories need special rewrite rules in order to maintain their SEO positions.

How do I locate a particular word in a text file using C#

- by cmrhema

Hi, I am sending mails (in ASP.NET, C#) using a template stored in a text file (.txt) like the one below: User Name :<User Name> Address : <Address>. I used to replace the words within the angle brackets in the text file using the code below:

    StreamReader sr;
    sr = File.OpenText(HttpContext.Current.Server.MapPath(txt));
    copy = sr.ReadToEnd();
    sr.Close(); //close the reader
    copy = copy.Replace(word.ToUpper(),"#" + word.ToUpper()); //remove the word specified UC
    //save new copy into existing text file
    FileInfo newText = new FileInfo(HttpContext.Current.Server.MapPath(txt));
    StreamWriter newCopy = newText.CreateText();
    newCopy.WriteLine(copy);
    newCopy.Write(newCopy.NewLine);
    newCopy.Close();

Now I have a new problem: the user will be adding new words within angle brackets, for example <Salary>. In that case I have to read the file and find the word <Salary>. In other words, I have to find all the words that are located within angle brackets (< >). How do I do that? Kindly let me know. Thanks.
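
The question is about C#; the sketch below shows the pattern idea in Python only. `PLACEHOLDER_RE` and `find_placeholders` are hypothetical names, and the pattern assumes a placeholder never contains another angle bracket.

```python
import re

# One "<", then one or more characters that are not angle brackets, then ">".
PLACEHOLDER_RE = re.compile(r"<([^<>]+)>")


def find_placeholders(template):
    """Return the text found between each pair of angle brackets."""
    return PLACEHOLDER_RE.findall(template)


if __name__ == "__main__":
    template = "User Name :<User Name> Address : <Address>. Salary: <Salary>"
    print(find_placeholders(template))
```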

Regular Expression: back references

- by sixtyfootersdude

sed 's/^\(\h*\)\(.*\)$/\1/' web.xml I think that this should take this XML: <a> <d> bla </d> </a> And turn it into:      But what it is doing is this:

Issue with my regular expression?

- by Rubans

I'm trying to count the matches of directory-up references ("..\") in a relative path. So I have the following pattern: "(..\)", which works as expected for the path "....\a\b", where it gives me 2 successful groups ("..\"), but when I try the path "..\a\b" it also returns 2 when it should be 1. I tried this in a regex tool such as Expresso and it seems to work as expected there, but not in .NET. Any ideas?
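
The question concerns .NET; this Python sketch only illustrates counting matches of a fully escaped pattern, on the assumption that the intended pattern is two literal dots followed by a backslash and that the first example path was meant to be `..\..\a\b`. The names `PARENT_REF_RE` and `count_parent_refs` are hypothetical.

```python
import re

# Escaped dots and an escaped backslash: matches the literal text "..\".
PARENT_REF_RE = re.compile(r"\.\.\\")


def count_parent_refs(path):
    """Count how many times the literal sequence '..\\' occurs in the path."""
    return len(PARENT_REF_RE.findall(path))


if __name__ == "__main__":
    print(count_parent_refs(r"..\..\a\b"))
    print(count_parent_refs(r"..\a\b"))
```

Left unescaped, each dot would match any character, so a path segment such as `ab\` would also be counted.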

Automatically hyper-link URLs and emails using C#, whilst leaving bespoke tags in place

- by marcusstarnes

I have a site that enables users to post messages to a forum. At present, if a user types a web address or email address and posts it, it's treated the same as any other piece of text. There are tools that enable the user to supply hyper-linked web and email addresses (via some bespoke tags/markup); these are sometimes used, but not always. In addition, a bespoke 'Image' tag can also be used to reference images that are hosted on the web. My objective is to cater both for those who use these existing tools to generate hyper-linked addresses and for those who simply type a web or email address in, and to then automatically convert this to a hyper-linked address for them (as soon as they submit their post). I've found one or two regular expressions that convert a plain string web or email address. However, I obviously don't want to perform any manipulation on addresses that are already being handled via the site's bespoke tagging, and that's where I'm stuck: how to EXCLUDE any web or email addresses that are already catered for via the bespoke tagging. I want to leave them as is. Here are some examples of bespoke tagging for the variations that I need to be left alone: [URL=www.msn.com]www.msn.com[/URL] [URL=http://www.msn.com]http://www.msn.com[/URL] [[email protected]][email protected][/EMAIL] [IMG]www.msn.com/images/test.jpg[/IMG] [IMG]http://www.msn.com/images/test.jpg[/IMG] The following examples would however ideally need to be automatically converted into web & email links respectively: www.msn.com http://www.msn.com [email protected] Ideally, the 'converted' links would just have the appropriate bespoke tags applied to them as per the initial examples earlier in this post, so rather than: <a href="..." etc. they'd become: [URL=http://www.. etc.
Unfortunately, we have a LOT of historic data stored with this bespoke tagging throughout, so for now, we'd like to retain that rather than implementing an entirely new way of storing our users' posts. Any help would be much appreciated. Thanks.

Regular expression only for website

- by Katie

Hi, I'm new to regular expressions. I need to find just the websites in some text, and I'm looking for a regular expression able to find strings like: www.my.home, http://my.site.it But this regular expression should not find strings like: [email protected] or a website that is already inside an HTML tag: <a href="http://www.my.site.com/"><span style="font-style: normal;">www.mambo-test.org</span></a> I tried this one: \b((https?://[^ ])|(www.[^ ])) but it also finds the website in the href and between the tags: <a href="http://www.my.site.com/"><span style="font-style: normal;">www.mambo-test.org</span></a> and I don't know how to exclude this case.

