regex libraries - Page 129

Regular expressions in python unicode

- by Remy

I need to remove all the html tags from a given webpage data. I tried this using regular expressions: import urllib2 import re page = urllib2.urlopen("http://www.frugalrules.com") from bs4 import BeautifulSoup, NavigableString, Comment soup = BeautifulSoup(page) link = soup.find('link', type='application/rss+xml') print link['href'] rss = urllib2.urlopen(link['href']).read() souprss = BeautifulSoup(rss) description_tag = souprss.find_all('description') content_tag = souprss.find_all('content:encoded') print re.sub('<[^>]*>', '', content_tag) But the syntax of the re.sub is: re.sub(pattern, repl, string, count=0) So, I modified the code as (instead of the print statement above): for row in content_tag: print re.sub(ur"<[^>]*>",'',row,re.UNICODE But it gives the following error: Traceback (most recent call last): File "C:\beautifulsoup4-4.3.2\collocation.py", line 20, in <module> print re.sub(ur"<[^>]*>",'',row,re.UNICODE) File "C:\Python27\lib\re.py", line 151, in sub return _compile(pattern, flags).sub(repl, string, count) TypeError: expected string or buffer What am I doing wrong?

Read the article

regular expression and escaping

- by pstanton

Sorry if this has been asked, my search brought up many off topic posts. I'm trying to convert wildcards from a user defined search string (wildcard is "*") to postgresql like wildcard "%". I'd like to handle escaping so that "%" => "\%" and "\*" => "*" I know i could replace \* with something else prior to replacing * and then swap it back, but i'd prefer not to and instead only convert * using a pattern that selects it when not proceeded by \. String convertWildcard(String like) { like = like.replaceAll("%", "\\%"); like = like.replaceAll("\\*", "%"); return like; } Assert.assertEquals("%", convertWildcard("*")); Assert.assertEquals("\%", convertWildcard("%")); Assert.assertEquals("*", convertWildcard("\*")); // FAIL Assert.assertEquals("a%b", convertWildcard("a*b")); Assert.assertEquals("a\%b", convertWildcard("a%b")); Assert.assertEquals("a*b", convertWildcard("a\*b")); // FAIL ideas welcome.

Read the article

Regular expression to match last item without delimiter

- by Steven

I have a string of names like this "J. Smith; B. Jones; O. Henry" I can match all but the last name with \w+.*?; Is there a regular expression that will match all the names, including the last one?

Read the article

preg_replace only replaces first occurrence then skips to next line

- by Dom

Got a problem where preg_replace only replaces the first match it finds then jumps to the next line and skips the remaining parts on the same line that I also want to be replaced. What I do is that I read a CSS file that sometimes have multiple "url(media/pic.gif)" on a row and replace "media/pic.gif" (the file is then saved as a copy with the replaced parts). The content of the CSS file is put into the variable $resource_content: $resource_content = preg_replace('#(url$(\'|")?)(.*)((\'|")?$)#i', '${1}'.url::base(FALSE).'${3}'.'${4}', $resource_content); Does anyone know a solution for why it only replaces the first match per line?

Read the article

Add link around img tags with regexp

- by rdanee

I would like to add links around image tags with preg_replace(). Before: <img href="" src="" alt="" /> After: <a href="" ..><img href="" src="" alt="" /></a> I would greatly appreciate any help. Thank you very much.

Read the article

How to process this string via regular expression

- by iiduce

my string style like this: expression1/field1+expression2*expression3+expression4/field2*expression5*expression6/field3 a real style mybe like this: computer/(100)+web*mail+explorer/(200)*bbs*solution/(300) "+" and "*" represent operator "computer","web"...represent expression (100),(200) represent field num . field num may not exist. I want process the string to this: /(100)+web*+explorer/(200)bbs/(300) rules like this: if expression length is more than 3 and its field is not (200), then add brackets to it.

Read the article

Are .NET's regular expressions Turing complete?

- by Robert

Regular expressions are often pointed to as the classical example of a language that is not Turning complete. For example "regular expressions" is given in as the answer to this SO question looking for languages that are not Turing complete. In my, perhaps somewhat basic, understanding of the notion of Turning completeness, this means that regular expressions cannot be used check for patterns that are "balanced". Balanced meaning have an equal number of opening characters as closing characters. This is because to do this would require you to have some kind of state, to allow you to match the opening and closing characters. However the .NET implementation of regular expressions introduces the notion of a balanced group. This construct is designed to let you backtrack and see if a previous group was matched. This means that a .NET regular expressions: ^(?<p>a)*(?<-p>b)*(?(p)(?!))$ Could match a pattern that: ab aabb aaabbb aaaabbbb ... etc. ... Does this means .NET's regular expressions are Turing complete? Or are there other things that are missing that would be required for the language to be Turing complete?

Read the article

Problem with regular expression for some special parttern.

- by SpawnCxy

Hi all, I got a problem when I tried to find some characters with following code: preg_match_all('/[\w\uFF10-\uFF19\uFF21-\uFF3A\uFF41-\uFF5A]/',$str,$match); //line 5 print_r($match); And I got error as below: Warning: preg_match_all() [function.preg-match-all]: Compilation failed: PCRE does not support \L, \l, \N, \U, or \u at offset 4 in E:\mycake\app\webroot\re.php on line 5 I'm not so familiar with reg expression and have no idea about this error.How can I fix this?Thanks.

Read the article

How to get N random string from a {a1|a2|a3} format string?

- by Pentium10

Take this string as input: string s="planets {Sun|Mercury|Venus|Earth|Mars|Jupiter|Saturn|Uranus|Neptune}" How would I choose randomly N from the set, then join them with comma. The set is defined between {} and options are separated with | pipe. The order is maintained. Some output could be: string output1="planets Sun, Venus"; string output2="planets Neptune"; string output3="planets Earth, Saturn, Uranus, Neptune"; string output4="planets Uranus, Saturn";// bad example, order is not correct Java 1.5

Read the article

Correct syntax for matching a string inside a variable against an array

- by Jamex

Hi, I have a variable, $var, that contains a string of characters, this is a dynamic variable that contains the values from inputs. $var could be 'abc', or $var could be 'blu', I want to match the string inside variable against an array, and return all the matches. $array = array("blue", "red", "green"); What is the correct syntax for writing the code in php, my rough code is below $match = preg_grep($var, $array); (incorrect syntax of course) I tried to put quotes and escape slashes, but so far no luck. Any suggestion? TIA

Read the article

RegExp to validate a formula (string/boolean/numeric expression)?

- by JSteve

I have used regExp quit a bit of times but still far from being an expert. This time I want to validate a formula (or math expression) by regExp. The difficult part here is to validate proper starting and ending parentheses with in the formula. I believe, there would be some sample on the web but I could not find it. Can somebody post a link for such example? or help me by some other means?

Read the article

string substitution regular expression not working in tcl

- by Puneet Mittal

i am trying to replace all the special characters including white space, hyphen, etc, to underscore, from a string variable in tcl. I wrote the code below but it doesn't seem to be working. set varname $origVar puts "Variable Name :>> $varname" if {$varname != ""} { regsub -all {[\s-\]\[$^?+*()|\\%&#]} $varname "_" $newVar } puts "New Variable :>> $newVar" one issue is that, instead of replacing the string in $varname, it is replacing the data inside $origVar. No idea why, and also i read the example code (for proper syntax) in my tcl book and according to that it should be something like this regsub -all {[\s-][$^?+*()|\\%&#]} $varname "_" newVar so i used the same syntax but it didn't work and gave the same result as modifying the $origVar instead of required $varname value.

Read the article

How to avoid resetting the java Scanner position

- by Derek

I have some code that looks more or less like this: while(scanner.hasNext()) { if(scanner.findInLine("Test") !=null) { //do some things }else{ scanner.nextLine(); } } I am using this to parse an ~10MB text file. The problem is, if I put a breakpoint on the while() and the scanner.nextLine(), I can see that sometimes the scanners position (in the debug window) goes back to zero. I think this is causing me some kind of loop blow up, because the regext in findInLine() starts at zero, looks through some amount of text, advancing the position, and then it randomly gets set back to zero, so it has to re-parse all that text again. Any ideas what can be causing that? Am I even doing this the right way? Thanks Some additional info: The Scanner is instantiated from an InputStream. After diubg sine debugging, it appears that there is a HearCharBuffer that Scanner uses and it only allows 1024 characters at a time, and then resets. Is there a way to avoid this, or do things differently? That seems like a small amount of characters to be able to scan. Derek

Read the article

Regular Expression to isolate an html tag

- by orit cohen

I'm looking for a regular expression to isolate an html tag. This includes the TAG the ATTRIBUTES and the CONTNET inside. Let's say I have this: <html> <body> aajsdfkjaskd <TAGNAME name="bla" context="non">hfdfhdj </TAGNAME> </body> </html> I need a regular expression that would return: <TAGNAME name="bla" context="non">hfdfhdj </TAGNAME> Thank, Joe

Read the article

Using PHP how can i apply HTMLentities() to a backreference in a preg_replace() call?

- by Gary Willoughby

At the minute i have: $Text = preg_replace("/\[code\](.*?)\[\/code\]/s", "<mytag>\\1</mytag>", $Text); how can i escape the backreference using htmlentities()?

Read the article

Decompose a Python string into its characters

- by PoorLuzer

I want to break a Python string into its characters. sequenceOfAlphabets = list( string.uppercase ) works. However, why does not sequenceOfAlphabets = re.split( '.', string.uppercase ) work? All I get are empty, albeit expected count of elements

Read the article

How can I execute a bunch of editor commands stored in a file in VIM?

- by LES2

I have read the other posts, e.g., http://stackoverflow.com/questions/1830886/vim-executing-a-list-of-editor-commands and others. The answer isn't clear to me for my case. I have some editor commands that I generated from an SQL query. It uses :s/foo/bar to change country codes (from FIPS to a non-standard code set). Here's a sample of the file: :s/CB/CAMBO :s/CQ/NMARI :s/KV/KOSOV :s/PP/PAPUA ... I have saved that in a file called fipsToNonStd.vim (unsure about the correct extension). I want to run those commands one after another. What's the easiest way to do so? Thanks a bunch! SO Rocks!

Read the article

Regular expression match, extracting only wanted segments of string

- by Ben Carey

I am trying to extract three segments from a string. As I am not particularly good with regular expressions, I think what I have done could probably be done better... I would like to extract the bold parts of the following string: SOMETEXT: ANYTHING_HERE (Old=ANYTHING_HERE, New=ANYTHING_HERE) Some examples could be: ABC: Some_Field (Old=,New=123) ABC: Some_Field (Old=ABCde,New=1234) ABC: Some_Field (Old=Hello World,New=Bye Bye World) So the above would return the following matches: $matches[0] = 'Some_Field'; $matches[1] = ''; $matches[2] = '123'; So far I have the following code: preg_match_all('/^([a-z]*\:(\s?)+)(.+)(\s?)+$old=(.+)\,(\s?)+new=(.+)$/i',$string,$matches); The issue with the above is that it returns a match for each separate segment of the string. I do not know how to ensure the string is the correct format using a regular expression without catching and storing the match if that makes sense? So, my question, if not already clear, how I can retrieve just the segments that I want from the above string?

Read the article

Writing a PHP web crawler using cron

- by Horse

Hi all I have written myself a web crawler using simplehtmldom, and have got the crawl process working quite nicely. It crawls the start page, adds all links into a database table, sets a session pointer, and meta refreshes the page to carry onto the next page. That keeps going until it runs out of links That works fine however obviously the crawl time for larger websites is pretty tedious. I wanted to be able to speed things up a bit though, and possibly make it a cron job. Any ideas on making it as quick and efficient as possible other than setting the memory limit / execution time higher?

Read the article

Regular expression that finds and replaces non-ascii characters with Python

- by prosseek

I need to change some characters that are not ASCII to '_'. For example, Tannh‰user - Tann_huser If I use regular expression with Python, how can I do this? Is there better way to do this not using RE?

Read the article

Regular expression for dd/mm

- by swapna

Hi Please help me with a regular expression to validate the following format dd/mm This is for validating a Birthday field and the year is not required. Thanks

Read the article

How can I match a match a null byte (0x00) in the Visual Studio binary editor with a find using a re

- by Paul K

Open a file in the Visual Studio binary editor that contains a null byte (0x00), then use the Quick Find feature (Ctrl +F) to find null bytes. I would have thought I could use a regular expression such as \x00 to match null bytes but it doesn't work. Searching for any other hex value using this method works fine. Is this a VS bug, 'feature', or am I just missing something? Is there a work around?

Read the article

Trying to create a Reg Ex for the following patterns

- by Travis

Here are the patterns: Red,Green (and so on...) Red (+5.00),Green (+6.00) (and so on...) Red (+5.00,+10.00),Green (+6.00,+20.00) (and so on...) Red (+5.00),Green (and so on...) Each attribute ("Red,"Green") can have 0, 1, or 2 modifiers (shown as "+5.00,+10.00", etc.). I need to capture each of the attributes and their modifiers as a single string (i.e. "Red (+5.00,+10.00)", "Green (+6.00,+20.00)". Help?

Read the article

How to detect identical part(s) inside string?

- by Horace Ho

I try to break down the http://stackoverflow.com/questions/2711961/decoding-algorithm-wanted question into smaller questions. This is Part I. Question: two strings: s1 and s2 part of s1 is identical to part of s2 space is separator how to extract the identical part(s)? example 1: s1 = "12 November 2010 - 1 visitor" s2 = "6 July 2010 - 100 visitors" the identical parts are "2010", "-", "1" and "visitor" example 2: s1 = "Welcome, John!" s2 = "Welcome, Peter!" the identical parts are "Welcome," and "!" Python and Ruby preferred. Thanks

Read the article

re.sub emptying list

- by jmau5

def process_dialect_translation_rules(): # Read in lines from the text file specified in sys.argv[1], stripping away # excess whitespace and discarding comments (lines that start with '##'). f_lines = [line.strip() for line in open(sys.argv[1], 'r').readlines()] f_lines = filter(lambda line: not re.match(r'##', line), f_lines) # Remove any occurances of the pattern '\s*<=>\s*'. This leaves us with a # list of lists. Each 2nd level list has two elements: the value to be # translated from and the value to be translated to. Use the sub function # from the re module to get rid of those pesky asterisks. f_lines = [re.split(r'\s*<=>\s*', line) for line in f_lines] f_lines = [re.sub(r'"', '', elem) for elem in line for line in f_lines] This function should take the lines from a file and perform some operations on the lines, such as removing any lines that begin with ##. Another operation that I wish to perform is to remove the quotation marks around the words in the line. However, when the final line of this script runs, f_lines becomes an empty lines. What happened? Requested lines of original file: ## English-Geek Reversible Translation File #1 ## (Moderate Geek) ## Created by Todd WAreham, October 2009 "TV show" <=> "STAR TREK" "food" <=> "pizza" "drink" <=> "Red Bull" "computer" <=> "TRS 80" "girlfriend" <=> "significant other"

Search Results

Search found 9016 results on 361 pages for 'regex libraries'.

Page 129/361 | < Previous Page | 125 126 127 128 129 130 131 132 133 134 135 136 | Next Page >

- by Remy

- by pstanton

- by Steven

- by Dom

- by rdanee

- by iiduce

- by Robert

- by SpawnCxy

- by Pentium10

- by Jamex

- by JSteve

- by Puneet Mittal

- by Derek

- by orit cohen

- by Gary Willoughby

- by PoorLuzer

- by LES2

- by Ben Carey

- by Horse

- by prosseek

- by swapna

- by Paul K

- by Travis

- by Horace Ho

- by jmau5

< Previous Page | 125 126 127 128 129 130 131 132 133 134 135 136 | Next Page >