regex cookbook - Page 49

Regex: Find/Replace All Substrings Without a Given String Before?

- by Rodney Blythe

I need to find all occurrences of a string that are not preceded by a given string. For instance, find "someValue", but ignore every result that has "function(" before it. The Visual Studio regular expression should find this: value = someValue and ignore something that looks like this: function(someValue). What is the best way to go about this? Thanks for the help!
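One common way to express "not preceded by" is a negative lookbehind. The sketch below uses Python's `re` syntax purely for illustration; whether the Visual Studio Find dialog accepts the same syntax depends on the version, so treat engine support as an assumption. The helper name `replace_unprefixed` is hypothetical.

```python
import re

# Negative lookbehind: match "someValue" only when it is not
# immediately preceded by the literal text "function(".
PATTERN = re.compile(r"(?<!function\()someValue")


def replace_unprefixed(text, replacement):
    """Replace every match that does not follow 'function('."""
    return PATTERN.sub(replacement, text)
```

The lookbehind consumes no characters, so only "someValue" itself is replaced.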

How to do regex HTML tag replace in MS SQL?

- by timmerk

I have a table in SQL Server 2005 with hundreds of rows with HTML content. Some of the content has HTML like: <span class=heading-2>Directions</span> where "Directions" changes depending on page name. I need to change all the <span class=heading-2> and </span> tags to <h2> and </h2> tags. I wrote this query to do content changes in the past, but it doesn't work for my current problem because of the ending HTML tag: Update ContentManager Set ContentManager.Content = replace(Cast(ContentManager.Content AS NVARCHAR(Max)), 'old text', 'new text') Does anyone know how I could accomplish the span to h2 replacing purely in T-SQL? Everything I found showed I would have to do CLR integration. Thanks!

How to regex match a string of alnums and hyphens, but which doesn't begin or end with a hyphen?

- by Shahar Evron

I have some code validating a string of 1 to 32 characters, which may contain only alphanumerics and hyphens ('-') but may not begin or end with a hyphen. I'm using PCRE regular expressions and PHP (although the PHP part is not really important in this case). Right now the pseudo-code looks like this: if (match("/^[\p{L}0-9][\p{L}0-9-]{0,31}$/u", string) and not match("/-$/", string)) print "success!" That is, I first check that the string has the right contents, doesn't begin with a '-' and is of the right length, and then I run another test to see that it doesn't end with a '-'. Any suggestions on merging this into a single PCRE regular expression? I've tried using look-ahead / look-behind assertions but couldn't get it to work.
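One way to merge the two checks without lookarounds is to require an alphanumeric at both ends and make the middle optional. In PCRE that idea could be written as /^[\p{L}0-9](?:[\p{L}0-9-]{0,30}[\p{L}0-9])?$/u. The Python sketch below shows the same shape; Python's `re` has no `\p{L}`, so `[^\W\d_]` is used as an approximation of "letter" (an assumption of this sketch), and the names are hypothetical.

```python
import re

# Approximation of [\p{L}0-9]: a Unicode word character that is not a
# digit or underscore, or an ASCII digit.
ALNUM = r"(?:[^\W\d_]|[0-9])"
ALNUM_OR_HYPHEN = r"(?:[^\W\d_]|[0-9-])"

# First char alphanumeric; optionally up to 30 middle chars plus a
# final alphanumeric, for a total length of 1 to 32.
LABEL = re.compile(rf"{ALNUM}(?:{ALNUM_OR_HYPHEN}{{0,30}}{ALNUM})?")


def is_valid(s):
    """True if s is 1-32 chars, alphanumerics/hyphens, no edge hyphen."""
    return LABEL.fullmatch(s) is not None
```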

How do I do an "OR" for my python regex?

- by alex

re.compile("abc") I would like to do "abc" OR "xyz".
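Alternation in Python's `re` module is written with `|`. A minimal sketch follows; the helper name `contains_either` is hypothetical.

```python
import re

# "abc" OR "xyz": the | operator separates alternatives.
PATTERN = re.compile(r"abc|xyz")


def contains_either(text):
    """True if text contains 'abc' or 'xyz' anywhere."""
    return PATTERN.search(text) is not None
```

When the alternation is only part of a larger pattern, wrapping it in a non-capturing group such as `(?:abc|xyz)` keeps the `|` from swallowing the rest of the expression.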

Regex for Matching First Alphanumeric Character skipping (The |An? )

- by TheLizardKing

I have a list of artists, albums and tracks that I want to sort using the first letter of their respective names. The issue arises when I want to ignore "The ", "A ", "An " and various non-alphanumeric characters (talking to you, "Weird Al" Yankovic and [dialog]). Django has a nice start, '^(An?|The) +', but I want to ignore those and a few others of my choice. I am doing this in Django, using a MySQL database with utf8_bin collation.
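A possible approach, sketched here in plain Python rather than in the database, is to strip leading punctuation and an optional article before computing the sort key. The function name `sort_key` and the exact list of articles are assumptions for illustration.

```python
import re

# Leading non-alphanumerics, then an optional article followed by
# whitespace, then any further non-alphanumerics.
LEADING = re.compile(r"^[\W_]*(?:(?:an?|the)\s+)?[\W_]*", re.IGNORECASE)


def sort_key(name):
    """Lower-cased name with leading articles/punctuation removed."""
    return LEADING.sub("", name).lower()


def sort_names(names):
    return sorted(names, key=sort_key)
```

Because the article must be followed by whitespace, names such as "Anthrax" or "A-ha" keep their first letter.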

Best way to get back to using the power of lxml after having to use a regex to find something in an

- by PyNEwbie

I am trying to rip some text out of a large number of HTML documents (hundreds of thousands). The documents are really forms, but they are prepared by a very large group of different organizations, so there is significant variation in how they create the document. For example, the documents are divided into chapters. I might want to extract the contents of Chapter 5 from every document so I can analyze the content of the chapter. Initially I thought this would be easy, but it turns out that the authors might use a set of non-nested tables throughout the document to hold the content, so that Chapter n could be displayed using td tags inside a table. Or they might use other elements such as p tags, h tags, div tags or any other block-level element. After trying repeatedly to use lxml to help me identify the beginning and end of each chapter, I have determined that it is a lot cleaner to use a regular expression because in every case, no matter what the enclosing HTML element is, the chapter label is always in the form >Chapter # It is a little more complicated in that there might be some white space or non-breaking space represented in different ways ( or or just spaces). Nonetheless, it was trivial to write a regular expression to identify the beginning of each section. (The beginning of one section is the end of the previous section.) But now I want to use lxml to get the text out. My thought is that I really have no choice but to walk along my string to find the close tag for the element that encloses the text I am using to find the relevant section. Here is one example where the element holding the chapter name is a div: <div style="DISPLAY: block; MARGIN-LEFT: 0pt; TEXT-INDENT: 0pt; MARGIN-RIGHT: 0pt" align="left"><font style="DISPLAY: inline; FONT-WEIGHT: bold; FONT-SIZE: 10pt; FONT-FAMILY: Times New Roman">Chapter 1.   
Our Beginnings.</font></div> So I am imagining that I would begin at the location where I found the match for Chapter 1 and set up a regular expression to find the next </div|</td|</p|</h1 . . . So at this point I have identified the type of element holding my chapter heading. I can use the same logic to find all of the text that is within that element, that is, set up a regular expression to help me mark from >Chapter 1.   Our Beginnings.< So I have identified where my Chapter 1 begins, and I can do the same for Chapter 2 (which is where Chapter 1 ends). Now I am imagining that I am going to snip the document beginning at the opening of the element that I identified as the element that indicates where Chapter 1 begins and ending just before the opening of the element that I identified as the element that indicates where Chapter 2 begins. The string that I have identified will then be fed to lxml to use its power to get the content. I am going to all of this trouble because I have read over and over: never use a regular expression to extract content from HTML documents, and I have not hit on a way to be as accurate with lxml to identify the starting and ending locations for the text I want to extract. For example, I can never be certain that the subtitle of Chapter 1 is Our Beginnings; it could be Our Red Canary. Let me say that I spent two solid days trying with lxml to be confident that I had the beginning and ending elements, and I could only be accurate <60% of the time, but a very short regular expression has given me better than 95% success. I have a tendency to make things more complicated than necessary, so I am wondering if anyone has seen or solved a similar problem and if they have an approach (not the details, mind you) that they would like to offer.
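The "find the chapter labels, then snip between them" plan described above might be sketched as follows. This is a simplified illustration using only the standard library: the pattern, the `&nbsp;`/`&#160;` spellings of the non-breaking space, and the function name `split_chapters` are all assumptions, and the sketch backs up only to the nearest opening tag rather than to the enclosing block-level element. Each resulting chunk could then be handed to an HTML parser such as lxml.

```python
import re

# ">" followed by optional whitespace / non-breaking-space entities,
# the word "Chapter", more spacing, and the chapter number.
CHAPTER_RE = re.compile(r">(?:\s|&nbsp;|&#160;)*Chapter(?:\s|&nbsp;|&#160;)+(\d+)")


def split_chapters(html):
    """Return {chapter_number: html_chunk}, cutting at each label's tag."""
    starts = []
    for m in CHAPTER_RE.finditer(html):
        # Back up to the "<" that opens the tag closed by the matched ">".
        tag_open = max(html.rfind("<", 0, m.start()), 0)
        starts.append((int(m.group(1)), tag_open))
    chunks = {}
    for i, (number, start) in enumerate(starts):
        # A chapter ends where the next chapter's tag begins.
        end = starts[i + 1][1] if i + 1 < len(starts) else len(html)
        chunks[number] = html[start:end]
    return chunks
```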

How to match a fixed number of digits with regex in PHP?

- by user198729

I want to retrieve a run of exactly 8 consecutive digits from a string. "hello world,12345678anything else" should return 12345678 as the result (the space in between is optional). But this should not return anything: "hello world,123456789anything else", because it has 9 digits and I only need units of exactly 8 digits.
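"Exactly eight digits" can be expressed by forbidding a digit on either side with lookarounds. The sketch below is in Python; PCRE, which PHP's preg functions use, also supports these lookarounds. The function name is hypothetical.

```python
import re

# Eight digits with no digit directly before or after them.
EIGHT_DIGITS = re.compile(r"(?<!\d)\d{8}(?!\d)")


def find_eight_digit_runs(text):
    """Return every run of exactly eight consecutive digits."""
    return EIGHT_DIGITS.findall(text)
```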

Pattern matching; Regex; Need to replace everything after the second match

- by Davis

OK, so if I have this pattern: ab&bc&cd&de&ef and I need to replace all the ampersands except for the first one with commas, so that it ends up looking like this: ab&bc,cd,de,ef. It's probably very simple, but for the life of me I can't get this one figured out...
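This particular task may not need a regex at all. A minimal sketch, assuming plain string operations are acceptable: split once at the first ampersand and replace only in the remainder. The function name is hypothetical.

```python
def keep_first_ampersand(s):
    """Replace every '&' after the first one with ','."""
    head, sep, tail = s.partition("&")  # split at the first '&' only
    return head + sep + tail.replace("&", ",")
```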

How can I match at the beginning of any line, including the first, with a Perl regex?

- by JoelFan

According to the Perl documentation on regexes: By default, the "^" character is guaranteed to match only the beginning of the string ... Embedded newlines will not be matched by "^" ... You may, however, wish to treat a string as a multi-line buffer, such that the "^" will match after any newline within the string ... you can do this by using the /m modifier on the pattern match operator. The "after any newline" part means that it will only match at the beginning of the 2nd and subsequent lines. What if I want to match at the beginning of any line (1st, 2nd, etc.)? EDIT: OK, it seems that the file has BOM information (3 chars) at the beginning and that's what's messing me up. Any way to get ^ to match anyway? EDIT: So in the end it works (as long as there's no BOM), but now it seems that the Perl documentation is wrong, since it says "after any newline".
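The behaviour reported in the edits (multi-line "^" also matches at the very start, unless a BOM sits in front of the first line) can be illustrated in Python, whose `re.MULTILINE` flag plays the role of Perl's /m. This is an analogous sketch, not Perl; it assumes the text has already been decoded so that the BOM appears as the single character U+FEFF.

```python
import re

# With MULTILINE, ^ matches at the start of the string and after
# every newline.
LINE_START_WORD = re.compile(r"^\w+", re.MULTILINE)


def first_words(text):
    """First word of each line, ignoring a leading byte order mark."""
    return LINE_START_WORD.findall(text.lstrip("\ufeff"))
```

Stripping the BOM before matching is one way around the problem; the pattern itself is left unchanged.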

How to validate a domain name using Regex & Php?

- by David

Hi, I want a solution that validates only domain names, not full URLs. The following examples show what I'm looking for: domain.com -> true domain.net/org/biz... -> true domain.co.uk -> true sub.domain.com -> true domain.com/folder -> false domµ*$ain.com -> false Thank you
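A rough sketch of such a check, in Python for illustration: one or more dot-separated labels of letters, digits and inner hyphens, followed by an alphabetic top-level label. The length limits (63 per label, 253 overall) follow the usual DNS conventions; the pattern checks syntax only and does not verify that the TLD exists. The function name is hypothetical.

```python
import re

DOMAIN_RE = re.compile(
    r"(?=.{1,253}$)"                             # overall length limit
    r"(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+"  # labels, no edge hyphen
    r"[a-z]{2,63}",                               # alphabetic final label
    re.IGNORECASE | re.ASCII,
)


def is_domain(name):
    """True if name looks like a bare domain name (no scheme or path)."""
    return DOMAIN_RE.fullmatch(name) is not None
```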

How to extract part of the path and the ending file name with Regex?

- by brasofilo

I need to build an associative array with the plugin name and the language file it uses in the following sequence: /whatever/path/length/public_html/wp-content/plugins/adminimize/languages/adminimize-en_US.mo /whatever/path/length/public_html/wp-content/plugins/audio-tube/lang/atp-en_US.mo /whatever/path/length/public_html/wp-content/languages/en_US.mo /whatever/path/length/public_html/wp-content/themes/twentyeleven/languages/en_US.mo Those are the language files WordPress is loading. They are all inside /wp-content/, but with variable server paths. I'm looking only for those inside the plugins folder, and I want to grab the plugin folder name and the filename. Hypothetical case in PHP, where the reg_extract_* functions are the parts I'm missing: $plugins = array(); foreach( $big_array as $item ) { $folder = reg_extract_folder( $item ); if( 'plugin' == $folder ) { // "folder-name-after-plugins-folder" $plugin_name = reg_extract_pname( $item ); // "ending-mo-file.mo" $file_name = reg_extract_fname( $item ); $plugins[] = array( 'name' => $plugin_name, 'file' => $file_name ); } } [update] OK, so I was missing quite a basic function, pathinfo... :/ No problem to detect if /plugins/ is contained in the array. But what about the plugin folder name?
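Both pieces can be captured with a single pattern anchored on /wp-content/plugins/. The sketch below is written in Python for illustration (the same pattern syntax should carry over to PCRE); the function name `plugin_language_files` is hypothetical.

```python
import re

# Group 1: the folder right after /plugins/. Group 2: the final .mo
# file name, after any number of intermediate folders.
PLUGIN_MO = re.compile(r"/wp-content/plugins/([^/]+)/(?:[^/]+/)*([^/]+\.mo)$")


def plugin_language_files(paths):
    """Collect {'name', 'file'} entries for paths inside plugins/."""
    result = []
    for path in paths:
        m = PLUGIN_MO.search(path)
        if m:  # non-plugin paths simply do not match
            result.append({"name": m.group(1), "file": m.group(2)})
    return result
```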

Help with Perl Regex Recursive Replace One Liner? Replace MySQL comments '--' with '#'

- by NJTechie

I have various SQL files with '--' comments and we migrated to the latest version of MySQL and it hates these comments. I want to replace -- with #. I am looking for a recursive, inplace replace one-liner. This is what I have : perl -p -i -e 's/--/# /g' `fgrep -- -- * ` A sample .sql file : use myDB; --did you get an error I get the following error : Unrecognized switch: --did (-h will show valid options). p.s : fgrep skipping 2 dashes was just discussed here if you are interested. Any help is appreciated.
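The error appears to come from the shell rather than the regex: fgrep without -l prints the matching lines, so text such as "--did" ends up on perl's command line as if it were a switch. Separately from that, the replacement itself could be made more conservative. The Python sketch below rewrites only comments that start a line, which is an assumption of this sketch (trailing comments and "--" inside string literals are left alone); the function name is hypothetical.

```python
import re

# "--" at the start of a line (after optional indentation), plus any
# spaces that follow it.
COMMENT_START = re.compile(r"^([ \t]*)--[ \t]*", re.MULTILINE)


def convert_comments(sql):
    """Turn leading '--' comment markers into '# '."""
    return COMMENT_START.sub(r"\1# ", sql)
```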

Can I improve this regex check for valid domain names?

- by Josh

So, I have been working on this domain name regular expression. So far, it seems to pick up domain names with SLDs and TLDs (with the optional ccTLD), but there is duplication of the TLD listing. Can this be refactored any further? params[:domain_name].downcase.strip.match(/^[a-z0-9\-]{2,63} \.((a[cdefgilmnoqrstuwxz]|aero|arpa)|(b[abdefghijmnorstvwyz]|biz)| (c[acdfghiklmnorsuvxyz]|cat|com|coop)|d[ejkmoz]|(e[ceghrstu]|edu)|f[ijkmor]| (g[abdefghilmnpqrstuwy]|gov)|h[kmnrtu]|(i[delmnoqrst]|info|int)| (j[emop]|jobs)|k[eghimnprwyz]|l[abcikrstuvy]| (m[acdghklmnopqrstuvwxyz]|me|mil|mobi|museum)|(n[acefgilopruz]|name|net)|(om|org)| (p[aefghklmnrstwy]|pro)|qa|r[eouw]|s[abcdeghijklmnortvyz]| (t[cdfghjklmnoprtvwz]|travel)|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]) (\.((a[cdefgilmnoqrstuwxz]|aero|arpa)|(b[abdefghijmnorstvwyz]|biz)| (c[acdfghiklmnorsuvxyz]|cat|com|coop)|d[ejkmoz]|(e[ceghrstu]|edu)|f[ijkmor]| (g[abdefghilmnpqrstuwy]|gov)|h[kmnrtu]|(i[delmnoqrst]|info|int)| (j[emop]|jobs)|k[eghimnprwyz]|l[abcikrstuvy]| m[acdghklmnopqrstuvwxyz]|mil|mobi|museum)| (n[acefgilopruz]|name|net)|(om|org)| (p[aefghklmnrstwy]|pro)|qa|r[eouw]|s[abcdeghijklmnortvyz]| (t[cdfghjklmnoprtvwz]|travel)|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]))?$/)
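One way to avoid repeating the TLD listing is to build that part of the pattern once and interpolate it twice. The sketch below shows the idea in Python with a deliberately tiny, hypothetical TLD list; the original is Ruby, where string interpolation inside a regex literal would serve the same purpose.

```python
import re

# Hypothetical short list; a real one would hold every accepted TLD.
TLDS = ["com", "net", "org", "co", "uk", "de"]

# Build the alternation once and reuse it for both positions.
TLD = "(?:" + "|".join(re.escape(t) for t in TLDS) + ")"
DOMAIN = re.compile(rf"[a-z0-9-]{{2,63}}\.{TLD}(?:\.{TLD})?")


def is_known_domain(name):
    """True if name is label.tld or label.tld.tld with listed TLDs."""
    return DOMAIN.fullmatch(name.strip().lower()) is not None
```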

.NET Regex - Replace multiple characters at once without overwriting?

- by Everaldo Aguiar

I'm implementing a C# program that should automate a monoalphabetic substitution cipher. The functionality I'm working on at the moment is the simplest one: the user provides a plain text and a cipher alphabet, for example: Plain text (input): THIS IS A TEST Cipher alphabet: A - Y, H - Z, I - K, S - L, E - J, T - Q Cipher text (output): QZKL KL Y QJLQ I thought of using regular expressions since I've been programming in Perl for a while, but I'm encountering some problems in C#. First, I would like to know if someone has a suggestion for a regular expression that would replace all occurrences of each letter with its corresponding cipher letter (provided by the user) at once and without overwriting anything. Example: in this case, the user provides the plaintext "TEST", and in his cipher alphabet he wishes to have all T's replaced with E's, E's replaced with Y and S's replaced with J. My first thought was to substitute each occurrence of a letter with an individual placeholder character and then replace that character with the cipher letter corresponding to the plaintext letter provided. Using the same example word "TEST", the steps taken by the program to provide an answer would be: 1 - replace T's with (let's say) @ 2 - replace E's with # 3 - replace S's with & 4 - replace @ with E, # with Y, & with J 5 - output = EYJE This solution doesn't seem to work for large texts. I would like to know if anyone can think of a single regular expression that would allow me to replace each letter in a given text with its corresponding letter in a 26-letter cipher alphabet without the need to split the task into an intermediate step as I mentioned.
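A single pass with a replacement callback avoids the placeholder step, because each letter is looked up in the original text exactly once and never re-examined. The sketch below shows the idea in Python; in C#, Regex.Replace with a MatchEvaluator offers the same kind of callback. The function name and the dictionary representation of the cipher alphabet are assumptions.

```python
import re


def encipher(plaintext, cipher_alphabet):
    """Substitute each letter via the mapping in a single pass.

    Letters missing from the mapping are left unchanged.
    """
    return re.sub(
        r"[A-Za-z]",
        lambda m: cipher_alphabet.get(m.group(0), m.group(0)),
        plaintext,
    )
```

For example, `encipher("TEST", {"T": "E", "E": "Y", "S": "J"})` maps the T's to E without those new E's being turned into Y afterwards.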

Can I store and call regex in variables for later use?

- by adbox

I plan on storing regular expressions in a database, but I'm not sure how to get them from a variable into a function. Any advice? $i = "([wx])([yz])" $j = "[^A-Za-z0-9]" $k= "([A-Z]{3}|[0-9]{4})" // Would this execute properly? This really is the extent of my question. preg_match($i, $string);
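A pattern is ordinary string data until it is handed to the regex engine, so storing it and passing it through a variable works in general. The Python sketch below illustrates that with a dictionary standing in for the database (an assumption of this sketch). Note that PHP's preg_match additionally expects delimiters around the pattern, for example "/([wx])([yz])/", which the stored strings above do not include.

```python
import re

# Stand-in for rows loaded from a database: name -> pattern text.
STORED = {
    "i": "([wx])([yz])",
    "j": "[^A-Za-z0-9]",
    "k": "([A-Z]{3}|[0-9]{4})",
}


def matches(name, text):
    """Look up a stored pattern by name and test it against text."""
    return re.search(STORED[name], text) is not None
```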

Regex for a password that must contain at least 8 characters, at least 1 number, both lower and uppercase letters, and special characters

- by user2442653

I want a regular expression to check that a password contains at least 8 characters, including at least 1 number, both lower and uppercase letters, and special characters (e.g., #, ?, !). It cannot be your old password or contain your username, "password", or "websitename". Here is my validation expression, which is for 8 characters including 1 uppercase letter, 1 lowercase letter, and 1 number or special character: (?=^.{8,}$)((?=.*\d)|(?=.*\W+))(?![.\n])(?=.*[A-Z])(?=.*[a-z]).*$" How can I write it so that the password must be 8 characters including 1 uppercase letter, 1 special character and alphanumeric characters?
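The character-class requirements map naturally onto one lookahead each, while the "not your old password / username" rules are usually easier to check in code than in the pattern. A minimal Python sketch under those assumptions follows; the function name, parameters and default banned words are illustrative only.

```python
import re

# One lookahead per required class, then at least 8 characters.
STRENGTH = re.compile(
    r"(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^A-Za-z0-9]).{8,}"
)


def is_acceptable(password, username, old_password,
                  banned=("password", "websitename")):
    """Apply the regex rules, then the non-regex rules."""
    if STRENGTH.fullmatch(password) is None:
        return False
    if password == old_password:
        return False
    lowered = password.lower()
    # Reject passwords containing the username or a banned word.
    return not any(w and w.lower() in lowered for w in (username, *banned))
```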

How does this RegEx for parsing emails work in PHP?

- by George Edison

Okay, I have the following PHP code to extract an email address of the following two forms: Random Stranger <[email protected]> [email protected] Here is the PHP code: // The first example $sender = "Random Stranger <[email protected]>"; $pattern = '/([\w_-]*@[\w-\.]*)|.*<([\w_-]*@[\w-\.]*)>/'; preg_match($pattern,$sender,$matches,PREG_OFFSET_CAPTURE); echo "<pre>"; print_r($matches); echo "</pre><hr>"; // The second example $sender = "[email protected]"; preg_match($pattern,$sender,$matches,PREG_OFFSET_CAPTURE); echo "<pre>"; print_r($matches); echo "</pre>"; My question is... what is in $matches? It seems to be a strange collection of arrays. Which index holds the match from the parenthesis? How can I be sure I'm getting the email address and only the email address? Update: Here is the output: Array ( [0] => Array ( [0] => Random Stranger [1] => 0 ) [1] => Array ( [0] => [1] => -1 ) [2] => Array ( [0] => [email protected] [1] => 5 ) ) Array ( [0] => Array ( [0] => [email protected] [1] => 0 ) [1] => Array ( [0] => [email protected] [1] => 0 ) )
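The output above is consistent with the usual convention: index 0 is the whole match and each further index is one parenthesised group, with an empty text and offset -1 for a group that did not take part. The Python sketch below mirrors that with a close equivalent of the pattern; the address rs@example.com is a made-up placeholder and the function name is hypothetical.

```python
import re

# Group 1: bare address. Group 2: address inside <...>.
PATTERN = re.compile(r"([\w-]*@[\w.-]*)|.*<([\w-]*@[\w.-]*)>")


def extract_email(sender):
    """Return the address from whichever group took part in the match."""
    m = PATTERN.search(sender)
    if m is None:
        return None
    # Exactly one of the two groups participates; the other is None.
    return m.group(1) or m.group(2)
```

Picking "whichever group is non-empty" is what makes the result independent of which of the two input forms was given.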

How to count the Chinese words in a file using regex in Perl?

- by Ivan

I tried the following Perl code to count the Chinese words in a file. It seems to work, but it does not give the right result. Any help is greatly appreciated. The error message is: Use of uninitialized value $valid in concatenation (.) or string at word_counting.pl line 21, <FILE> line 21. Total things = 125, valid words = It seems to me that the problem is the file format. The "total things" count is 125, which is the number of strings (125 lines). The strangest part is that my console displayed all the individual Chinese words correctly without any problem. The utf8 pragma is installed. #!/usr/bin/perl -w use strict; use utf8; use Encode qw(encode); use Encode::HanExtra; my $input_file = "sample_file.txt"; my ($total, $valid); my %count; open (FILE, "< $input_file") or die "Can't open $input_file: $!"; while (<FILE>) { foreach (split) { #break $_ into words, assign each to $_ in turn $total++; next if /\W|^\d+/; #strange words skip the remainder of the loop $valid++; $count{$_}++; # count each separate word stored in a hash ## next comes here ## } } print "Total things = $total, valid words = $valid\n"; foreach my $word (sort keys %count) { print "$word \t was seen \t $count{$word} \t times.\n"; } ##---Data---- sample_file.txt ??????,???????,????.??????.????:"?????????????,??????,????????.????????,?????????, ???????????.????????,???????????,??????,??????.???:`??,???????????.'?????, ??????????."??????,??????.????.???, ????????????,????,??????,?????????,??????????????. ????????,??????,???????????,????????,????????.????,????,???????, ??????????,??????,????????.??????.
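For comparison, here is a small Python sketch that counts Chinese characters on already-decoded text. It counts individual characters in the basic CJK Unified Ideographs block, not segmented words (Chinese text has no spaces to split on), and the range and function name are assumptions of this sketch. A file would need to be read with an explicit encoding, for example open(path, encoding="utf-8").

```python
import re
from collections import Counter

# Basic CJK Unified Ideographs block only.
HAN = re.compile(r"[\u4e00-\u9fff]")


def count_han(text):
    """Return (total characters, per-character Counter) for Han chars."""
    chars = HAN.findall(text)
    return len(chars), Counter(chars)
```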

How do you replace many characters in a regex?

- by macca1

I am sanitizing an input field and manually getting and setting the caret position in the process. With some abstraction, here's the basic idea: <input type="text" onkeyup="check(this)"> And the JavaScript: function check(element) { var charPosition = getCaretPosition(element); $(element).val( sanitize( $(element).val() ) ); setCaretPosition(element, charPosition); } function sanitize(s) { return s.replace(/[^a-zA-Z0-9\s]/g, ''); } This works fine, except that when a character actually gets sanitized, my caret position is off by one. Basically I'd like a way to see whether the sanitize function has actually replaced a character (and at what index) so that I can adjust charPosition if necessary. Any ideas?
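One way to get the adjusted position is to sanitize the text before the caret separately: the length of that cleaned prefix is the new caret position. The sketch below shows the logic in Python for illustration only; the same split-at-caret idea should translate directly to the JavaScript above. The function name is hypothetical.

```python
import re

INVALID = re.compile(r"[^a-zA-Z0-9\s]")


def sanitize_with_caret(value, caret):
    """Return (clean_value, new_caret) after removing invalid chars."""
    before = INVALID.sub("", value[:caret])  # text left of the caret
    after = INVALID.sub("", value[caret:])   # text right of the caret
    # Characters removed to the left shift the caret by that many places.
    return before + after, len(before)
```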

jQuery and regex for adding icons to specific links?!

- by rayne

I'm using jQuery to add icons to specific links, e.g. a myspace icon for sites starting with http://myspace.com etc. However, I can't figure out how to use regular expressions (if that's even possible here) to make jQuery recognize the link either with or without "www." (I'm very bad at regular expressions in general). Here are two examples: $("a[href^='http://www.last.fm']").addClass("lastfm").attr("target", "_blank"); $("a[href^='http://livejournal.com']").addClass("livejournal").attr("target", "_blank"); They work fine, but now I want the last.fm link to work with http://last.fm, http://www.last.fm and http://www.lastfm.de. Currently it only works for www.last.fm. I would also like to make the livejournal link work with subdomain links like http://username.livejournal.com. How can I do that? Thanks in advance!
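The matching rules themselves can be written as regular expressions with an optional "www." and an optional run of subdomains. The sketch below tests them in Python for illustration; how they are wired into jQuery is left open, and the dictionary and function names are hypothetical.

```python
import re

SITE_PATTERNS = {
    # last.fm and lastfm.de, each with or without "www."
    "lastfm": re.compile(r"^http://(?:www\.)?(?:last\.fm|lastfm\.de)(?:/|$)"),
    # livejournal.com with any number of subdomains in front
    "livejournal": re.compile(r"^http://(?:[\w-]+\.)*livejournal\.com(?:/|$)"),
}


def icon_class(url):
    """Return the CSS class name for a known site, else None."""
    for name, pattern in SITE_PATTERNS.items():
        if pattern.search(url):
            return name
    return None
```

The trailing `(?:/|$)` keeps look-alike hosts such as last.fm.example.org from matching.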

Does [_\s^"] mean underscore and whitespace but not " (quote) in Regex?

- by Matt

Does [_\s^"] mean underscore and whitespace but not " (quote) in Reg I understand that the brackets ([ ]) mean character range and that ^ means but not, but my question is can you say [this^notthat] or do I have to separate them into two sets of brackets?

Regex for Eclipse/Flash Builder File Search for comments?

- by Brian Bishop

In Eclipse (and Flash/Flex Builder) you get the option with Ctrl+Shift+F to do a file search and look for a regular expression. Would be a real handy thing to know. I want to find the word negate if it appears in a Flex/java comment like the following: // It was negated because or /* The negate option was.... */ or /** * We have to negate the value */ Any ideas? Will test them out at http://www.regexplanet.com/simple/index.html
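A pattern along these lines might be a starting point: either a // comment containing the word on the same line, or a /* block in which the word appears before the closing */. It is sketched and tested in Python here; Eclipse's search uses Java regex syntax, so treat portability as an assumption. It is only a heuristic, since it cannot tell a real comment from "//" inside a string literal.

```python
import re

COMMENT_NEGATE = re.compile(
    r"//[^\n]*\bnegate"              # line comment containing the word
    r"|/\*(?:(?!\*/).)*?\bnegate",   # block comment, before its "*/"
    re.DOTALL,
)


def mentions_negate_in_comment(source):
    """True if 'negate...' appears inside a // or /* */ comment."""
    return COMMENT_NEGATE.search(source) is not None
```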

Does [_\s^"] mean underscore and whitespace but not " (quote) in Regex?

- by Matt

Does [_\s^"] mean underscore and whitespace but not " (quote) in Reg I understand that the brackets ([ ]) mean character range and that ^ means but not, but my question is can you say [this^notthat] or do I have to separate them into two sets of brackets?
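In the common regex dialects, ^ negates a character class only when it is the first character after the opening bracket; anywhere else it is a literal caret. The short Python check below illustrates this. Since a class matches only what it lists, "underscore or whitespace but not a quote" is simply `[_\s]`.

```python
import re

# ^ in the middle of a class is a literal caret, so this class matches
# underscore, whitespace, caret and double quote.
MIXED = re.compile(r'[_\s^"]')

# ^ as the first character negates the whole class.
NEGATED = re.compile(r'[^_\s"]')

# Underscore or whitespace; a quote is excluded by not being listed.
UNDERSCORE_OR_SPACE = re.compile(r"[_\s]")
```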

How do I find if string has at least one character using regex?

- by Vishal

Examples: "1 name": Should say it has characters "10,000": OK "na123me": Should say it has characters "na 123, 000": Should say it has characters
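Reading "characters" in the examples as "letters", a search for a single letter anywhere in the string is enough. A minimal Python sketch, assuming only ASCII letters count; the function name is hypothetical.

```python
import re

HAS_LETTER = re.compile(r"[A-Za-z]")


def has_letters(s):
    """True if s contains at least one ASCII letter."""
    return HAS_LETTER.search(s) is not None
```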

What's the RegEx to make sure that delimiters are escaped?

- by Kuyenda

I'm looking for a regular expression that will check whether or not delimiters in a string are escaped with a backward slash. The delimiters I am concerned about are comma (\,), colon (\:), semicolon (\;) and of course the backward slash itself has to be escaped (\\). For example, the string "test" should return a match because there are no delimiters in it, and no escaping is necessary. The string "te\;st" would return a match because the semicolon delimiter is escaped. "te;st" and "t\;s:t" would both fail because they both contain at least one delimiter that is not escaped. I know that I need a conditional and a positive lookbehind, and this is what I have so far, but it is not giving me the expected answer. ^(?<delimiter>[:;,\\])?(?(delimiter)$?<=(?:\\\$*\\)k<delimiter>|.)$ Any suggestions on how I can make this work? Thanks.
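A simpler formulation that avoids conditionals and lookbehind is to describe the whole string as a sequence of tokens, each being either an ordinary character or a backslash followed by a delimiter. The Python sketch below follows that idea; it is a swapped-in technique rather than a repair of the pattern above, and the function name is hypothetical.

```python
import re

# Each token: a non-delimiter character, or a backslash followed by
# one of the delimiters (including the backslash itself).
ESCAPED_OK = re.compile(r"(?:[^\\,:;]|\\[\\,:;])*")


def delimiters_escaped(s):
    """True if every , : ; and backslash in s is backslash-escaped."""
    return ESCAPED_OK.fullmatch(s) is not None
```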

Search Results

Search found 3956 results on 159 pages for 'regex cookbook'.
