Search Results

Search found 5919 results on 237 pages for 'regex matching'.

Page 5/237 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

C++: what regex library should I use?

- by Stéphane

I'm working on a commercial (not open source) C++ project that runs on a linux-based system. I need to do some regex within the C++ code. (I know: I now have 2 problems.) QUESTION: What libraries do people who regularly do regex from C/C++ recommend I look into? A quick search has brought the following to my attention: 1) Boost.Regex (I need to go read the Boost Software License, but this question is not about software licenses) 2) C (not C++) POSIX regex (#include <regex.h>, regcomp, regexec, etc.) 3) http://freshmeat.net/projects/cpp_regex/ (I know nothing about this one; seems to be GPL, therefore not usable on this project) Thanks.

Read the article
Racket regular-expression matching

- by Inaimathi

I'm trying to create a regex that matches the inverse of a certain string type (so, strings not ending in ".js", for example). According to the documentation, that should be the expression #rx"(?!\\.js$)", but it doesn't seem to work. To test it out, I have this function: (define (match-test regex) (map (lambda (text) (regexp-match? regex text)) '("foo.js" "bar.css" "baz.html" "mumble.gif" "foobar"))) (match-test #rx"\\.js$") returns (#t #f #f #f #f) as expected, but (match-test #rx"(?!\\.js$)") returns (#t #t #t #t #t), where I would expect (#f #t #t #t #t). What am I doing wrong, and how do I actually get a regex in Racket to express the idea "match anything which does not contain [x]"?

Read the article
Approximate string matching with a letter confusion matrix?

- by zigglenaut

I'm trying to model a phonetic recognizer that has to isolate instances of words (strings of phones) out of a long stream of phones that doesn't have gaps between each word. The stream of phones may have been poorly recognized, with letter substitutions/insertions/deletions, so I will have to do approximate string matching. However, I want the matching to be phonetically-motivated, e.g. "m" and "n" are phonetically similar, so the substitution cost of "m" for "n" should be small, compared to say, "m" and "k". So, if I'm searching for [mein] "main", it would match the letter sequence [meim] "maim" with, say, cost 0.1, whereas it would match the letter sequence [meik] "make" with, say, cost 0.7. Similarly, there are differing costs for inserting or deleting each letter. I can supply a confusion matrix that, for each letter pair (x,y), gives the cost of substituting x with y, where x and y are any letter or the empty string. I know that there are tools available that do approximate matching such as agrep, but as far as I can tell, they do not take a confusion matrix as input. That is, the cost of any insertion/substitution/deletion = 1. My question is, are there any open-source tools already available that can do approximate matching with confusion matrices, and if not, what is a good algorithm that I can implement to accomplish this?

Read the article
Search files for text matching format of a Unix directory

- by BrandonKowalski

I am attempting to search through all the files in a directory for text matching the pattern of any arbitrary directory. The output of this I hope to use to make a list of all directories referenced in the files (this part I think I can figure out on my own). I have looked at various regex resources and made my own expression that seems to work in the browser based tool but not with grep in the command line. /\w+[(/\w+)]+ My understanding so far is the above expression will look for the beginning / of a directory then look for an indeterminate number of characters before looking for a repeating block of the same thing. Any guidance would be greatly appreciated.

Read the article
Apache mod_proxy_html Substitute: how to re-use part of regex match? (regex variables?)

- by goober

Hi all, Have a unique URL-rewriting situation in Apache. I need to be able to take a URL that starts with "\u002f[X]" or '\u002f[X]" Where X is the rest of some URL, and substitute the text "\u002fmeis2\u002f[X] I'm not sure how the Regex works in Apache -- I think it's the same as Perl 5? -- but even then I'm a little unsure how this would be done. My hunch is that it has to do with Regex grouping and then using $1 to pull the variable out, but I'm entirely unfamiliar with this process in Apache. Hoping someone can help -- thanks!

Read the article
C# Find and Replace RegEx with wildcard search and addition of value

- by fraXis

Hello, The below code is from my other questions that I have asked here on SO. Everyone has been so helpful and I almost have a grasp with regards to RegEx but I ran into another hurdle. This is what I basically need to do in a nutshell. I need to take this line that is in a text file that I load into my content variable: X17.8Y-1.Z0.1G0H1E1 I need to do a wildcard search for the X value, Y value, Z value, and H value. When I am done, I need this written back to my text file (I know how to create the text file so that is not the problem). X17.8Y-1.G54G0T2 G43Z0.1H1M08 I have code that the kind users here have given me, except I need to create the T value at the end of the first line, and use the value from the H and increment it by 1 for the T value. For example: X17.8Y-1.Z0.1G0H5E1 would translate as: X17.8Y-1.G54G0T6 G43Z0.1H5M08 The T value is 6 because the H value is 5. I have code that does everything (does two RegEx functions and separates the line of code into two new lines and adds some new G values). But I don't know how to add the T value back into the first line and increment it by 1 of the H value. Here is my code: StreamReader reader = new StreamReader(fDialog.FileName.ToString()); string content = reader.ReadToEnd(); reader.Close(); content = Regex.Replace(content, @"X[-\d.]+Y[-\d.]+", "$0G54G0"); content = Regex.Replace(content, @"(Z(?:\d*\.)?\d+)[^H]*G0(H(?:\d*\.)?\d+)\w*", "\nG43$1$2M08"); //This must be created on a new line This code works great at taking: X17.8Y-1.Z0.1G0H5E1 and turning it into: X17.8Y-1.G54G0 G43Z0.1H5M08 but I need it turned into this: X17.8Y-1.G54G0T6 G43Z0.1H5M08 (notice the T value is added to the first line, which is the H value +1 (T = H + 1). Can someone please modify my RegEx statement so I can do this automatically? I tried to combine my two RegEx statements into one line but I failed miserably. Thanks so much, Shawn

Read the article
How does MatchEvaluator works? ( C# regex replace)

- by Marin Doric

This is the input string 23x * y34x2. I want to insert " * " (star surrounded by whitespaces) after every number followed by letter, and after every letter followed by number. So my input string would look like this: 23 * x * y * 34 * x * 2. This is the regex that does the job: @"\d(?=[a-z])|a-z". This is the function that I wrote that inserts the " * ". Regex reg = new Regex(@"\d(?=[a-z])|[a-z](?=\d)"); MatchCollection matchC; matchC = reg.Matches(input); int ii = 1; foreach (Match element in matchC)//foreach match I will find the index of that match { input = input.Insert(element.Index + ii, " * ");//since I' am inserting " * " ( 3 characters ) ii += 3; //I must increment index by 3 } return input; //return modified input My question how to do same job using .net MatchEvaluator? I'am new to regex and don't understand good replacing with MatchEvaluator. This is the code that I tried to wrote: Regex reg = new Regex(@"\d(?=[a-z])|[a-z](?=\d)"); MatchEvaluator matchEval = new MatchEvaluator(ReplaceStar); input = reg.Replace(input, matchEval); return input; } public string ReplaceStar( Match match ) { //return What?? }

Read the article
Infinite loop in regex in java

- by carpediem

Hello, My purpose is to match this kind of different urls: url.com my.url.com my.extended.url.com a.super.extended.url.com and so on... So, I decided to build the regex to have a letter or a number at start and end of the url, and to have a infinite number of "subdomains" with alphanumeric characters and a dot. For example, in "my.extended.url.com", "m" from "my" is the first class of the regex, "m" from "com" is the last class of the regex, and "y.", "extended." and "url." are the second class of the regex. Using the pattern and subject in the code below, I want the find method to return me a false because this url must not match, but it uses 100% of CPU and seems to stay in an infinite loop. String subject = "www.association-belgo-palestinienne-be"; Pattern pattern = Pattern.compile("^[A-Za-z0-9]\\.?([A-Za-z0-9_-]+\\.?)*[A-Za-z0-9]\\.[A-Za-z]{2,6}"); Matcher m = pattern.matcher(subject); System.out.println(" Start"); boolean hasFind = m.find(); System.out.println(" Finish : " + hasFind); Which only prints: Start I can't reproduce the problem using regex testers. Is it normal ? Is the problem coming from my regex ? Could it be due to my Java version (1.6.0_22-b04 / JVM 64 bit 17.1-b03) ? Thanks in advance for helping.

Read the article
Regex with optional part doesn't create backreference

- by padraigf

I want to match an optional tag at the end of a line of text. Example input text: The quick brown fox jumps over the lazy dog {tag} I want to match the part in curly-braces and create a back-reference to it. My regex looks like this: ^.*(\{\w+\})? (somewhat simplified, I'm also matching parts before the tag): It matches the lines ok (with and without the tag) but doesn't create a back-reference to the tag. If I remove the '?' character, so regex is: ^.*(\{\w+\}) It creates a back-reference to the tag but then doesn't match lines without the tag. I understood from http://www.regular-expressions.info/refadv.html that the optional operator wouldn't affect the backreference: Round brackets group the regex between them. They capture the text matched by the regex inside them that can be reused in a backreference, and they allow you to apply regex operators to the entire grouped regex. but must've misunderstood something. How do I make the tag part optional and create a back-reference when it exists?

Read the article
Automatically generating Regex from set of strings residing in DB C#

- by Muhammad Adeel Zahid

Hello Everyone i have about 100,000 strings in database and i want to if there is a way to automatically generate regex pattern from these strings. all of them are alphabetic strings and use set of alphabets from English letters. (X,W,V) is not used for example. is there any function or library that can help me achieve this target in C#. Example Strings are KHTK RAZ given these two strings my target is to generate a regex that allows patterns like (k, kh, kht,khtk, r, ra, raz ) case insensitive of course. i have downloaded and used some C# applications that help in generating regex but that is not useful in my scenario because i want a process in which i sequentially read strings from db and add rules to regex so this regex could be reused later in the application or saved on the disk. i m new to regex patterns and don't know if the thing i m asking is even possible or not. if it is not possible please suggest me some alternate approach. Any help and suggestions are highly appreciated. regards Adeel Zahid

Read the article
Can I write this regex in one step?

- by Marin Doric

This is the input string "23x +y-34 x + y+21x - 3y2-3x-y+2". I want to surround every '+' and '-' character with whitespaces but only if they are not allready sourrounded from left or right side. So my input string would look like this "23x + y - 34 x + y + 21x - 3y2 - 3x - y + 2". I wrote this code that does the job: Regex reg1 = new Regex(@"\+(?! )|\-(?! )"); input = reg1.Replace(input, delegate(Match m) { return m.Value + " "; }); Regex reg2 = new Regex(@"(?<! )\+|(?<! )\-"); input = reg2.Replace(input, delegate(Match m) { return " " + m.Value; }); explanation: reg1 // Match '+' followed by any character not ' ' (whitespace) or same thing for '-' reg2 // Same thing only that I match '+' or '-' not preceding by ' '(whitespace) delegate 1 and 2 just insert " " before and after m.Value ( match value ) Question is, is there a way to create just one regex and just one delegate? i.e. do this job in one step? I am a new to regex and I want to learn efficient way.

Read the article
Find and Replace RegEx with wildcard search and addition of value

- by fraXis

The below code is from my other questions that I have asked here on SO. Everyone has been so helpful and I almost have a grasp with regards to RegEx but I ran into another hurdle. This is what I basically need to do in a nutshell. I need to take this line that is in a text file that I load into my content variable: X17.8Y-1.Z0.1G0H1E1 I need to do a wildcard search for the X value, Y value, Z value, and H value. When I am done, I need this written back to my text file (I know how to create the text file so that is not the problem). X17.8Y-1.G54G0T2 G43Z0.1H1M08 I have code that the kind users here have given me, except I need to create the T value at the end of the first line, and use the value from the H and increment it by 1 for the T value. For example: X17.8Y-1.Z0.1G0H5E1 would translate as: X17.8Y-1.G54G0T6 G43Z0.1H5M08 The T value is 6 because the H value is 5. I have code that does everything (does two RegEx functions and separates the line of code into two new lines and adds some new G values). But I don't know how to add the T value back into the first line and increment it by 1 of the H value. Here is my code: StreamReader reader = new StreamReader(fDialog.FileName.ToString()); string content = reader.ReadToEnd(); reader.Close(); content = Regex.Replace(content, @"X[-\d.]+Y[-\d.]+", "$0G54G0"); content = Regex.Replace(content, @"(Z(?:\d*\.)?\d+)[^H]*G0(H(?:\d*\.)?\d+)\w*", "\nG43$1$2M08"); //This must be created on a new line This code works great at taking: X17.8Y-1.Z0.1G0H5E1 and turning it into: X17.8Y-1.G54G0 G43Z0.1H5M08 but I need it turned into this: X17.8Y-1.G54G0T6 G43Z0.1H5M08 (notice the T value is added to the first line, which is the H value +1 (T = H + 1). Can someone please modify my RegEx statement so I can do this automatically? I tried to combine my two RegEx statements into one line but I failed miserably.

Read the article
.NET regex: Match.nextMatch() never returns

- by Jimmy

I have a regex that seems to have worked fine for the past year or so, and all of a sudden today with a new slightly different text to match against, Match.nextMatch() never returns. I'm no regex expert and I'm sure the regex can be optimized, but previous data sets weren't much more complex than what I've tried today. Furthermore, the regex works fine against the offending data set in a tool like RegexBuddy; it's only in .net (running in debug in Visual Studio) that it seems to hang. Nevertheless, if anyone can figure out how to tweak the regex to make it work, I'd really appreciate it. This is the regex: <tr>(<td[^>]*><a[^>]*>(?<callOptionTicker>[A-Z]{1,5}\d{6}C\d{8})</a></td>)(<td[^>]*>.*?</td>){6}(<td[^>]*><b><a[^>]*>(?<strikePrice>\d*\.\d*)</a></b></td>)(<td[^>]*><a[^>]*>(?<putOptionTicker>[A-Z]{1,5}\d{6}P\d{8})</a></td>) It's meant to extract put and call option tickers from a Yahoo option chain page (i.e., raw HTML). It works fine for IBM http://finance.yahoo.com/q/os?s=IBM&m=2010-05-21 It doesn't work for SPX options (this is the offending data set) http://finance.yahoo.com/q/os?s=I:SPX.W&m=2010-05

Read the article
Match exactly (and only) the pattern I specify in a grep command

- by palswim

Usually, grep searches for all lines containing a match for the pattern/parameter I specify. I would like to match just the pattern (i.e. not the whole line). So, if a file contains the lines: We said that we'll come. Unfortunately, we were delayed. Now, we're on our way. I want to find all contractions starting with "we" (regex pattern: we\'[a-z]+/i). And I'm looking for the output: we'll we're How do I do this (with grep or another Unix/Windows command-line tool)?

Read the article
How to do regex HTML tag replace in SQL Server?

- by timmerk

I have a table in SQL Server 2005 with hundreds of rows with HTML content. Some of the content has HTML like: <span class=heading-2>Directions</span> where "Directions" changes depending on page name. I need to change all the <span class=heading-2> and </span> tags to <h2> and </h2> tags. I wrote this query to do content changes in the past, but it doesn't work for my current problem because of the ending HTML tag: Update ContentManager Set ContentManager.Content = replace(Cast(ContentManager.Content AS NVARCHAR(Max)), 'old text', 'new text') Does anyone know how I could accomplish the span to h2 replacing purely in T-SQL? Everything I found showed I would have to do CLR integration. Thanks!

Read the article
Algorithm for multiple word matching in a text, count the number of every matched word

- by 66

I have noticed that it has solutions for matching multiple words in a given text, such as below: http://stackoverflow.com/questions/1099985/algorithm-for-multiple-word-matching-in-text If I want to know exactly the number of appearances of each matched word in the text, my solution is like this: step 1: using ac-algorithm to obtain the maching words; step 2: count the number of each word obtained in step 1 is there a faster way? Thx~

Read the article
Regex: markdown-style link matching

- by The.Anti.9

I want to parse markdown style links, but I'm having some trouble matching the reference style ones. Like this one: [id]: http://example.com/ "Optional Title Here" My regex gets the id and the url, but not the title. Heres what I have: /\[([a-zA-Z0-9_-]+)\]: (\S+)\s?("".*?"")?/ I go through and add the references to a hashtable. the id as the key and the value is an instance of a class I made called LinkReference that just contains the url and the title. In case the problem is not my regex, and my code adding the matches to the hash table, Heres my code for that too: Regex rx = new Regex(@"\[([a-zA-Z0-9_-]+)\]: (\S+)\s?("".*?"")?"); MatchCollection matches = rx.Matches(InputText); foreach (Match match in matches) { GroupCollection groups = match.Groups; string title = null; try { title = groups[3].Value; } catch (Exception) { // keep title null } LinkReferences.Add(groups[1].Value, new LinkReference(groups[2].Value, title)); }

Read the article
matching certain numbers at the end of a string

- by user697473

I have a vector of strings: s <- c('abc1', 'abc2', 'abc3', 'abc11', 'abc12', 'abcde1', 'abcde2', 'abcde3', 'abcde11', 'abcde12', 'nonsense') I would like a regular expression to match only the strings that begin with abc and end with 3, 11, or 12. In other words, the regex has to exclude abc1 but not abc11, abc2 but not abc12, and so on. I thought that this would be easy to do with lookahead assertions, but I haven't found a way. Is there one? EDIT: Thanks to posters below for pointing out a serious ambiguity in the original post. In reality, I have many strings. They all end in digits: some in 0, some in 9, some in the digits in between. I am looking for a regex that will match all strings except those that end with a letter followed by a 1 or a 2. (The regex should also match only those strings that start with abc, but that's an easy problem.) I tried to use negative lookahead assertions to create such a regex. But I didn't have any success.

Read the article
Aligning music notes using String matching algorithms or Dynamic Programming

- by Dolphin

Hi I need to compare 2 sets of musical pieces (i.e. a playing-taken in MIDI format-note details extracted and saved in a database table, against sheet music-taken into XML format). When evaluating playing against sheet music (i.e.note details-pitch, duration, rhythm), note alignment needs to be done - to identify missed/extra/incorrect/swapped notes that from the reference (sheet music) notes. I have like 1800-2500 notes in one piece approx (can even be more-with polyphonic, right now I'm doing for monophonic). So will I have to have all these into an array? Will it be memory overloading or stack overflow? There are string matching algorithms like KMP, Boyce-Moore. But note alignment can also be done through Dynamic Programming. How can I use Dynamic Programming to approach this? What are the available algorithms? Is it about approximate string matching? Which approach is much productive? String matching algos like Boyce-Moore, or dynamic programming? How can I assess which is more effective? Greatly appreciate any insight or suggestions Thanks in advance

Read the article
Batch-Renaming Movies using Regex

- by Nate Mara

So, I've been trying to rename some movie files using regular expressions, but so far I have been only marginally successful. The goal is to parse files like this: 2001.A.Space.Odyssey.1968.720p.BluRay.DD5.1.x264-LiNG.mkv And rename them Like this: 2001 A Space Odyssey (1968).mkv I created the pattern: ^(.+).(\d{4}).+.(mp4|avi|mkv)$ With the output: \1 (\2).\3 Now, this works perfectly fine when I have movies with one-word titles, but when there is more than one word separated by a period, the regex fails to grab anything. What am I doing wrong here?

Read the article
test filenames for regex patterns in bash

- by rk

I'm not sure exactly how the code should be written but I want to test a file/folder for naming patterns, something like: if [ -d $i ] && [ regex([0-9].,$i) { do something } I want it to check if the file/folder is a directory and that the name of it is a number (i.e. 1 or 101 or 10007)...

Read the article
Regex in Notepad++ 6

- by Rocket

So, Notepad++ got updated to v6.0. One of their new features is PCRE (Perl Compatible Regular Expressions). I tried to use this new feature to find and replace things in a file. I tried the regular expression: {\$([a-zA-Z_]*)} and it yelled at me, saying "Invalid regular expression". I tested this regex in other programs (like my main IDE, Geany), and it worked fine. Why does this not work in Notepad++ 6.0?

Read the article
Regex split string but keep separators

- by rwwilden

I'd like to do a Regex.Split on some separators but I'd like to keep the separators. To give an example of what I'm trying: "abc[s1]def[s2][s3]ghi" --> "abc", "[s1]", "def", "[s2]", "[s3]", "ghi" The regular expression I've come up with is new Regex("\\[|\\]|\\]\\["). However, this gives me the following: "abc[s1]def[s2][s3]ghi" --> "abc", "s1", "def", "s2", "", "s3", "ghi" The separators have disappeared (which makes sense given my regex). Is there a way to write the regex so that the separators themselves are preserved?

Read the article
Regex to validate JSON

- by Shard

I am looking for a Regex that allows me to validate json. I am very new to Regex's and i know enough that parsing with Regex is bad but can it be used to validate?

Read the article
Regex.Replace only replaces start of string

- by Yannick Smits

I'm trying to replace a friendly url pattern with a html url notation but due to lack of regex experience I can't figure out why my regex only replaces the first occurence of my pattern: string text = "[Hotel Des Terrasses \http://flash-hotel.fr/] and [Du Phare \http://www.activehotels.com/hotel/]"; text = Regex.Replace(text, @"\[(.+)\s*\\(.+)\]", "<a href=\"$2\" target=\"_blank\">$1</a>"); How can i make the second pattern be replaced with the HTML markup too?

Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >