Search Results

Search found 5919 results on 237 pages for 'regex matching'.


  • Are there free, low cost, or open source tools for matching name/address data?

    - by luiscolorado
    This question is related to Tools for matching name/address data. There are a number of commercial tools from SAS, Oracle, Microsoft, etc., that de-duplicate or merge the names of individuals or companies coming from multiple sources. However, after reading the answers to that question, I wondered why a seemingly interesting problem didn't receive any answers mentioning open source projects that could tackle it. Are you aware of any open source projects or algorithms that implement so-called "record linking", "record merging", or "clustering"?


  • Double interpolation of regular expressions in Perl

    - by tomdee
    I have a Perl program that stores regular expressions in configuration files. They are in the form: regex = ^\d+$ Elsewhere, the regex gets parsed from the file and stored in a variable, $regex. I then use the variable when checking the regex, e.g. $lValid = ($valuetocheck =~ /$regex/); I want to be able to include Perl variables in the config file, e.g. regex = ^\d+$stored_regex$ But I can't work out how to do it. When regular expressions are parsed by Perl they get interpreted twice. First the variables are expanded, and then the regular expression itself is parsed. What I need is a three-stage process: first interpolate $regex, then interpolate the variables it contains, and then parse the resulting regular expression. Both of the first two interpolations need to be "regular expression aware", e.g. they should know that the string contains $ as an anchor, etc. Any ideas?
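
One way to get the three-stage behaviour is to make the middle stage explicit: treat the stored pattern as a template, substitute named placeholders yourself, and only then compile. The sketch below is in Python rather than Perl, and both the `${name}` placeholder syntax and the `expand_pattern` helper are assumptions made for illustration, not part of the original program. Requiring braces leaves a bare `$` free to act as an anchor.

```python
import re

# Hypothetical placeholder syntax: ${name}. A bare $ is left alone as an anchor.
PLACEHOLDER = re.compile(r'\$\{(\w+)\}')

def expand_pattern(raw, variables):
    """Substitute ${name} placeholders in raw, then compile the result."""
    # A function replacement avoids backslash processing of the inserted text.
    expanded = PLACEHOLDER.sub(lambda m: variables[m.group(1)], raw)
    return re.compile(expanded)

regex = expand_pattern(r'^\d+${stored_regex}$', {'stored_regex': '[a-z]+'})
print(regex.pattern)
```

The same idea should carry over to Perl with an explicit substitution on the stored string before it is used in a match, though that variant is not shown here.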


  • regex for matching strings that have illegal filename characters.

    - by cchampion
    I've been trying to figure out this blasted regex for two hours!!! It's midnight and I've got to figure this out and go to bed!!! String str = new String("filename\\"); if(str.matches(".*[?/<>|*:\"{\\}].*")) { System.out.println("match"); }else { System.out.println("no match"); } ".*[?/<>|*:\"{\\}].*" is my regex. It catches everything correctly except the backslash!!! I need to know how to make it catch the backslash correctly, please help! FYI, the illegal characters I'm trying to catch are ? \ / < | * : " I've got it working except for the backslash
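
A likely cause: in Java source, `"\\}"` is the two characters `\}`, which the regex engine reads as an escaped `}`, so no backslash ever ends up in the character class. A literal backslash in a Java regex string needs four backslashes (`"\\\\"`). The sketch below shows the same class in Python, where a raw string needs only two; the name `has_illegal_chars` is made up for the example, and `>` is included on the assumption that it was meant to be in the list.

```python
import re

# Illegal characters from the question: ? \ / < > | * : "
# Inside a raw string, \\ in the class stands for one literal backslash.
ILLEGAL = re.compile(r'[?\\/<>|*:"]')

def has_illegal_chars(name):
    """Return True if name contains any illegal filename character."""
    return ILLEGAL.search(name) is not None

print(has_illegal_chars("filename\\"))
```

Using `search` instead of wrapping the class in `.*` on both sides keeps the pattern short and gives the same yes/no answer.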


  • String.split() - matching leading empty String prior to first delimiter?

    - by tehblanx
    I need to be able to split an input String by commas, semi-colons or white-space (or a mix of the three). I would also like to treat multiple consecutive delimiters in the input as a single delimiter. Here's what I have so far: String regex = "[,;\\s]+"; return input.split(regex); This works, except for when the input string starts with one of the delimiter characters, in which case the first element of the result array is an empty String. I do not want my result to have empty Strings, so that something like, ",,,,ZERO; , ;;ONE ,TWO;," returns just a three element array containing the capitalized Strings. Is there a better way to do this than stripping out any leading characters that match my reg-ex prior to invoking String.split? Thanks in advance!
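
One way around the leading empty string is to match the tokens rather than split on the delimiters: a run of characters that are not delimiters is, by construction, never empty. This is a Python sketch of that idea (the function name `tokens` is an assumption); in Java the same pattern could be fed to a `Matcher` loop.

```python
import re

def tokens(text):
    """Return every run of characters that is not a comma, semicolon or whitespace."""
    return re.findall(r'[^,;\s]+', text)

print(tokens(",,,,ZERO; , ;;ONE ,TWO;,"))
```

Because nothing is split, there is no leading or trailing empty element to strip afterwards.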


  • PHP - REGEX - use string for pattern but exclude it from being removed!

    - by aSeptik
    Hi all! I'm pretty new to regex. I've learned a few things along the way, but my knowledge is still poor, so I'd like some clarification on how it works. Assume I have the following strings; as you can see, they are formatted slightly differently from one another but are very similar: DTSTART;TZID="America/Chicago":20030819T000000 DTEND;TZID="America/Chicago":20030819T010000 DTSTART;TZID=US/Pacific DTSTART;VALUE=DATE Now I want to remove everything between the first A-Z block and the colon, so that I keep, for example: DTSTART:20030819T000000 DTEND:20030819T010000 DTSTART DTSTART With my very limited knowledge I worked out this shitty regex :-( preg_replace( '/^[A-Z](?!;[A-Z]=[\w\W]+):$/m' , '' , $data ); but I'm sure this regex will not work. Why? :-) Please help me! PS: as the title of the question explains, I also want to know how to use a known string block to match another, for example preg_replace( '/^[DTSTART](?!;[A-Z]=[\w\W]+):$/m' , '' , $data ); without deleting DTSTART. Thanks for your time! Regards, Luca Filosofi
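
Two things stand out in the attempts: `[A-Z]` without a quantifier matches a single letter, and `[DTSTART]` is a character class (any one of D, T, S, A, R), not the literal word. A common way to "use a string for the pattern without removing it" is to capture it and put it back in the replacement. The Python sketch below does that; the helper name `strip_params` is an assumption, and it assumes parameter values never contain a colon.

```python
import re

def strip_params(data):
    """Drop the ';param=value' part between the property name and the colon."""
    # Group 1 is the property name; it is written back by \1 in the replacement.
    return re.sub(r'^([A-Z]+);[^:\n]*', r'\1', data, flags=re.M)

sample = ('DTSTART;TZID="America/Chicago":20030819T000000\n'
          'DTEND;TZID="America/Chicago":20030819T010000\n'
          'DTSTART;TZID=US/Pacific\n'
          'DTSTART;VALUE=DATE')
print(strip_params(sample))
```

The same pattern with the `m` modifier and `$1` as the replacement should work with `preg_replace`, though that has not been run here.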


  • How do I ensure that a regex does not match an empty string?

    - by Dancrumb
    I'm using the Jison parser generator for Javascript and am having problems with my language specification. The program I'm writing will be a calculator that can handle feet, inches and sixteenths. In order to do this, I have the following specification: %% ([0-9]+\s*"'")?\s*([0-9]+\s*"\"")?\s*([0-9]+\s*"s")? {return 'FIS';} [0-9]+("."[0-9]+)?\b {return 'NUMBER';} \s+ {/* skip whitespace */} "*" {return '*';} "/" {return '/';} "-" {return '-';} "+" {return '+';} "(" {return '(';} ")" {return ')';} <<EOF>> {return 'EOF';} Most of these lines come from a basic calculator specification. I simply added the first line. The regex correctly matches feet, inch, sixteenths, such as 6'4" (six feet, 4 inches) or 4"5s (4 inches, 5 sixteenths) with any kind of whitespace between the numbers and indicators. The problem is that the regex also matches a null string. As a result, the lexical analysis always records a FIS at the start of the line and then the parsing fails. Here is my question: is there a way to modify this regex to guarantee that it will only match a non-zero length string?
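
Since all three groups are optional, the pattern as a whole can match nothing. One possible fix is a lookahead at the front that insists on "digits, optional whitespace, then one of the three indicators", which guarantees that at least one group consumes input. This is a Python sketch of the regex only; whether the Jison lexer accepts lookahead in this position is an assumption that would need checking.

```python
import re

# The lookahead requires a number followed by ', " or s before anything is
# consumed, so the three optional groups can no longer all be skipped.
FIS = re.compile(r'''(?=\d+\s*['"s])(\d+\s*')?\s*(\d+\s*")?\s*(\d+\s*s)?''')

for text in ["6'4\"", '4"5s', "", "12.5 + 3"]:
    m = FIS.match(text)
    print(repr(text), "->", m.group(0) if m else None)
```

Plain numbers such as `12.5` fail the lookahead and so remain available to the NUMBER rule.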


  • Is there a faster way to parse through a large file with regex quickly?

    - by Ray Eatmon
    Problem: I have a very, very large file that I need to parse line by line to get 3 values from each line. Everything works, but it takes a long time to parse through the whole file. Is it possible to do this within seconds? The typical time it's taking is between 1 and 2 minutes. The example file size is 148,208 KB. I am using regex to parse every line. Here is my C# code: private static void ReadTheLines(int max, Responder rp, string inputFile) { List<int> rate = new List<int>(); double counter = 1; try { using (var sr = new StreamReader(inputFile, Encoding.UTF8, true, 1024)) { string line; Console.WriteLine("Reading...."); while ((line = sr.ReadLine()) != null) { if (counter <= max) { counter++; rate = rp.GetRateLine(line); } else if(max == 0) { counter++; rate = rp.GetRateLine(line); } } rp.GetRate(rate); Console.ReadLine(); } } catch (Exception e) { Console.WriteLine("The file could not be read:"); Console.WriteLine(e.Message); } } Here is my regex: public List<int> GetRateLine(string justALine) { const string reg = @"^\d{1,}.+\[(.*)\s[\-]\d{1,}].+GET.*HTTP.*\d{3}[\s](\d{1,})[\s](\d{1,})$"; Match match = Regex.Match(justALine, reg, RegexOptions.IgnoreCase); // Here we check the Match instance. if (match.Success) { // Finally, we get the Group value and display it. string theRate = match.Groups[3].Value; Ratestorage.Add(Convert.ToInt32(theRate)); } else { Ratestorage.Add(0); } return Ratestorage; } Here is an example line to parse (there are usually around 200,000 lines): 10.10.10.10 - - [27/Nov/2002:16:46:20 -0500] "GET /solr/ HTTP/1.1" 200 4926 789
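
The pattern contains several `.+` and `.*` runs, which can force a lot of backtracking on every line. Since the code shown only uses the last field (group 3), one option is to skip the regex and split from the right. The Python sketch below illustrates that; the function name is made up, it is looser than the original pattern (it only checks that the last two fields are numeric), and no speed-up has been measured.

```python
def last_field_as_int(line):
    """Return the final whitespace-separated field as an int, or 0 if the line does not fit."""
    parts = line.rsplit(None, 2)  # split at most twice, starting from the right
    if len(parts) == 3 and all(p.isascii() and p.isdigit() for p in parts[1:]):
        return int(parts[2])
    return 0

sample = '10.10.10.10 - - [27/Nov/2002:16:46:20 -0500] "GET /solr/ HTTP/1.1" 200 4926 789'
print(last_field_as_int(sample))
```

In C#, the analogous step would be to look at the end of the string, for example with `LastIndexOf`, instead of matching the whole line.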


  • New regular expression features in PCRE 8.34 and 8.35

    - by Jan Goyvaerts
    PCRE 8.34 adds some new regex features and changes the behavior of a few to make it more compatible with the latest versions of Perl. There are no changes to the regex syntax in PCRE 8.35. \o{377} is now an octal escape just like \377. This syntax was first introduced in Perl 5.12. It avoids any confusion between octal escapes and backreferences. It also allows octal numbers beyond 377 to be used. E.g. \o{400} is the same as \x{100}. If you have any reason to use octal escapes instead of hexadecimal escapes then you should definitely use the new syntax. Because of this change, \o is now an error when it doesn't form a valid octal escape. Previously \o was a literal o and \o{377} was a sequence of 377 o's. In free-spacing mode, whitespace between a quantifier and the ? that makes it lazy or the + that makes it possessive is now ignored. In Perl this has always been the case. In PCRE 8.33 and prior, whitespace ended a quantifier and any following ? or + was seen as a second quantifier and thus an error. The shorthand \s now matches the vertical tab character in addition to the other whitespace characters it previously matched. Perl 5.18 made the same change. Many other regex flavors have always included the vertical tab in \s, just like POSIX has always included it in [[:space:]]. Names of capturing groups are no longer allowed to start with a digit. This has always been the case in Perl since named groups were added to Perl 5.10. PCRE 8.33 and prior even allowed group names to consist entirely of digits. [[:<:]] and [[:>:]] are now treated as POSIX-style word boundaries. They match at the start and the end of a word. Though they use similar syntax, these have nothing to do with POSIX character classes and cannot be used inside character classes. Perl does not support POSIX word boundaries. The same changes affect PHP 5.5.10 (and later) and R 3.0.3 (and later) as they have been updated to use PCRE 8.34. RegexBuddy and RegexMagic have been updated to support the latest versions of PCRE, PHP, and R. Older versions that were previously supported are still supported, so you can compare or convert your regular expressions between the latest versions of PCRE, PHP, and R and whichever version you were using previously.
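
The octal claim is easy to check by hand: octal 400 is 4 × 64 = 256, which is hexadecimal 100, while octal 377 is 255, the largest single-byte value. The lines below only verify that arithmetic in Python; they do not exercise PCRE itself, and Python's own `re` module does not support the `\o{...}` syntax.

```python
# \o{400} and \x{100} name the same code point: octal 400 equals hexadecimal 100.
octal_400 = int("400", 8)
hex_100 = int("100", 16)
print(octal_400, hex_100)

# Octal 377 is the largest value that fits in one byte.
octal_377 = int("377", 8)
print(octal_377)
```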


  • Inspiring problems to show off the importance of regular expressions?

    - by ragu.pattabi
    I am planning to give a presentation/demonstration on regular expressions at work to encourage young developers to add this powerful and important tool to their toolbox. Just teaching syntax doesn't cut it. I often see people say "nice" and then, after the presentation, get on with their programming lives without ever thinking of using it. I am racking my brain to come up with some solid examples, not just problems that match 'cat' and 'cut'. I failed to note down the occasions of my own regex enlightenment to use here. :^( Do you have some inspiring problems to share that could be solved with regex?


  • Using RegEx's in Multi-Channel Funnels in Google Analytics

    - by Rob H
    For some reason, I can't get my multi-channel funnel, which uses RegExes in the path steps, to function: it keeps coming back with no data. There are a few variables which may be holding things up, but I can't figure out the origin of the problem, nor a solution. Here's the situation: The funnel is tracking conversions, defined as when a user completes 4 steps to sign up. Steps are not "required". The default URL is set to https://example.com. There is a 302 redirect set up on our site that leads from http://example.com to https://example.com. Within the funnel, steps switch from non-secure pages (unless the browser is set to secure browsing) to secure pages once the user moves from the landing page to the second page of the sign-up process (the account placeholder has been created). The URL at that point contains the publisher number as a variable within (but not at the end of) the URL. My RegExes are all properly written, as tested on rubular.com.


  • Regular expression help

    - by DJPB
    Hi there. I'm working on a C# app, and I get a string with a date or part of a date, and I need to extract the day, month and year from that string. Example: string example = "31-12-2010"; string day = Regex.Match(example, "REGULAR EXPRESSION FOR DAY").ToString(); string month = Regex.Match(example, "REGULAR EXPRESSION FOR MONTH").ToString(); string year = Regex.Match(example, "REGULAR EXPRESSION FOR YEAR").ToString(); This should give day = "31" month = "12" year = "2010". Example 2: string example = "12-2010"; string month = Regex.Match(example, "REGULAR EXPRESSION FOR MONTH").ToString(); string year = Regex.Match(example, "REGULAR EXPRESSION FOR YEAR").ToString(); This should give month = "12" year = "2010". Any ideas? Thanks.
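
Rather than three separate expressions, a single pattern with an optional day group can cover both inputs. The Python sketch below assumes the parts are always separated by `-` and that the year has four digits; the name `date_parts` is made up for the example.

```python
import re

# Optional "day-" prefix, then "month-year". Groups: day, month, year.
DATE = re.compile(r'^(?:(\d{1,2})-)?(\d{1,2})-(\d{4})$')

def date_parts(text):
    """Return (day, month, year) as strings; day is None when absent."""
    m = DATE.match(text)
    return m.groups() if m else None

print(date_parts("31-12-2010"))
print(date_parts("12-2010"))
```

In C# the same pattern could be used with one `Regex.Match` call and `match.Groups[1]` to `match.Groups[3]`, checking `Success` on the day group.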


  • How does Haskell do pattern matching without us defining an Eq on our data types?

    - by devoured elysium
    I have defined a binary tree: data Tree = Null | Node Tree Int Tree and have implemented a function that'll yield the sum of the values of all its nodes: sumOfValues :: Tree -> Int sumOfValues Null = 0 sumOfValues (Node Null v Null) = v sumOfValues (Node Null v t2) = v + (sumOfValues t2) sumOfValues (Node t1 v Null) = v + (sumOfValues t1) sumOfValues (Node t1 v t2) = v + (sumOfValues t1) + (sumOfValues t2) It works as expected. I had the idea of also trying to implement it using guards: sumOfValues2 :: Tree -> Int sumOfValues2 Null = 0 sumOfValues2 (Node t1 v t2) | t1 == Null && t2 == Null = v | t1 == Null = v + (sumOfValues2 t2) | t2 == Null = v + (sumOfValues2 t1) | otherwise = v + (sumOfValues2 t1) + (sumOfValues2 t2) but this one doesn't work because I haven't implemented Eq, I believe: No instance for (Eq Tree) arising from a use of `==' at zzz3.hs:13:3-12 Possible fix: add an instance declaration for (Eq Tree) In the first argument of `(&&)', namely `t1 == Null' In the expression: t1 == Null && t2 == Null In a stmt of a pattern guard for the definition of `sumOfValues2': t1 == Null && t2 == Null The question that has to be made, then, is how can Haskell make pattern matching without knowing when a passed argument matches, without resorting to Eq?


  • How to find a word within text using XSLT 2.0 and REGEX (which doesn't have \b word boundary)?

    - by Mads Hansen
    I am attempting to scan a string of words and look for the presence of a particular word (case insensitive) in an XSLT 2.0 stylesheet using REGEX. I have a list of words that I wish to iterate over and determine whether or not they exist within a given string. I want to match on a word anywhere within the given text, but I do not want to match within a word (i.e. a search for foo should not match on "food" and a search for bar should not match on "rebar"). XSLT 2.0 REGEX does not have a word boundary (\b), so I need to replicate it as best I can.
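
A common substitute for `\b` is to require "start of string or a non-word character" before the word and "a non-word character or end of string" after it. The Python sketch below shows that pattern; `contains_word` is a made-up name, and note that `\W` treats the underscore as a word character.

```python
import re

def contains_word(text, word):
    """Case-insensitive whole-word search without using \\b."""
    pattern = r'(^|\W)' + re.escape(word) + r'(\W|$)'
    return re.search(pattern, text, re.IGNORECASE) is not None

print(contains_word("I like food", "foo"))
print(contains_word("a Foo b", "foo"))
```

In XSLT 2.0 the equivalent would presumably be a call along the lines of `matches($text, concat('(^|\W)', $word, '(\W|$)'), 'i')`, with any regex metacharacters in `$word` escaped first; that form has not been tested here.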


  • Is there an easy way to get a list of all successful captures from a regex pre-5.10?

    - by Chas. Owens
    I know the right way to do this if I have Perl 5.10 is to use named captures and values %+, but in Perl 5.8.9 how can I get a list of successful captures? I have come up with two methods that are both just terrible: #you need to list each possible match my @captures = grep { defined } ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16); and #ew, I turned on symbolic references { no strict 'refs'; my @captures = map { defined $+[$_] ? $$_ : () } 1 .. $#+; } There is a third option I have found involving (?{}), but it requires global variables (because the closure happens at compile time) and takes the regex from reasonably clear to ungodly mess. The only alternative I have found is to capture the whole match and then use another set of regexes to get the values I want (actually I build the first regex out of the other regexes because there is no good reason to duplicate the logic).


  • Better way to write this regex to match multi-ordered property list?

    - by Andrew Philips
    I've been whacking on this regex for a while, trying to build something that can pick out multiple ordered property values (DTSTART, DTEND, SUMMARY) from an .ics file. I have other options (like reading one line at a time and scanning), but wanted to build a single regex that can handle the whole thing. SAMPLE PERL # There has got to be a better way... my $x1 = '(?:^DTSTART[^\:]*:(?<dts>.*?)$)'; my $x2 = '(?:^DTEND[^\:]*:(?<dte>.*?)$)'; my $x3 = '(?:^SUMMARY[^\:]*:(?<dtn>.*?)$)'; my $fmt = "$x1.*$x2.*$x3|$x1.*$x3.*$x2|$x2.*$x1.*$x3|$x2.*$x3.*$x1|$x3.*$x1.*$x2|$x3.*$x2.*$x1"; if ($evts[1] =~ /$fmt/smo) { printf "lines:\n==>\n%s\n==>\n%s\n==>\n%s\n", $+{dts}, $+{dte}, $+{dtn}; } else { print "Failed.\n"; } SAMPLE DATA BEGIN:VEVENT UID:0A5ECBC3-CAFB-4CCE-91E3-247DF6C6652A TRANSP:OPAQUE SUMMARY:Gandalf_flinger1 DTEND:20071127T170005 DTSTART,lang=en_us:20071127T103000 DTSTAMP:20100325T003424Z X-APPLE-EWS-BUSYSTATUS:BUSY SEQUENCE:0 END:VEVENT SAMPLE OUTPUT lines: == 20071127T103000 == 20071127T170005 == Gandalf_flinger1
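
Enumerating all six orderings grows factorially with the number of properties. An alternative is to match one property line at a time and collect the results into a dictionary, which makes the order irrelevant. The Python sketch below illustrates that approach; `event_props` is a made-up helper, and it assumes one property per line with `\n` line endings (with CRLF input the values would keep a trailing `\r`).

```python
import re

# One match per property line; \b stops DTSTART from matching a longer name.
PROP = re.compile(r'^(DTSTART|DTEND|SUMMARY)\b[^:\n]*:(.*)$', re.M)

def event_props(text):
    """Map each wanted property name to its value, whatever the line order."""
    return {name: value for name, value in PROP.findall(text)}

event = "\n".join([
    "BEGIN:VEVENT",
    "SUMMARY:Gandalf_flinger1",
    "DTEND:20071127T170005",
    "DTSTART,lang=en_us:20071127T103000",
    "DTSTAMP:20100325T003424Z",
    "END:VEVENT",
])
print(event_props(event))
```

If a property appears more than once, the last occurrence wins in this sketch.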


  • Regular Expressions: RegEx for determining valid PHP class property names?

    - by Brian Lacy
    I am using PHP's magic __set and __get methods to access a private array in a class. I want to make sure the property names requested (i.e. $myObj->FakeProperty) are valid according to PHP property name rules before accessing them. My current RegEx isn't doing the trick; with my test values, _12 always falls through the cracks. I'm not actually sure that my test values even represent a realistic representation of what is and isn't allowed for PHP class property names, but I'm not really too concerned about it, just that I have some sort of rudimentary check in place. Test Fields: albert12 12Albert _12 _Albert12 _12Albert _____a_1 RegEx: ^(?=_*[A-z]+)[A-z0-9_]+$
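
Two observations may help here. First, `[A-z]` covers every ASCII code from `A` to `z`, which includes `[`, `\`, `]`, `^`, `_` and the backtick, so the lookahead is satisfied by an underscore alone. Second, the PHP manual describes a valid label as a letter or underscore followed by letters, digits or underscores, so `_12` is in fact a legal name. The Python sketch below checks the ASCII part of that rule (the manual's pattern also allows bytes `\x80-\xff`, which is left out here); the function name is made up.

```python
import re

# ASCII-only sketch: a letter or underscore first, then letters, digits, underscores.
PROPERTY_NAME = re.compile(r'[A-Za-z_][A-Za-z0-9_]*')

def is_valid_property_name(name):
    """Return True if the whole of name fits the label rule."""
    return PROPERTY_NAME.fullmatch(name) is not None

for field in ["albert12", "12Albert", "_12", "_Albert12", "_12Albert", "_____a_1"]:
    print(field, is_valid_property_name(field))
```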


  • What Regex can strip e.g. "note:" and "firstName: " from the left of a string?

    - by Edward Tanguay
    I need to strip the "label" off the front of strings, e.g. note: this is a note needs to return: note and this is a note I've produced the following code example but am having trouble with the regexes. What code do I need in the two ???????? areas below so that I get the desired results shown in the comments? using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Text.RegularExpressions; namespace TestRegex8822 { class Program { static void Main(string[] args) { List<string> lines = new List<string>(); lines.Add("note: this is a note"); lines.Add("test: just a test"); lines.Add("test:\t\t\tjust a test"); lines.Add("firstName: Jim"); //"firstName" IS a label because it does NOT contain a space lines.Add("She said this to him: follow me."); //this is NOT a label since there is a space before the colon lines.Add("description: this is the first description"); lines.Add("description:this is the second description"); //no space after colon lines.Add("this is a line with no label"); foreach (var line in lines) { Console.WriteLine(StringHelpers.GetLabelFromLine(line)); Console.WriteLine(StringHelpers.StripLabelFromLine(line)); Console.WriteLine("--"); //note //this is a note //-- //test //just a test //-- //test //just a test //-- //firstName //Jim //-- // //She said this to him: follow me. //-- //description //this is the first description //-- //description //this is the first description //-- // //this is a line with no label //-- } Console.ReadLine(); } } public static class StringHelpers { public static string GetLabelFromLine(this string line) { string label = line.GetMatch(@"^?:(\s)"); //??????????????? if (!label.IsNullOrEmpty()) return label; else return ""; } public static string StripLabelFromLine(this string line) { return ...//??????????????? 
} public static bool IsNullOrEmpty(this string line) { return String.IsNullOrEmpty(line); } } public static class RegexHelpers { public static string GetMatch(this string text, string regex) { Match match = Regex.Match(text, regex); if (match.Success) { string theMatch = match.Groups[0].Value; return theMatch; } else { return null; } } } }
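
The rule in the comments ("a label contains no space before the colon") maps onto a pattern that allows only non-space, non-colon characters between the start of the line and the first colon. The sketch below is in Python rather than C#, and `split_label` is a made-up helper that returns the label and the remainder in one call; a line such as `http://example.com` would be treated as having the label `http`, which may or may not be wanted.

```python
import re

# Label: one or more characters that are neither whitespace nor a colon,
# immediately followed by a colon. Whitespace after the colon is dropped.
LABEL = re.compile(r'^([^\s:]+):\s*(.*)$')

def split_label(line):
    """Return (label, rest); label is '' when the line has no label."""
    m = LABEL.match(line)
    return (m.group(1), m.group(2)) if m else ('', line)

print(split_label("note: this is a note"))
print(split_label("She said this to him: follow me."))
```

The same pattern should be usable with `Regex.Match` in C#, reading `Groups[1]` for the label and `Groups[2]` for the rest.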


  • Regex gurus! here's a teaser: mixed thousands separators and csv's

    - by chichilatte
    I've got a string like... "labour 18909, liberals 12,365,conservatives 14,720" ...and I'd like a regex which can get rid of any thousands separators so I can pull out the numbers easily. Or even a regex which could give me a tidy array like: (labour => 18909, liberals => 12365, conservatives => 14720) Oh, I wish I had the time to figure out regexes! Maybe I'll buy one as a toilet book, mmm.
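
One heuristic is to treat a comma as a thousands separator only when it sits directly between a digit and a group of exactly three digits. The Python sketch below applies that rule and then collects name/number pairs; `tally` is a made-up name, and the rule is inherently ambiguous for input where two adjacent CSV fields happen to look like one number (for example `1,234`).

```python
import re

def tally(text):
    """Remove thousands separators, then map each name to its number."""
    # A comma is dropped only between a digit and exactly three digits.
    cleaned = re.sub(r'(?<=\d),(?=\d{3}(?!\d))', '', text)
    pairs = re.findall(r'([A-Za-z]+)\s+(\d+)', cleaned)
    return {name: int(number) for name, number in pairs}

print(tally("labour 18909, liberals 12,365,conservatives 14,720"))
```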


  • In C, how do you capture a group with regex?

    - by Sylvain
    Hi, I'm trying to extract a string from another using regex. I'm using the POSIX regex functions (regcomp, regexec ...), and I fail at capturing a group ... For instance, let the pattern be something as simple as "MAIL FROM:<(.*)>" (with REG_EXTENDED cflags). I want to capture everything between '<' and '>'. My problem is that regmatch_t gives me the boundaries of the whole pattern (MAIL FROM:<...) instead of just what's between the parentheses ... What am I missing? Thanks in advance,
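
With the POSIX API, `regexec` fills an array of `regmatch_t`: element 0 holds the offsets of the whole match and element 1 those of the first parenthesised group, so `nmatch` must be at least 2 and the group is read from `pmatch[1].rm_so` and `pmatch[1].rm_eo`. The Python sketch below shows the same whole-match versus group distinction; it illustrates the indexing only, not the C code, and the address is made up.

```python
import re

m = re.search(r'MAIL FROM:<(.*)>', 'MAIL FROM:<alice@example.com>')
whole = m.span(0)  # analogous to pmatch[0]: the entire match
inner = m.span(1)  # analogous to pmatch[1]: the first parenthesised group
print(whole, inner, m.group(1))
```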


  • Using Scanner in Java how can I hasNext(aString) where the string is not regex pattern?

    - by Parris
    Hi, I am trying to do as my question states, so I have the following code, which finds the match: String test = scan.next(); if (test.equals("$let")) return 1; However, I would prefer to use hasNext so as not to consume a token, but when I do the following it fails: if (scan.hasNext("$let")) return 1; I realize that when hasNext is given a string it expects a pattern, but I thought that if I didn't have any regex symbols it should work. I also thought $ was possibly a regex symbol, so I tried /$, but that did not work either! Thanks for the help!
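
`$` is indeed a regex metacharacter (an end anchor), and the escape character is the backslash, not the forward slash. In Java, `Pattern.quote("$let")` produces a pattern that matches the string literally and can be passed to `hasNext`. The Python sketch below shows the same idea with `re.escape`; `is_token` is a made-up helper.

```python
import re

def is_token(token, literal):
    """Return True if token equals literal, using a regex built from the literal."""
    return re.fullmatch(re.escape(literal), token) is not None

# Unescaped, $ is an anchor, so the pattern cannot match the text "$let".
print(re.fullmatch(r"$let", "$let"))
print(is_token("$let", "$let"))
```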


  • Javascript form validation/sanitizing do i need regex here ?

    - by user318144
    I have a single form input that is for checking domains. Sometimes people type in www. before the domain or .com after the domain name. The service that I use to check availability automatically checks all top-level domains, so when people add the .com at the end it becomes redundant. For example, the string submitted is domainname.com.com, which is clearly invalid. I understand you can do this on the server side, but due to some rather weird circumstances I must use JavaScript for this. So is regex the solution here? If so, is there some kind of regex generator I can use for this, or can someone point me in the right direction with a code snippet perhaps? Appreciate any help, thanks!
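
A regex is one reasonable option: a single alternation can remove a leading `www.` and everything from the first remaining dot onward. The sketch below is in Python, and `bare_name` is a made-up helper; it assumes the user is only ever meant to supply the bare name, so a subdomain such as `shop.example.com` is reduced to `shop`.

```python
import re

def bare_name(value):
    """Strip a leading 'www.' and anything from the first remaining dot onward."""
    return re.sub(r'^www\.|\..*$', '', value.strip().lower())

print(bare_name("www.domainname.com"))
print(bare_name("example.co.uk"))
```

The JavaScript form would presumably be `value.replace(/^www\.|\..*$/g, '')`; that line has not been run here.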


  • How to include named capture groups in java regex?

    - by jrummell
    I'm new to regex in Java and I can't figure out how to include named capture groups in an expression. I'm writing a ScrewTurn Image Converter for Confluence's Universal Wiki Converter. This is what I have: String image = "\\[image(?<align>auto)?\\|\\|{UP\\(((?<namespace>\\w+)\\.)?(?<pagename>[\\w-]+)\\)}(?<filename>[\\w- ]+\\.[\\w]+)\\]"; Pattern imagePattern = Pattern.compile(image, Pattern.CASE_INSENSITIVE); It's throwing this exception in Pattern.compile(): java.util.regex.PatternSyntaxException: Unknown look-behind group near index 19 \[image(?<align>auto)?\|\|{UP\(((?<namespace>\w+)\.)?(?<pagename>[\w-]+)\)}(?<filename>[\w- ]+\.[\w]+)\] ^ I've used named capture groups like this before in C# (?<namedgroup>asdf), but not in Java. What am I missing?
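
The `(?<name>...)` syntax was added to `java.util.regex` in Java 7; on earlier versions `(?<` can only begin a lookbehind, which would explain the "Unknown look-behind group" error. For comparison, the Python sketch below expresses the same pattern with Python's `(?P<name>...)` spelling and with the braces escaped. The sample input is invented for the example and may not be valid ScrewTurn markup.

```python
import re

IMAGE = re.compile(
    r'\[image(?P<align>auto)?\|\|\{UP\('
    r'(?:(?P<namespace>\w+)\.)?(?P<pagename>[\w-]+)\)\}'
    r'(?P<filename>[\w -]+\.\w+)\]',
    re.IGNORECASE,
)

m = IMAGE.match("[imageauto||{UP(Main.Home-Page)}my pic.png]")
print(m.groupdict())
```

On a pre-Java 7 runtime, the usual fallback is numbered groups read with `matcher.group(n)`.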


  • How can I fix this regex to allow a specific string?

    - by Sailing Judo
    This regex comes from Atwood and is used to filter out anchor tags with anything other than the href and a title: <a\shref="(\#\d+|(https?|ftp)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+)"(\stitle="[^"]+")?\s?> I need to allow an additional attribute that specifically matches: target="_blank". So the following url should be allowed: <a href="http://www.google.com" target="_blank"> I tried changing the pattern to these: <a\shref="(\#\d+|(https?|ftp)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+)"(\stitle="[^"]+")(\starget="_blank")?\s?> <a\shref="(\#\d+|(https?|ftp)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+)"(\stitle="[^"]+")(\starget=\"_blank\")?\s?> Clearly I don't know regex very well. How should the pattern be adjusted to allow the blank target and no other targets?
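
In both attempts the `?` after the title group was dropped, which makes the title mandatory; the sample tag has no title, so it can never match. Keeping that `?` and adding a second optional group for the target appears to be enough. The Python sketch below tests that version; it assumes the attributes always appear in the order href, title, target, and `is_allowed_anchor` is a made-up name.

```python
import re

ANCHOR = re.compile(
    r'<a\shref="(\#\d+|(https?|ftp)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+)"'
    r'(\stitle="[^"]+")?(\starget="_blank")?\s?>'
)

def is_allowed_anchor(tag):
    """Return True if the whole tag fits the whitelist pattern."""
    return ANCHOR.fullmatch(tag) is not None

print(is_allowed_anchor('<a href="http://www.google.com" target="_blank">'))
print(is_allowed_anchor('<a href="http://www.google.com" target="_self">'))
```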

