Search Results

Search found 10005 results on 401 pages for 'regex trouble'.

Page 120/401 | < Previous Page | 116 117 118 119 120 121 122 123 124 125 126 127  | Next Page >

  • python: multiline regular expression

    - by facha
    Hi, everyone I have a piece of text and I've got to parse usernames and hashes out of it. Right now I'm doing it with two regular expressions. Could I do it with just one multiline regular expression? #!/usr/bin/env python import re test_str = """ Hello, UserName. Please read this looooooooooooooooong text. hash Now, write down this hash: fdaf9399jef9qw0j. Then keep reading this loooooooooong text. Hello, UserName2. Please read this looooooooooooooooong text. hash Now, write down this hash: gtwnhton340gjr2g. Then keep reading this loooooooooong text. """ logins = re.findall('Hello, (?P<login>.+).',test_str) hashes = re.findall('hash: (?P<hash>.+).',test_str)

    Read the article

  • How to avoid resetting the java Scanner position

    - by Derek
    I have some code that looks more or less like this: while(scanner.hasNext()) { if(scanner.findInLine("Test") !=null) { //do some things }else{ scanner.nextLine(); } } I am using this to parse an ~10MB text file. The problem is, if I put a breakpoint on the while() and the scanner.nextLine(), I can see that sometimes the scanners position (in the debug window) goes back to zero. I think this is causing me some kind of loop blow up, because the regext in findInLine() starts at zero, looks through some amount of text, advancing the position, and then it randomly gets set back to zero, so it has to re-parse all that text again. Any ideas what can be causing that? Am I even doing this the right way? Thanks Some additional info: The Scanner is instantiated from an InputStream. After diubg sine debugging, it appears that there is a HearCharBuffer that Scanner uses and it only allows 1024 characters at a time, and then resets. Is there a way to avoid this, or do things differently? That seems like a small amount of characters to be able to scan. Derek

    Read the article

  • Extract a regular expression match in R version 2.10

    - by tovare
    Hi, I'm trying to extract a number from a string. And do something like this [0-9]+ on this string "aaaa12xxxx" and get "12". I thought it would be something like: > grep("[0-9]+","aaa12xxx", value=TRUE) [1] "aaa12xxx" And then I figured... > sub("[0-9]+", "\\1", "aaa12xxxx") [1] "aaa12xxx" But I got some form of response doing: > sub("[0-9]+", "ARGH!", "aaa12xxxx") [1] "aaaARGH!xxx" There's a small detail I'm missing Please advice :-) I'm using R version 2.10.1 (2009-12-14) Thanks ! Comments on the solution The best solution is to ignore the standard functions and install Hadley Wickham's stringr package to get something that actually makes sense. Kudos to Marek for figuring out how the standard library worked.

    Read the article

  • A "smart" (forgiving) date parser?

    - by jdmuys
    I have to migrate a very large dataset from one system to another. One of the "source" column contains a date but is really a string with no constraint, while the destination system mandates a date in the format yyyy-mm-dd. Many, but not all, of the source dates are formatted as yyyymmdd. So to coerce them to the expected format, I do (in Perl): return "$1-$2-$3" if ($val =~ /(\d{4})[-\/]*(\d{2})[-\/]*(\d{2})/); The problem arises when the source dates moves away from the "generic" yyyymmdd. The goal is to salvage as many dates as possible, before giving up. Example source strings include: 21/3/1998, March 2004, 2001, 3/4/97 I can try to match as many of the examples I can find with a succession of regular expressions such as the one above. But is there something smarter to do? Am I not reinventing the wheel? Is there a library somewhere doing something similar? I couldn't find anything relevant googling "forgiving date parser". (any language is OK).

    Read the article

  • return empty string from preg_split

    - by Gutzofter
    Right now i'm trying to get this: Array ( [0] => hello [1] => [2] => goodbye ) Where index 1 is the empty string. $toBeSplit= 'hello,,goodbye'; $textSplitted = preg_split('/[,]+/', $toBeSplit, -1); $textSplitted looks like this: Array ( [0] => hello [1] => goodbye ) I'm using PHP 5.3.2

    Read the article

  • Using `rack-rewrite` to Remove the Month and Date from a Permlink

    - by Bryan Veloso
    I've started the process of moving my blog to Octopress, but unfortunately, a limitation of Jekyll doesn't allow me to use abbreviated month names for my permalinks. Therefore I'm looking to just get rid of the month and day bits altogether. I'ved read in this article that you can use rack-rewrite to take care of the redirection, since I am using Heroku to host this. So how would I turn: This: example.com/journal/2012/jan/03/post-of-the-day/ Into this: example.com/journal/2012/post-of-the-day/ Extra points: If I had another rule that redirected /blog/ to /journal/, would that rule still adhere to the above one as well? So from: This: example.com/blog/2012/jan/03/post-of-the-day/ To this: example.com/journal/2012/jan/03/post-of-the-day/ And finally to: example.com/journal/2012/post-of-the-day/ Thanks for the assistance in advance. :)

    Read the article

  • Modify a reference numbered group to match against

    - by StuperUser
    I want to match YYYY-YY for sequential years. I at moment I'm trying to match where all the second YY is the 3rd and 4th characters in YYYY with 1 added to it. So far I've got {19|20}(\d{2})-(\d{2}), but not sure how to use ? with reference to (1) or whether I'm going about this the right way and finding out the inevitable "unknown unknowns" (like YY99) with this approach? Edit: Matches: 2010-11,2011-12,2029-30 Not matches: 2010-12, 2010-09,2011-2,2011-2012

    Read the article

  • Regular expression to match HTML table row ( <tr> ) NOT containing a specific value

    - by user1821136
    I'm using Notepad++ to clean up a long and messy HTML table. I'm trying to use regular expressions even if I'm a total noob. :) I need to remove all the table rows that doesn't contain a specific value (may I call that substring?). After having all the file contents unwrapped, I've been able to use the following regular expression to select, one by one, every table row with all its contents: <tr>.+?</tr> How can I improve the regular expression in order to select and replace only table rows containing, somewhere inside a part of them, that defined substring? I don't know if this does matter but the structure of every table row is the following (I've put there every HTML tag, the dots stand for standard content/values) <tr> <td> ... </td> <td> ... </td> <td> <a sfref="..." href="...">!! SUBSTRING I HAVE TO MATCH HERE !!</a> </td> <td> <img /> </td> <td> ... </td> <td> ... </td> <td> ... </td> <td> ... </td> </tr> Thanks in advance for your help!

    Read the article

  • Is it a solvable problem to generate a regular expression that matches some input set?

    - by Roman
    I provide some input set which contains known separated number of text blocks. I want to make a program that automatically generate 1 or more regular expressions each of which matches every text block in the input set. I see some relatively easy ways to implement a brute-force search. But I'm not an expert in compilers theory. That's why I'm curious: 1) is this problem solvable? or there are some principle impossibility to make such algorithm? 2) is it possible to achieve polynomial complexity for this algorithm and avoid brute forcing?

    Read the article

  • Delete all characters in a multline string upto a given pattern

    - by biffabacon
    Using Python I need to delete all charaters in a multiline string up to the first occurrence of a given pattern. In Perl this can be done using regular expressions with something like: #remove all chars up to first occurrence of cat or dog or rat $pattern = 'cat|dog|rat' $pagetext =~ s/(.*)($pattern)/$2/xms; What's the best way to do it in Python?

    Read the article

  • How do I create a regular expression to match a word misspelling the original case sensitivity?

    - by Patrick Allaert
    I want to discover wrong spelling of "FooBar" in sentence: "This is a 'FooBar' example where I should match different spelling of fooBar such as: foobar, FOOBAR or even fOoBaR but not foobarS!" In this sentence, I would like to match words (in order): fooBar, foobar, FOOBAR, fOoBaR and not: FooBar (correct spelling), foobarS (not the same word) Is there an existing solution using Perl Regular Expression? This is intended to be used with grep -P Thanks

    Read the article

  • what is the return value of BeautifulSoup.find ?

    - by prosseek
    I run to get some value as score. score = soup.find('div', attrs={'class' : 'summarycount'}) I run 'print score' to get as follows. <div class=\"summarycount\">524</div> I need to extract the number part. I used re module but failed. m = re.search("[^\d]+(\d+)", score) TypeError: expected string or buffer function search in re.py at line 142 return _compile(pattern, flags).search(string) What's the return type of the find function? How to get the number from the score variable? Is there any easy way to let BeautifulSoup to return the value(in this case 524) itself?

    Read the article

  • Remove duplicate characters using a regular expression

    - by Alex
    I need to Match the second and subsequent occurances of the * character using a regular expression. I'm actually using the Replace method to remove them so here's some examples of before and after: test* -> test* (no change) *test* -> *test test** *e -> test* e Is it possible to do this with a regular expression? Thanks

    Read the article

  • Matching First Alphanumeric Character skipping (The |An? )

    - by TheLizardKing
    I have a list of artists, albums and tracks that I want to sort using the first letter of their respective name. The issue arrives when I want to ignore "The ", "A ", "An " and other various non-alphanumeric characters (Talking to you "Weird Al" Yankovic and [dialog]). Django has a nice start '^(An?|The) +' but I want to ignore those and a few others of my choice. I am doing this in Django, using a MySQL db with utf8_bin collation. EDIT Well my fault for not mentioning this but the database I am accessing is pretty much ready only. It's created and maintained by Amarok and I can't alter it without a whole mess of issues. That being said the artist table has The Chemical Brothers listed as The Chemical Brothers so I think I am stuck here. It probably will be slow but that's not so much of a concern for me as it's a personal project.

    Read the article

< Previous Page | 116 117 118 119 120 121 122 123 124 125 126 127  | Next Page >