Search Results

Search found 4539 results on 182 pages for 'regex grouping'.

Page 94/182 | < Previous Page | 90 91 92 93 94 95 96 97 98 99 100 101  | Next Page >

  • How can I use Perl regular expressions to parse XML data?

    - by Luke
    I have a pretty long piece of XML that I want to parse. I want to remove everything except for the subclass-code and city. So that I am left with something like the example below. EXAMPLE TEST SUBCLASS|MIAMI CODE <?xml version="1.0" standalone="no"?> <web-export> <run-date>06/01/2010 <pub-code>TEST <ad-type>TEST <cat-code>Real Estate</cat-code> <class-code>TEST</class-code> <subclass-code>TEST SUBCLASS</subclass-code> <placement-description></placement-description> <position-description>Town House</position-description> <subclass3-code></subclass3-code> <subclass4-code></subclass4-code> <ad-number>0000284708-01</ad-number> <start-date>05/28/2010</start-date> <end-date>06/09/2010</end-date> <line-count>6</line-count> <run-count>13</run-count> <customer-type>Private Party</customer-type> <account-number>100099237</account-number> <account-name>DOE, JOHN</account-name> <addr-1>207 CLARENCE STREET</addr-1> <addr-2> </addr-2> <city>MIAMI</city> <state>FL</state> <postal-code>02910</postal-code> <country>USA</country> <phone-number>4014612880</phone-number> <fax-number></fax-number> <url-addr> </url-addr> <email-addr>[email protected]</email-addr> <pay-flag>N</pay-flag> <ad-description>DEANESTATES2BEDS2BATHSAPPLIANCED</ad-description> <order-source>Import</order-source> <order-status>Live</order-status> <payor-acct>100099237</payor-acct> <agency-flag>N</agency-flag> <rate-note></rate-note> <ad-content> MIAMI&#47;Dean Estates&#58; 2 beds&#44; 2 baths&#46; Applianced&#46; Central air&#46; Carpets&#46; Laundry&#46; 2 decks&#46; Pool&#46; Parking&#46; Close to everything&#46;No smoking&#46; No utilities&#46; &#36;1275 mo&#46; 401&#45;578&#45;1501&#46; </ad-content> </ad-type> </pub-code> </run-date> </web-export> PERL So what I want to do is open an existing file read the contents then use regular expressions to eliminate the unnecessary XML tags. open(READFILE, "FILENAME"); while(<READFILE>) { $_ =~ s/<\?xml version="(.*)" standalone="(.*)"\?>\n.*//g; $_ =~ s/<subclass-code>//g; $_ =~ s/<\/subclass-code>\n.*/|/g; $_ =~ s/(.*)PJ RER Houses /PJ RER Houses/g; $_ =~ s/\G //g; $_ =~ s/<city>//g; $_ =~ s/<\/city>\n.*//g; $_ =~ s/<(\/?)web-export>(.*)\n.*//g; $_ =~ s/<(\/?)run-date>(.*)\n.*//g; $_ =~ s/<(\/?)pub-code>(.*)\n.*//g; $_ =~ s/<(\/?)ad-type>(.*)\n.*//g; $_ =~ s/<(\/?)cat-code>(.*)<(\/?)cat-code>\n.*//g; $_ =~ s/<(\/?)class-code>(.*)<(\/?)class-code>\n.*//g; $_ =~ s/<(\/?)placement-description>(.*)<(\/?)placement-description>\n.*//g; $_ =~ s/<(\/?)position-description>(.*)<(\/?)position-description>\n.*//g; $_ =~ s/<(\/?)subclass3-code>(.*)<(\/?)subclass3-code>\n.*//g; $_ =~ s/<(\/?)subclass4-code>(.*)<(\/?)subclass4-code>\n.*//g; $_ =~ s/<(\/?)ad-number>(.*)<(\/?)ad-number>\n.*//g; $_ =~ s/<(\/?)start-date>(.*)<(\/?)start-date>\n.*//g; $_ =~ s/<(\/?)end-date>(.*)<(\/?)end-date>\n.*//g; $_ =~ s/<(\/?)line-count>(.*)<(\/?)line-count>\n.*//g; $_ =~ s/<(\/?)run-count>(.*)<(\/?)run-count>\n.*//g; $_ =~ s/<(\/?)customer-type>(.*)<(\/?)customer-type>\n.*//g; $_ =~ s/<(\/?)account-number>(.*)<(\/?)account-number>\n.*//g; $_ =~ s/<(\/?)account-name>(.*)<(\/?)account-name>\n.*//g; $_ =~ s/<(\/?)addr-1>(.*)<(\/?)addr-1>\n.*//g; $_ =~ s/<(\/?)addr-2>(.*)<(\/?)addr-2>\n.*//g; $_ =~ s/<(\/?)state>(.*)<(\/?)state>\n.*//g; $_ =~ s/<(\/?)postal-code>(.*)<(\/?)postal-code>\n.*//g; $_ =~ s/<(\/?)country>(.*)<(\/?)country>\n.*//g; $_ =~ s/<(\/?)phone-number>(.*)<(\/?)phone-number>\n.*//g; $_ =~ s/<(\/?)fax-number>(.*)<(\/?)fax-number>\n.*//g; $_ =~ s/<(\/?)url-addr>(.*)<(\/?)url-addr>\n.*//g; $_ =~ s/<(\/?)email-addr>(.*)<(\/?)email-addr>\n.*//g; $_ =~ s/<(\/?)pay-flag>(.*)<(\/?)pay-flag>\n.*//g; $_ =~ s/<(\/?)ad-description>(.*)<(\/?)ad-description>\n.*//g; $_ =~ s/<(\/?)order-source>(.*)<(\/?)order-source>\n.*//g; $_ =~ s/<(\/?)order-status>(.*)<(\/?)order-status>\n.*//g; $_ =~ s/<(\/?)payor-acct>(.*)<(\/?)payor-acct>\n.*//g; $_ =~ s/<(\/?)agency-flag>(.*)<(\/?)agency-flag>\n.*//g; $_ =~ s/<(\/?)rate-note>(.*)<(\/?)rate-note>\n.*//g; $_ =~ s/<ad-content>(.*)\n.*//g; $_ =~ s/\t(.*)\n.*//g; $_ =~ s/<\/ad-content>(.*)\n.*//g; } close( READFILE1 ); Is there an easier way of doing this? I don't want to use any modules. I know that it might make this easier but the file I am reading has a lot of data in it.

    Read the article

  • UPDATE REGEX MYSQL

    - by Simon
    I have a table of contacts and a table of postcode data. I need to match the first part of the postcode and the join that with the postcode table... and then perform an update... I want to do something like this... UPDATE `contacts` LEFT JOIN `postcodes` ON PREG_GREP("/^[A-Z]{1,2}[0-9][0-9A-Z]{0,1}/", `contacts`.`postcode`) = `postcodes`.`postcode` SET `contacts`.`lat` = `postcode`.`lat`, `contacts`.`lng` = `postcode`.`lng` Is it possible?? Or do I need to use an external script? Many thanks.

    Read the article

  • Problem with Javascript RegExp-mask

    - by OrjanL
    I have a string that looks something like this: {theField} > YEAR (today, -3) || {theField} < YEAR (today, +3) I want it to be replaced into: {theField} > " + YEAR (today, -3) + " || {theField} < " + YEAR (today, +3) + " I have tried this: String.replace(/(.*)(YEAR|MONTH|WEEK|DAY+)(.*[)]+)/g, "$1 \" + $2 $3 + \"") But that gives me: {theField} > YEAR (today, +3) || {theField} > " + YEAR (today, +3) + " Does anyone have any ideas?

    Read the article

  • Objective C - RegexKitLite - Parsing inner contents of a string, ie: start(.*?)end

    - by Stu
    Please consider the following: NSString *myText = @"mary had a little lamb"; NSString *regexString = @"mary(.*?)little"; for)NSString *match in [myText captureComponentsMatchedByRegex:regexString]){ NSLog(@"%@",match); } This will output to the console two things: 1) "mary had a little" 2) "had a" What I want is just the 2nd bit of information "had a". Is there is a way of matching a string and returning just the inner part? I'm fairly new to Objective C, this feels a rather trivial question yet I can't find a less messy way of doing this than incrementing an integer in the for loop and on the second iteration storing the "had a" in an NSString.

    Read the article

  • Convert regular expression to CFG

    - by user242581
    How can I convert some regular language to its equivalent Context Free Grammar(CFG)? Whether the DFA corresponding to that regular expression is required to be constructed or is there some rule for the above conversion? For example, considering the following regular expression 01+10(11)* How can I describe the grammar corresponding to the above RE?

    Read the article

  • how to match a regulas expresion like (%i1) in python pexpect

    - by mike
    I want to use maxima from python using pexpect, whenever maxima starts it will print a bunch of stuff of this form: $ maxima Maxima 5.27.0 http://maxima.sourceforge.net using Lisp SBCL 1.0.57-1.fc17 Distributed under the GNU Public License. See the file COPYING. Dedicated to the memory of William Schelter. The function bug_report() provides bug reporting information. (%i1) i would like to start up pexpect like so: import pexpect cmd = 'maxima' child = pexpect.spawn(cmd) child.expect (' match all that stuff up to and including (%i1)') child.sendline ('integrate(sin(x),x)') chil.expect( match (%o1 ) ) print child.before how do i match the starting banner up to the prompt (%i1)? and so on, also maxima increments the (%i1)'s by one as the session goes along, so the next expect would be: child.expect ('match (%i2)') child.sendline ('integrate(sin(x),x)') chil.expect( match (%o2 ) ) print child.before how do i match the (incrementing) integers?

    Read the article

  • Intersection of two regular expressions

    - by Henry
    Hi, Im looking for function (PHP will be the best), which returns true whether exists string matches both regexpA and regexpB. Example 1: $regexpA = '[0-9]+'; $regexpB = '[0-9]{2,3}'; hasRegularsIntersection($regexpA,$regexpB) returns TRUE because '12' matches both regexps Example 2: $regexpA = '[0-9]+'; $regexpB = '[a-z]+'; hasRegularsIntersection($regexpA,$regexpB) returns FALSE because numbers never matches literals. Thanks for any suggestions how to solve this. Henry

    Read the article

  • In Python BeautifulSoup How to move tags

    - by JJ
    I have a partially converted XML document in soup coming from HTML. After some replacement and editing in the soup, the body is essentially - <Text...></Text> # This replaces <a href..> tags but automatically creates the </Text> <p class=norm ...</p> <p class=norm ...</p> <Text...></Text> <p class=norm ...</p> and so forth. I need to "move" the <p> tags to be children to <Text> or know how to suppress the </Text>. I want - <Text...> <p class=norm ...</p> <p class=norm ...</p> </Text> <Text...> <p class=norm ...</p> </Text> I've tried using item.insert and item.append but I'm thinking there must be a more elegant solution. for item in soup.findAll(['p','span']): if item.name == 'span' and item.has_key('class') and item['class'] == 'section': xBCV = short_2_long(item._getAttrMap().get('value','')) if currentnode: pass currentnode = Tag(soup,'Text', attrs=[('TypeOf', 'Section'),... ]) item.replaceWith(currentnode) # works but creates end tag elif item.name == 'p' and item.has_key('class') and item['class'] == 'norm': childcdatanode = None for ahref in item.findAll('a'): if childcdatanode: pass newlink = filter_hrefs(str(ahref)) childcdatanode = Tag(soup, newlink) ahref.replaceWith(childcdatanode) Thanks

    Read the article

  • Building a Hashtag in Javascript without matching Anchor Names, BBCode or Escaped Characters

    - by Martindale
    I would like to convert any instances of a hashtag in a String into a linked URL: #hashtag - should have "#hashtag" linked. This is a #hashtag - should have "#hashtag" linked. This is a [url=http://www.mysite.com/#name]named anchor[/url] - should not be linked. This isn&#39;t a pretty way to use quotes - should not be linked. Here is my current code: String.prototype.parseHashtag = function() { return this.replace(/[^&][#]+[A-Za-z0-9-_]+(?!])/, function(t) { var tag = t.replace("#","") return t.link("http://www.mysite.com/tag/"+tag); }); }; Currently, this appears to fix escaped characters (by excluding matches with the amperstand), handles named anchors, but it doesn't link the #hashtag if it's the first thing in the message, and it seems to grab include the 1-2 characters prior to the "#" in the link. Halp!

    Read the article

  • strange behavior in vim with negative look-behind

    - by João Portela
    So, I am doing this search in vim: /\(\(unum\)\|\(player\)=\)\@<!\"1\" and as expected it does not match lines that have: player="1" but matches lines that have: unum="1" what am i doing wrong? isn't the atom to be negated all of this: \(\(unum\)\|\(player\)=\) naturally just doing: /\(\(unum\)\|\(player\)=\) matches unum= or player=.

    Read the article

  • Combine regular expressions for splitting camelCase string into words

    - by stou
    I managed to implement a function that converts camel case to words, by using the solution suggested by @ridgerunner in this question: Split camelCase word into words with php preg_match (Regular Expression) However, I want to also handle embedded abreviations like this: 'hasABREVIATIONEmbedded' translates to 'Has ABREVIATION Embedded' I came up with this solution: <?php function camelCaseToWords($camelCaseStr) { // Convert: "TestASAPTestMore" to "TestASAP TestMore" $abreviationsPattern = '/' . // Match position between UPPERCASE "words" '(?<=[A-Z])' . // Position is after group of uppercase, '(?=[A-Z][a-z])' . // and before group of lowercase letters, except the last upper case letter in the group. '/x'; $arr = preg_split($abreviationsPattern, $camelCaseStr); $str = implode(' ', $arr); // Convert "TestASAP TestMore" to "Test ASAP Test More" $camelCasePattern = '/' . // Match position between camelCase "words". '(?<=[a-z])' . // Position is after a lowercase, '(?=[A-Z])' . // and before an uppercase letter. '/x'; $arr = preg_split($camelCasePattern, $str); $str = implode(' ', $arr); $str = ucfirst(trim($str)); return $str; } $inputs = array( 'oneTwoThreeFour', 'StartsWithCap', 'hasConsecutiveCAPS', 'ALLCAPS', 'ALL_CAPS_AND_UNDERSCORES', 'hasABREVIATIONEmbedded', ); echo "INPUT"; foreach($inputs as $val) { echo "'" . $val . "' translates to '" . camelCaseToWords($val). "'\n"; } The output is: INPUT'oneTwoThreeFour' translates to 'One Two Three Four' 'StartsWithCap' translates to 'Starts With Cap' 'hasConsecutiveCAPS' translates to 'Has Consecutive CAPS' 'ALLCAPS' translates to 'ALLCAPS' 'ALL_CAPS_AND_UNDERSCORES' translates to 'ALL_CAPS_AND_UNDERSCORES' 'hasABREVIATIONEmbedded' translates to 'Has ABREVIATION Embedded' It works as intended. My question is: Can I combine the 2 regular expressions $abreviationsPattern and camelCasePattern so i can avoid running the preg_split() function twice?

    Read the article

  • Regexp that matches user-agents of end-user browsers but NOT crawlers with >90 % accuracy

    - by knorv
    I'm trying to construct a regexp that will evaluate to true for User-Agent:s of "browsers navigated by humans", but false for bots. Needless to say the matching will not be exact, but if it gets things right in say 90 % of cases that is more than good enough. My approach so far is to target the User-Agent string of the the five major desktop browsers (MSIE, Firefox, Chrome, Safari, Opera). Specifically I want the regexp NOT to match if the user-agent is a bot (Googlebot, msnbot, etc.). Currently I'm using the following regexp which appears to achieve the desired precision: ^(Mozilla.*(Gecko|KHTML|MSIE|Presto|Trident)|Opera).*$ I've observed small number of false negatives which are mostly mobile browsers. The exceptions all match: (BlackBerry|HTC|LG|MOT|Nokia|NOKIAN|PLAYSTATION|PSP|SAMSUNG|SonyEricsson) My question is: Given the desired accuracy level, how would you improve the regexp? Can you think of any major false positives or false negatives to the given regexp? Please note that the question is specifically about regexp-based User-Agent matching. There are a bunch of other approaches to solving this problem, but those are out of the scope of this question.

    Read the article

  • Not-quite-JSON string deserialization in Python

    - by cpharmston
    I get the following text as a string from an XML-based REST API 'd':4 'ca':5 'sen':1 'diann':2,6,8 'feinstein':3,7,9 that I'm looking to deserialize into a pretty little Python dictionary: { 'd': [4], 'ca': [5], 'sen': [1], 'diann': [2, 6, 8], 'feinstein': [3, 7, 9] } I'm hoping to avoid using regular expressions or heavy string manipulation, as this format isn't documented and may change. The best I've been able to come up with: members = {} for m in elem.text.split(' '): m = m.split(':') members[m[0].replace("'", '')] = map(int, m[1].split(',')) return members Obviously a terrible approach, but it works, and that's better than anything else I've got right now. Any suggestions on better approaches?

    Read the article

  • Regular expression for email

    - by Nadeem
    I tried the reg expression ^([a-zA-Z0-9_.-])+@([a-zA-Z0-9_.-])+\.([a-zA-Z])+([a-zA-Z])+ for the email validation. Since I want the user to allow submitting even with the empty email address. So I changed the reg ex to (^([a-zA-Z0-9_.-])+@([a-zA-Z0-9_.-])+\.([a-zA-Z])+([a-zA-Z])+)? But this expression accepts any email address without any validation.

    Read the article

  • Regexp: Replace only in specific context

    - by blinry
    In a text, I would like to replace all occurrences of $word by [$word]($word) (to create a link in Markdown), but only if it is not already in a link. Example: [$word homepage](http://w00tw00t.org) should not become [[$word]($word) homepage](http://w00tw00t.org). Thus, I need to check whether $word is somewhere between [ and ] and only replace if it's not the case. Can you think of a preg_replace command for this?

    Read the article

  • stripping a query string with php (preg_replace)

    - by pg
    http://www.chuckecheese.com/rotator.php?cheese=4&id=1 I want to take out the id, leaving the cheese to stand alone. I tried: $qs = preg_replace("[^&id=*]" ,'',$_SERVER[QUERY_STRING]); But that said I was using an improper modifier. I want to remove "$id=" and whatever number comes after it. Are regexp really as hard as they seem for me?

    Read the article

  • Transforming a string to a valid PDO_MYSQL DSN

    - by Alix Axel
    What is the most concise way to transform a string in the following format: mysql:[/[/]][user[:pass]@]host[:port]/db[/] Into a usuable PDO connection/instance (using the PDO_MYSQL DSN), some possible examples: $conn = new PDO('mysql:host=host;dbname=db'); $conn = new PDO('mysql:host=host;port=3307;dbname=db'); $conn = new PDO('mysql:host=host;port=3307;dbname=db', 'user'); $conn = new PDO('mysql:host=host;port=3307;dbname=db', 'user', 'pass'); I've been trying some regular expressions (preg_[match|split|replace]) but they either don't work or are too complex, my gut tells me this is not the way to go but nothing else comes to my mind. Any suggestions?

    Read the article

< Previous Page | 90 91 92 93 94 95 96 97 98 99 100 101  | Next Page >