Search Results

Search found 9016 results on 361 pages for 'regex libraries'.

Page 129/361 | < Previous Page | 125 126 127 128 129 130 131 132 133 134 135 136  | Next Page >

  • Parsing HTML with XPath and PHP

    - by Peter
    Is there a way (using XPath and PHP) to do the following (WITHOUT external XSLT files)? Remove all tables and their contents Remove everything after the first h1 tag Keep only paragraphs (INCLUDING their inner HTML (links, lists, etc)) I received an XSLT answer here, but I'm looking for XPATH queries that don't require external files. Currently, I've got the HTML in question loaded into a SimpleXmlElement via: $doc = @DOMDocument::loadHTML($xml); $data = simplexml_import_dom($doc); Now I need help with: $data = $data->xpath('??????'); Been working with this one for several days to no avail. I really appreciate the help. Edit: I don't particularly care what's inside the paragraphs, as I can use strip_tags to eliminate what I don't want. All I need to do is to isolate the paragraphs from the rest of the source. I suppose a more specific, accurate requirement would be this: Return only paragraphs (and their html contents) that aren't contained in tables, and only before the first h1 tag

    Read the article

  • How to process this string via regular expression

    - by iiduce
    my string style like this: expression1/field1+expression2*expression3+expression4/field2*expression5*expression6/field3 a real style mybe like this: computer/(100)+web*mail+explorer/(200)*bbs*solution/(300) "+" and "*" represent operator "computer","web"...represent expression (100),(200) represent field num . field num may not exist. I want process the string to this: /(100)+web*+explorer/(200)bbs/(300) rules like this: if expression length is more than 3 and its field is not (200), then add brackets to it.

    Read the article

  • How to find which delimiter was used during string split (VB.NET)

    - by typoknig
    Hi all, lets say I have a string that I want to split based on several characters, like ".", "!", and "?". How do I figure out which one of those characters split my string so I can add that same character back on to the end of the split segments in question? Dim linePunctuation as Integer = 0 Dim myString As String = "some text. with punctuation! in it?" For i = 1 To Len(myString) If Mid$(entireFile, i, 1) = "." Then linePunctuation += 1 Next For i = 1 To Len(myString) If Mid$(entireFile, i, 1) = "!" Then linePunctuation += 1 Next For i = 1 To Len(myString) If Mid$(entireFile, i, 1) = "?" Then linePunctuation += 1 Next Dim delimiters(3) As Char delimiters(0) = "." delimiters(1) = "!" delimiters(2) = "?" currentLineSplit = myString.Split(delimiters) Dim sentenceArray(linePunctuation) As String Dim count As Integer = 0 While linePunctuation > 0 sentenceArray(count) = currentLineSplit(count)'Here I want to add what ever delimiter was used to make the split back onto the string before it is stored in the array.' count += 1 linePunctuation -= 1 End While

    Read the article

  • How to detect identical part(s) inside string?

    - by Horace Ho
    I try to break down the http://stackoverflow.com/questions/2711961/decoding-algorithm-wanted question into smaller questions. This is Part I. Question: two strings: s1 and s2 part of s1 is identical to part of s2 space is separator how to extract the identical part(s)? example 1: s1 = "12 November 2010 - 1 visitor" s2 = "6 July 2010 - 100 visitors" the identical parts are "2010", "-", "1" and "visitor" example 2: s1 = "Welcome, John!" s2 = "Welcome, Peter!" the identical parts are "Welcome," and "!" Python and Ruby preferred. Thanks

    Read the article

  • RegExp to validate a formula (string/boolean/numeric expression)?

    - by JSteve
    I have used regExp quit a bit of times but still far from being an expert. This time I want to validate a formula (or math expression) by regExp. The difficult part here is to validate proper starting and ending parentheses with in the formula. I believe, there would be some sample on the web but I could not find it. Can somebody post a link for such example? or help me by some other means?

    Read the article

  • Are .NET's regular expressions Turing complete?

    - by Robert
    Regular expressions are often pointed to as the classical example of a language that is not Turning complete. For example "regular expressions" is given in as the answer to this SO question looking for languages that are not Turing complete. In my, perhaps somewhat basic, understanding of the notion of Turning completeness, this means that regular expressions cannot be used check for patterns that are "balanced". Balanced meaning have an equal number of opening characters as closing characters. This is because to do this would require you to have some kind of state, to allow you to match the opening and closing characters. However the .NET implementation of regular expressions introduces the notion of a balanced group. This construct is designed to let you backtrack and see if a previous group was matched. This means that a .NET regular expressions: ^(?<p>a)*(?<-p>b)*(?(p)(?!))$ Could match a pattern that: ab aabb aaabbb aaaabbbb ... etc. ... Does this means .NET's regular expressions are Turing complete? Or are there other things that are missing that would be required for the language to be Turing complete?

    Read the article

  • Which is more efficient regular expression?

    - by Vagnerr
    I'm parsing some big log files and have some very simple string matches for example if(m/Some String Pattern/o){ #Do something } It seems simple enough but in fact most of the matches I have could be against the start of the line, but the match would be "longer" for example if(m/^Initial static string that matches Some String Pattern/o){ #Do something } Obviously this is a longer regular expression and so more work to match. However I can use the start of line anchor which would allow an expression to be discarded as a failed match sooner. It is my hunch that the latter would be more efficient. Can any one back me up/shoot me down :-)

    Read the article

  • [Python] OR in regular expression?

    - by www.yegorov-p.ru
    Hello. I have text file with several thousands lines. I want to parse this file into database and decided to write a regexp. Here's part of file: blablabla checked=12 unchecked=1 blablabla unchecked=13 blablabla checked=14 As a result, I would like to get something like (12,1) (0,13) (14,0) Is it possible?

    Read the article

  • regular expression and escaping

    - by pstanton
    Sorry if this has been asked, my search brought up many off topic posts. I'm trying to convert wildcards from a user defined search string (wildcard is "*") to postgresql like wildcard "%". I'd like to handle escaping so that "%" => "\%" and "\*" => "*" I know i could replace \* with something else prior to replacing * and then swap it back, but i'd prefer not to and instead only convert * using a pattern that selects it when not proceeded by \. String convertWildcard(String like) { like = like.replaceAll("%", "\\%"); like = like.replaceAll("\\*", "%"); return like; } Assert.assertEquals("%", convertWildcard("*")); Assert.assertEquals("\%", convertWildcard("%")); Assert.assertEquals("*", convertWildcard("\*")); // FAIL Assert.assertEquals("a%b", convertWildcard("a*b")); Assert.assertEquals("a\%b", convertWildcard("a%b")); Assert.assertEquals("a*b", convertWildcard("a\*b")); // FAIL ideas welcome.

    Read the article

  • Alter Regular Expression to Return 2 Values Instead of 3 from userAgent String

    - by Jay
    I've taken a regular expression from jQuery to detect if a browser's engine is WebKit and gets it's version number, it returns 3 values extracted from the userAgent string: webkit/….…, webkit and ….… [“….…” being the version number]. I would like the regular expression to return just 2 values: webkit and ….…. I'm rubbish at regular expressions, so please can you give an explanation of the expression with your answer. The regular expression I'm currently working with and wish to improve is: /(webkit)[\/]([\w.]+)/. I appreciate all your help, thanks in advance!

    Read the article

  • Using regular expressions

    - by Tom
    What is wrong with this regexp? I need it to make $name to be letter-number only. Now it doens't seem to work at all. if (!preg_match("/^[A-Za-z0-9]$/",$name)) { $e[]="name must contain only letters or numbers"; }

    Read the article

  • Optimizing python link matching regular expression

    - by Matt
    I have a regular expression, links = re.compile('<a(.+?)href=(?:"|\')?((?:https?://|/)[^\'"]+)(?:"|\')?(.*?)>(.+?)</a>',re.I).findall(data) to find links in some html, it is taking a long time on certain html, any optimization advice? One that it chokes on is http://freeyourmindonline.net/Blog/

    Read the article

  • Weird error using preg_match and unicode

    - by Thorpe Obazee
    if (preg_match('(\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+)', '2010/02/14/this-is-something')) { // do stuff } The above code works. However this one doesn't. if (preg_match('/\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+/u', '2010/02/14/this-is-something')) { // do stuff } Maybe someone could shed some light as to why the one below doesn't work. This is the error that is being produced: A PHP Error was encountered Severity: Warning Message: preg_match() [function.preg-match]: Unknown modifier '\'

    Read the article

  • String pattern matching in Javascript

    - by kwokwai
    Hi all, I am doing some self learning about Patern Matching in Javascript. I got a simple input text field in a HTML web page, and I have done some Javascript to capture the string and check if there are any strange characters other than numbers and characters in the string. But I am not sure if it is correct. Only numbers, characters or a mixture of numbers and characters are allowed. var pattern = /^[a-z]+|[A-Z]+|[0-9]+$/; And I have another question about Pattern Matching in Javascript, what does the percentage symbol mean in Pattern matching. For example: var pattern = '/[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/';

    Read the article

  • Regular expressions in python unicode

    - by Remy
    I need to remove all the html tags from a given webpage data. I tried this using regular expressions: import urllib2 import re page = urllib2.urlopen("http://www.frugalrules.com") from bs4 import BeautifulSoup, NavigableString, Comment soup = BeautifulSoup(page) link = soup.find('link', type='application/rss+xml') print link['href'] rss = urllib2.urlopen(link['href']).read() souprss = BeautifulSoup(rss) description_tag = souprss.find_all('description') content_tag = souprss.find_all('content:encoded') print re.sub('<[^>]*>', '', content_tag) But the syntax of the re.sub is: re.sub(pattern, repl, string, count=0) So, I modified the code as (instead of the print statement above): for row in content_tag: print re.sub(ur"<[^>]*>",'',row,re.UNICODE But it gives the following error: Traceback (most recent call last): File "C:\beautifulsoup4-4.3.2\collocation.py", line 20, in <module> print re.sub(ur"<[^>]*>",'',row,re.UNICODE) File "C:\Python27\lib\re.py", line 151, in sub return _compile(pattern, flags).sub(repl, string, count) TypeError: expected string or buffer What am I doing wrong?

    Read the article

  • How can I match a match a null byte (0x00) in the Visual Studio binary editor with a find using a re

    - by Paul K
    Open a file in the Visual Studio binary editor that contains a null byte (0x00), then use the Quick Find feature (Ctrl +F) to find null bytes. I would have thought I could use a regular expression such as \x00 to match null bytes but it doesn't work. Searching for any other hex value using this method works fine. Is this a VS bug, 'feature', or am I just missing something? Is there a work around?

    Read the article

  • regexp for detect that the url doesn´t end with an extension

    - by devnieL
    Hello. I'm using this regular expression for detect if an url ends with a jpg : var exp = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]*^\.jpg)/ig; it detects the url : e.g. http://www.blabla.com/sdsd.jpg but now i want to detect that the url doesn't ends with an jpg extension, i try with this : var exp = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]*[^\.jpg]\b)/ig; but only get http://www.blabla.com/sdsd then i used this : var exp = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]*[^\.jpg]$)/ig; it works if the url is alone, but dont work if the text is e.g. : http://www.blabla.com/sdsd.jpg text

    Read the article

  • How can I execute a bunch of editor commands stored in a file in VIM?

    - by LES2
    I have read the other posts, e.g., http://stackoverflow.com/questions/1830886/vim-executing-a-list-of-editor-commands and others. The answer isn't clear to me for my case. I have some editor commands that I generated from an SQL query. It uses :s/foo/bar to change country codes (from FIPS to a non-standard code set). Here's a sample of the file: :s/CB/CAMBO :s/CQ/NMARI :s/KV/KOSOV :s/PP/PAPUA ... I have saved that in a file called fipsToNonStd.vim (unsure about the correct extension). I want to run those commands one after another. What's the easiest way to do so? Thanks a bunch! SO Rocks!

    Read the article

  • Simple PHP form Validation and the validation symbols

    - by Cool Hand Luke UK
    Hi have some forms that I want to use some basic php validation (regular expressions) on, how do you go about doing it? I have just general text input, usernames, passwords and date to validate. I would also like to know how to check for empty input boxes. I have looked on the interenet for this stuff but I haven't found any good tutorials. Thanks

    Read the article

  • re.sub emptying list

    - by jmau5
    def process_dialect_translation_rules(): # Read in lines from the text file specified in sys.argv[1], stripping away # excess whitespace and discarding comments (lines that start with '##'). f_lines = [line.strip() for line in open(sys.argv[1], 'r').readlines()] f_lines = filter(lambda line: not re.match(r'##', line), f_lines) # Remove any occurances of the pattern '\s*<=>\s*'. This leaves us with a # list of lists. Each 2nd level list has two elements: the value to be # translated from and the value to be translated to. Use the sub function # from the re module to get rid of those pesky asterisks. f_lines = [re.split(r'\s*<=>\s*', line) for line in f_lines] f_lines = [re.sub(r'"', '', elem) for elem in line for line in f_lines] This function should take the lines from a file and perform some operations on the lines, such as removing any lines that begin with ##. Another operation that I wish to perform is to remove the quotation marks around the words in the line. However, when the final line of this script runs, f_lines becomes an empty lines. What happened? Requested lines of original file: ## English-Geek Reversible Translation File #1 ## (Moderate Geek) ## Created by Todd WAreham, October 2009 "TV show" <=> "STAR TREK" "food" <=> "pizza" "drink" <=> "Red Bull" "computer" <=> "TRS 80" "girlfriend" <=> "significant other"

    Read the article

< Previous Page | 125 126 127 128 129 130 131 132 133 134 135 136  | Next Page >