Regular expression either/or not matching everything
Posted
by dwatransit
on Stack Overflow
See other posts from Stack Overflow
or by dwatransit
Published on 2010-04-27T18:09:41Z
Indexed on
2010/04/27
18:13 UTC
Read the original article
Hit count: 327
regex
I'm trying to parse an HTTP GET request to determine if the url contains any of a number of file types. If it does, I want to capture the entire request. There is something I don't understand about ORing.
The following regular expression only captures part of it, and only if .flv is the first int the list of ORd values.
(I've obscured the urls with spaces because Stackoverflow limits hyperlinks)
regex: GET.?(.flv)|(.mp4)|(.avi).? test text: GET http: // foo.server.com/download/0/37/3000016511/.flv?mt=video/xy match output: GET http: // foo.server.com/download/0/37/3000016511/.flv
I don't understand why the .*? at the end of the regex isnt callowing it to capture the entire text. If I get rid of the ORing of file types, then it works.
Here is the test code in case my explanation doesn't make sense:
public static void main(String[] args) { // TODO Auto-generated method stub String sourcestring = "GET http: // foo.server.com/download/0/37/3000016511/.flv?mt=video/xy"; Pattern re = Pattern.compile("GET .?\.flv."); // this works //output: // [0][0] = GET http :// foo.server.com/download/0/37/3000016511/.flv?mt=video/xy
// the match from the following ends with the ".flv", not the entire url. // also it only works if .flv is the first of the 3 ORd options //Pattern re = Pattern.compile("GET .?(\.flv)|(\.mp4)|(\.avi).?"); // output: //[0][0] = GET http: // foo.server.com/download/0/37/3000016511/.flv // [0][1] = .flv // [0][2] = null // [0][3] = null
Matcher m = re.matcher(sourcestring);
int mIdx = 0;
while (m.find()){
for( int groupIdx = 0; groupIdx < m.groupCount()+1; groupIdx++ ){
System.out.println( "[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
}
mIdx++;
}
} }
© Stack Overflow or respective owner