Regular expression either/or not matching everything
- by dwatransit
I'm trying to parse an HTTP GET request to determine if the url contains any of a number of file types. If it does, I want to capture the entire request. There is something I don't understand about ORing.
The following regular expression only captures part of it, and only if .flv is the first in the list of ORed values.
(I've obscured the URLs with spaces because Stack Overflow limits hyperlinks)
regex:
GET.*?(\.flv)|(\.mp4)|(\.avi).*?
test text:
GET http: // foo.server.com/download/0/37/3000016511/.flv?mt=video/xy
match output:
GET http: // foo.server.com/download/0/37/3000016511/.flv
I don't understand why the .*? at the end of the regex isn't allowing it to capture the entire text. If I get rid of the ORing of file types, then it works.
Here is the test code in case my explanation doesn't make sense:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest { // class name is arbitrary
    public static void main(String[] args) {
        String sourcestring = "GET http: // foo.server.com/download/0/37/3000016511/.flv?mt=video/xy";

        Pattern re = Pattern.compile("GET .*?\\.flv.*"); // this works
        // output:
        // [0][0] = GET http: // foo.server.com/download/0/37/3000016511/.flv?mt=video/xy

        // The match from the following ends with ".flv", not the entire URL.
        // It also only works if .flv is the first of the 3 ORed options.
        //Pattern re = Pattern.compile("GET .*?(\\.flv)|(\\.mp4)|(\\.avi).*?");
        // output:
        // [0][0] = GET http: // foo.server.com/download/0/37/3000016511/.flv
        // [0][1] = .flv
        // [0][2] = null
        // [0][3] = null

        Matcher m = re.matcher(sourcestring);
        int mIdx = 0;
        while (m.find()) {
            // Group 0 is the whole match; groups 1..groupCount() are the captures.
            for (int groupIdx = 0; groupIdx <= m.groupCount(); groupIdx++) {
                System.out.println("[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
            }
            mIdx++;
        }
    }
}
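
For comparison, here is a minimal sketch that runs the ORed pattern from the question next to a variant where the three extensions sit inside one group. It relies on the fact that in java.util.regex the | operator binds more loosely than concatenation, so the ungrouped pattern is read as three separate alternatives: "GET .*?(\.flv)", "(\.mp4)", and "(\.avi).*?". The class name AlternationDemo, the helper firstMatch, and the grouped pattern are illustrative assumptions, not something from my original code. The grouped variant also uses a greedy .* at the end, since a trailing lazy .*? is allowed to match zero characters when used with find().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    // Hypothetical helper: returns the text of the first match, or null if there is none.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String source = "GET http: // foo.server.com/download/0/37/3000016511/.flv?mt=video/xy";

        // Ungrouped alternation: only the first alternative contains "GET",
        // and only the last one contains the trailing ".*?".
        String ungrouped = "GET .*?(\\.flv)|(\\.mp4)|(\\.avi).*?";

        // Grouped alternation (assumed variant): the | applies only to the
        // extensions, and the greedy ".*" consumes the rest of the line.
        String grouped = "GET .*?(\\.flv|\\.mp4|\\.avi).*";

        // Match stops right after ".flv"
        System.out.println(firstMatch(ungrouped, source));
        // Match covers the whole request line
        System.out.println(firstMatch(grouped, source));
    }
}
```

With the grouped variant, swapping the order of the extensions inside the parentheses should no longer change whether the request line is matched.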