Trying to parse links in an HTML directory listing using Java regex
- by DiskCrasher
OK, I know everyone is going to tell me not to use regex for parsing HTML, but I'm programming on Android and don't have ready access to an HTML parser (that I'm aware of). Besides, this is server-generated HTML, which should be more consistent than user-generated HTML.
The regex looks like this:
Pattern patternMP3 = Pattern.compile(
    "<A HREF=\"[^\"]+.+\\.mp3</A>",
    Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
Matcher matcherMP3 = patternMP3.matcher(HTML);
while (matcherMP3.find()) { ... }
The input HTML is all on one line, which is causing the problem. When the HTML is on separate lines, this pattern works. Any suggestions?
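A likely explanation for the one-line behavior: `.+` is greedy, and by default `.` in Java does not match line terminators. With each link on its own line, a match cannot run past the end of that line; on a single line, `.+` can stretch from the first link all the way to the last `.mp3</A>`. The sketch below compares the pattern from the question with one possible alternative that uses negated character classes so a match cannot cross a quote or tag boundary. The class name `Mp3LinkFinder`, the method names, and the sample HTML are made up for illustration, and it assumes the listing uses double-quoted HREF values and plain-text link labels.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original post.
class Mp3LinkFinder {
    static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;

    // The pattern from the question: the greedy ".+" can run across several links.
    static final Pattern ORIGINAL = Pattern.compile(
            "<A HREF=\"[^\"]+.+\\.mp3</A>", FLAGS);

    // Assumed alternative: the negated classes cannot cross a quote, '>' or '<',
    // so every match stays inside a single <A ...>...</A> element.
    // Group 1 captures the HREF value.
    static final Pattern BOUNDED = Pattern.compile(
            "<A HREF=\"([^\"]+)\"[^>]*>[^<]*\\.mp3</A>", FLAGS);

    // Counts how many separate matches a pattern finds in the input.
    static int countMatches(Pattern pattern, String html) {
        Matcher m = pattern.matcher(html);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Collects the HREF value of every link whose label ends in ".mp3".
    static List<String> findMp3Hrefs(String html) {
        List<String> hrefs = new ArrayList<>();
        Matcher m = BOUNDED.matcher(html);
        while (m.find()) {
            hrefs.add(m.group(1));
        }
        return hrefs;
    }

    public static void main(String[] args) {
        // Made-up sample listing, all on one line.
        String html = "<A HREF=\"a.mp3\">a.mp3</A> "
                + "<A HREF=\"b.txt\">b.txt</A> "
                + "<A HREF=\"c.mp3\">c.mp3</A>";
        System.out.println(countMatches(ORIGINAL, html)); // prints 1
        System.out.println(findMp3Hrefs(html));           // prints [a.mp3, c.mp3]
    }
}
```

On the sample input, the original pattern yields a single match spanning all three links, while the bounded pattern yields one match per MP3 link. If the real listing has nested tags inside the link text or single-quoted attributes, the pattern would need adjusting.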