How can I get the file extensions from relative links in HTML text using Perl?
- by Structure
For example, scanning the contents of an HTML page with a Perl regular expression, I want to match all file extensions but not TLD's in domain names. To do this I am making the assumption that all file extensions must be within double quotes.
I came up with the following, and it is working, however, I am failing to figure out a way to exclude the TLDs in the domains. This will return "com", "net", etc.
m/"[^<>]+\.([0-9A-Za-z]*)"/g
Is it possible to negate the match if there is more than one period between the quotes that are separated by text? (ie: match foo.bar.com but not ./ or ../)
Edit I am using $1 to return the value within parentheses.