How can I get the file extensions from relative links in HTML text using Perl?
Posted by Structure on Stack Overflow
Published on 2010-03-26T15:13:47Z
I am scanning the contents of an HTML page with a Perl regular expression, and I want to match all file extensions but not the TLDs in domain names. To do this, I am assuming that every file extension appears within double quotes.
I came up with the following, and it works, but I cannot figure out how to exclude the TLDs in domain names: it also returns "com", "net", etc.
m/"[^<>]+\.([0-9A-Za-z]*)"/g
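Is it possible to negate the match if there is more than one period between the quotes, where the periods are separated by text? (That is, foo.bar.com should count as having more than one period, but the ./ or ../ at the start of a relative path should not.)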
Edit: I am using $1 to return the value within the parentheses.
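For illustration, here is a minimal sketch of one possible approach. It is not a tested answer. Rather than negating inside a single pattern, it captures each double-quoted string, skips anything that starts with a URL scheme or with // (where the dots may belong to a host name), and only then looks for an extension after the last dot of the final path segment. The helper name relative_link_extensions and the sample HTML are made up for this example. It keeps the assumption from the question that links sit inside double quotes, so other quoted attribute values containing a dot would also be picked up, and a scheme-less value such as "foo.bar.com" cannot be told apart from a file name.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the extensions
# of double-quoted relative links found in $html.
sub relative_link_extensions {
    my ($html) = @_;
    my @extensions;

    # Visit every double-quoted string that contains no angle brackets.
    while ($html =~ /"([^"<>]+)"/g) {
        my $link = $1;

        # Skip absolute (scheme:) and protocol-relative (//) URLs,
        # since their dots may be part of a host name such as foo.bar.com.
        next if $link =~ m{^(?:[A-Za-z][A-Za-z0-9+.-]*:|//)};

        # Drop any query string or fragment, then keep the last path segment.
        $link =~ s/[?#].*//s;
        my ($file) = $link =~ m{([^/]*)\z};

        # The extension is whatever follows the final dot of that segment,
        # so "./", "../" and ".." yield nothing.
        push @extensions, $1 if $file =~ /\.([0-9A-Za-z]+)\z/;
    }
    return @extensions;
}

# Made-up sample input for demonstration.
my $html = <<'HTML';
<a href="http://foo.bar.com/">site</a>
<a href="../docs/manual.pdf">manual</a>
<img src="./images/logo.png">
<a href="page.html?x=1#top">page</a>
HTML

print join(",", relative_link_extensions($html)), "\n";
```

With the sample input above this prints pdf,png,html. Splitting the work into a capture step and a filter step keeps each pattern short. For real pages an HTML parser would be more robust than matching quotes.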
© Stack Overflow or respective owner