How to normalize a URL in Java?
Posted by dfrankow on Stack Overflow
Published on 2010-06-07T22:33:12Z
Tags: java, url-rewriting
URL normalization (or URL canonicalization) is the process by which URLs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a URL into a normalized or canonical URL so it is possible to determine if two syntactically different URLs are equivalent.
Strategies include lowercasing, adding trailing slashes, converting https to http, etc. The Wikipedia page lists many.
Got a favorite method of doing this in Java? Perhaps a library (Nutch?), but I'm open. Smaller, with fewer dependencies, is better.
I'll handcode something for now and keep an eye on this question.
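As a rough starting point for the hand-coded route, something like the following could work using only java.net.URI from the JDK, so there are no extra dependencies. It is a minimal sketch, not a complete normalizer: the class name UrlNormalizer and the method normalize are made up for illustration, and the particular choices (lowercase the scheme and host, drop default ports, resolve "." and ".." segments, use "/" for an empty path, drop the fragment) are assumptions picked from the usual list of strategies. It deliberately does not convert https to http, since that may change which resource is fetched.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper, named for illustration only.
final class UrlNormalizer {
    private UrlNormalizer() {}

    // Returns a normalized form of an absolute URL that has a host.
    // Throws IllegalArgumentException if the input cannot be parsed
    // or lacks a scheme or host.
    static String normalize(String url) {
        // URI.normalize() resolves "." and ".." path segments.
        URI uri = URI.create(url).normalize();

        String scheme = uri.getScheme();
        String host = uri.getHost();
        if (scheme == null || host == null) {
            throw new IllegalArgumentException("Expected an absolute URL with a host: " + url);
        }

        // Scheme and host are case-insensitive, so lowercase both.
        scheme = scheme.toLowerCase(Locale.ROOT);
        host = host.toLowerCase(Locale.ROOT);

        // Drop the port when it is the default for the scheme.
        int port = uri.getPort();
        boolean isDefaultPort = ("http".equals(scheme) && port == 80)
                || ("https".equals(scheme) && port == 443);
        if (isDefaultPort) {
            port = -1;
        }

        // Treat an empty path as "/" (assumption: the only trailing slash added).
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }

        StringBuilder result = new StringBuilder();
        result.append(scheme).append("://");
        if (uri.getRawUserInfo() != null) {
            result.append(uri.getRawUserInfo()).append('@');
        }
        result.append(host);
        if (port != -1) {
            result.append(':').append(port);
        }
        result.append(path);
        if (uri.getRawQuery() != null) {
            result.append('?').append(uri.getRawQuery());
        }
        // The fragment is intentionally dropped (assumption: it is not
        // needed when comparing URLs for equivalence).
        return result.toString();
    }
}
```

Two URLs can then be compared by checking whether UrlNormalizer.normalize(a).equals(UrlNormalizer.normalize(b)). The raw (still percent-encoded) path and query are reused as-is, so percent-encoding case and query parameter order are not normalized here.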
© Stack Overflow or respective owner