How to normalize a URL in Java?

Posted by dfrankow on Stack Overflow
Published on 2010-06-07T22:33:12Z Indexed on 2010/06/07 22:42 UTC

URL normalization (or URL canonicalization) is the process by which URLs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a URL into a normalized or canonical URL so it is possible to determine if two syntactically different URLs are equivalent.

Strategies include lowercasing the scheme and host, adding trailing slashes, replacing https with http, and so on. The Wikipedia page on URL normalization lists many more.

Got a favorite method of doing this in Java? Perhaps a library (Nutch?), but I'm open. Smaller, with fewer dependencies, is better.

I'll handcode something for now and keep an eye on this question.
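
For reference, here is a minimal sketch of the kind of hand-coded approach I have in mind, using only java.net.URI from the JDK. The class name UrlNormalizer and the method normalize are made up for illustration, not taken from any library. It assumes an absolute URL with a host, and applies only a few of the safer strategies: lowercasing the scheme and host, dropping the default port, resolving "." and ".." path segments, using "/" for an empty path, and dropping the fragment.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the JDK or any library.
final class UrlNormalizer {

    private UrlNormalizer() {}

    // Returns a normalized form of an absolute URL.
    // Throws IllegalArgumentException if the input is malformed or has no scheme/host.
    static String normalize(String url) {
        // URI.normalize() resolves "." and ".." segments in the path.
        URI uri = URI.create(url).normalize();

        String scheme = uri.getScheme();
        String host = uri.getHost();
        if (scheme == null || host == null) {
            throw new IllegalArgumentException("Expected an absolute URL with a host: " + url);
        }

        // Scheme and host are case-insensitive, so lowercase both.
        scheme = scheme.toLowerCase(Locale.ROOT);
        host = host.toLowerCase(Locale.ROOT);

        // Drop the port when it is the default for the scheme.
        int port = uri.getPort();
        boolean defaultPort = (scheme.equals("http") && port == 80)
                || (scheme.equals("https") && port == 443);
        if (defaultPort) {
            port = -1;
        }

        // An empty path is treated as "/".
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }

        StringBuilder sb = new StringBuilder();
        sb.append(scheme).append("://");
        if (uri.getRawUserInfo() != null) {
            sb.append(uri.getRawUserInfo()).append('@');
        }
        sb.append(host);
        if (port != -1) {
            sb.append(':').append(port);
        }
        sb.append(path);
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        // The fragment is intentionally dropped: it is not sent to the server.
        return sb.toString();
    }
}
```

With this sketch, UrlNormalizer.normalize("HTTP://Example.COM:80/a/./b/../c?q=1#frag") returns "http://example.com/a/c?q=1". It deliberately leaves out the riskier strategies, such as rewriting https to http, sorting query parameters, or adding trailing slashes to non-empty paths, since those can change which resource the URL points to.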

© Stack Overflow or respective owner