Clean up domain list in Excel - regex / macros?
Posted
by
Tim
on Stack Overflow
See other posts from Stack Overflow
or by Tim
Published on 2012-11-05T09:19:49Z
Indexed on
2012/11/05
11:01 UTC
Read the original article
Hit count: 299
I have a huge spreadsheet of domains that I need to clean up as follows:
- Remove all
http://
(simple replace all - "http://" with "") - Remove any
www.
(simple replace all - "www." with "") - Delete any sub-domains (delete the actual row completely, not just the subdomain from the url)
- Remove anything after the domain extension (i.e.
website.com/blah/blahbah/
becomes justwebsite.com
(simple replace all -"/*"
with""
, then replace all"/"
with""
)
So what I'm left with is just a spreadsheet of clean domains like "website.com"
.
I think I've got 1, 2 and 4 sorted (as above), but I'm really struggling with 3.
Any ideas? Can I do this with regexp / vba, and actually delete the row completely?
Sample data:
http://www.scholastic.com/kids/stacks/games/
http://imgworld.teamworkonline.com/
http://topfreegraphics.com/
http://www.workcircle.co.uk/
http://www.healthycanadians.gc.ca/index-eng.php
http://gsociology.icaap.org/methods/soft.html
Post 1, 2 and 4 would leave me with:
scholastic.com
imgworld.teamworkonline.com
topfreegraphics.com
workcircle.co.uk
healthycanadians.gc.ca
gsociology.icaap.org
It's those pesky sub-domains I need to just delete completely, just delete the row. I've realised I can't just search for 2 x ".", because obviously plenty of domain extensions (i.e .co.uk
) include that.
Any help appreciated.
© Stack Overflow or respective owner