How to extract terms from an HTML document
Posted by bookcasey on Super User
Published on 2012-06-21T14:28:18Z
Indexed on 2012/06/21 15:18 UTC
I have an HTML document filled with terms that I need to put into a spreadsheet.
They follow this basic pattern:
<ul>
<li class="name"><a href="spot.html">Spot</a></li>
<li class="type">Dog</li>
<li class="color">Red</li>
</ul>
<ul>
<li class="name"><a href="mittens.html">Mittens</a></li>
<li class="type">Cat</li>
<li class="color">Brown</li>
</ul>
<ul>
<li class="name"><a href="squakers.html">Squakers</a></li>
<li class="type">Little Parrot</li>
<li class="color">Rainbow</li>
</ul>
It's very consistent.
I need to extract the string within the li.name a (so, "Spot"), but only if the type is "Dog" or "Parrot", and put those names in a spreadsheet.
I've been trying to use Sublime Text's Find with regex, but I'm really struggling. Since regex and HTML usually don't play nicely together, I was wondering if there is a better and easier way to accomplish this. Thanks.
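One possible alternative to regex is a small script that parses the HTML and filters on the type. The following is only a sketch using Python's standard-library html.parser and csv modules. The names TermParser, extract_names, and to_csv are made up for illustration, and it assumes that "Parrot" should match any type containing that word (the sample's type is "Little Parrot"), and that each record is exactly one ul block as shown above.

```python
import csv
import io
from html.parser import HTMLParser

# Sample input copied from the pattern in the question.
SAMPLE = """
<ul>
<li class="name"><a href="spot.html">Spot</a></li>
<li class="type">Dog</li>
<li class="color">Red</li>
</ul>
<ul>
<li class="name"><a href="mittens.html">Mittens</a></li>
<li class="type">Cat</li>
<li class="color">Brown</li>
</ul>
<ul>
<li class="name"><a href="squakers.html">Squakers</a></li>
<li class="type">Little Parrot</li>
<li class="color">Rainbow</li>
</ul>
"""


class TermParser(HTMLParser):
    """Hypothetical helper: builds one {class: text} dict per <ul> block."""

    def __init__(self):
        super().__init__()
        self.records = []
        self.current = None  # dict for the <ul> currently being read
        self.field = None    # class of the <li> currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            self.current = {}
        elif tag == "li" and self.current is not None:
            self.field = dict(attrs).get("class")
            if self.field:
                self.current[self.field] = ""

    def handle_data(self, data):
        # Text inside nested tags (such as the <a>) is collected as well.
        if self.current is not None and self.field:
            self.current[self.field] += data

    def handle_endtag(self, tag):
        if tag == "li":
            self.field = None
        elif tag == "ul" and self.current is not None:
            self.records.append({k: v.strip() for k, v in self.current.items()})
            self.current = None


def extract_names(html, wanted=("Dog", "Parrot")):
    """Return names whose type contains any of the wanted words (assumption: substring match)."""
    parser = TermParser()
    parser.feed(html)
    parser.close()
    return [
        r["name"]
        for r in parser.records
        if "name" in r and any(w in r.get("type", "") for w in wanted)
    ]


def to_csv(names):
    """Render the names as one-column CSV text that a spreadsheet can import."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["name"])
    writer.writerows([n] for n in names)
    return buf.getvalue()


print(to_csv(extract_names(SAMPLE)), end="")
```

For a real file, the SAMPLE string would be replaced by the document's contents, and the CSV text saved to a file that the spreadsheet program can open. If an exact match on the type is wanted instead, the substring test in extract_names would need to become an equality check.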
© Super User or respective owner