SGML Parser in Python
Posted by afg102 on Stack Overflow
Published on 2011-01-08T08:50:13Z
I am completely new to Python. I have the following code:
import sgmllib  # Python 2 only

class FoundTitle(Exception):
    pass  # raised to stop parsing once the title has been read

class ExtractTitle(sgmllib.SGMLParser):
    def __init__(self, verbose=0):
        sgmllib.SGMLParser.__init__(self, verbose)
        self.title = self.data = None
    def handle_data(self, data):
        if self.data is not None:
            self.data.append(data)
    def start_title(self, attrs):
        self.data = []
    def end_title(self):
        self.title = "".join(self.data)
        raise FoundTitle # abort parsing!
This extracts the title element from SGML, but it only works for a single title. I know I have to override unknown_starttag and unknown_endtag in order to get all titles, but I keep getting it wrong. Please help!
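For illustration, here is a minimal sketch of one way to collect every title rather than only the first. It rests on a few assumptions: sgmllib was removed in Python 3, so the sketch swaps in html.parser.HTMLParser from the standard library, whose handle_starttag and handle_endtag play roughly the role of unknown_starttag and unknown_endtag. The names ExtractTitles and extract_titles are made up for this example. The idea is to append each finished title to a list and keep parsing, instead of raising FoundTitle at the first end tag.

```python
from html.parser import HTMLParser


class ExtractTitles(HTMLParser):
    """Collect the text of every <title> element (hypothetical helper class)."""

    def __init__(self):
        super().__init__()
        self.titles = []   # finished titles, in document order
        self.data = None   # text chunks of the title being read, None outside a title

    def handle_starttag(self, tag, attrs):
        # Called for every start tag; tag names arrive lowercased.
        if tag == "title":
            self.data = []

    def handle_data(self, data):
        # Only keep text while we are inside a title element.
        if self.data is not None:
            self.data.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self.data is not None:
            self.titles.append("".join(self.data))
            self.data = None   # reset, so text outside titles is ignored


def extract_titles(markup):
    """Parse the markup and return a list of all title strings found."""
    parser = ExtractTitles()
    parser.feed(markup)
    parser.close()
    return parser.titles


if __name__ == "__main__":
    sample = "<doc><title>First</title><p>body</p><title>Second title</title></doc>"
    print(extract_titles(sample))
```

The same change should carry over to the sgmllib version under Python 2: as far as I understand, start_title and end_title are called for each title element, so appending to a list in end_title and dropping the raise may be enough without touching unknown_starttag at all.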
© Stack Overflow or respective owner