SGML Parser in Python
- by afg102
I am completely new to Python. I have the following code:
import string
import sgmllib

class FoundTitle(Exception):
    """Raised to stop parsing once the title has been read."""

class ExtractTitle(sgmllib.SGMLParser):
    def __init__(self, verbose=0):
        sgmllib.SGMLParser.__init__(self, verbose)
        self.title = self.data = None

    def handle_data(self, data):
        # Collect text only while inside a <title> element.
        if self.data is not None:
            self.data.append(data)

    def start_title(self, attrs):
        self.data = []

    def end_title(self):
        self.title = string.join(self.data, "")
        raise FoundTitle # abort parsing!
This extracts the title element from an SGML document, but it only works for a single title. I know I have to override unknown_starttag and unknown_endtag in order to get all the titles, but I keep getting it wrong. Help me please!
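For reference, here is a minimal sketch of one way to collect every title rather than only the first. It is an assumption-laden illustration, not a drop-in fix. sgmllib was removed in Python 3, so the sketch swaps in html.parser.HTMLParser from the standard library. The names ExtractTitles and extract_titles are hypothetical. The idea is to append each finished title to a list and reset the buffer instead of raising an exception, so parsing continues to the next title. The same change (drop the raise, append to a list in end_title) should also carry over to the sgmllib version without touching unknown_starttag or unknown_endtag, though that is untested here.

```python
from html.parser import HTMLParser


class ExtractTitles(HTMLParser):
    """Collect the text of every <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.titles = []   # finished titles, in document order
        self.data = None   # text chunks of the title being read, or None

    def handle_starttag(self, tag, attrs):
        # Start buffering text when a <title> opens.
        if tag == "title":
            self.data = []

    def handle_data(self, data):
        # Keep text only while inside a <title> element.
        if self.data is not None:
            self.data.append(data)

    def handle_endtag(self, tag):
        # Store the finished title and keep parsing instead of aborting.
        if tag == "title" and self.data is not None:
            self.titles.append("".join(self.data))
            self.data = None


def extract_titles(text):
    """Return a list of all title strings found in text."""
    parser = ExtractTitles()
    parser.feed(text)
    parser.close()
    return parser.titles
```

Calling extract_titles on a document string returns the titles as a list, in the order they appear. Because nothing is raised, the caller no longer needs a try/except around the parse.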