Generate a table of contents from HTML with Python

Posted by Oli on Stack Overflow
Published on 2010-03-25T11:16:13Z

I'm trying to generate a table of contents from a block of HTML (not a complete file - just content) based on its <h2> and <h3> tags.

My plan so far was to:

  • Extract a list of headers using BeautifulSoup

  • Use a regex on the content to place anchor links before or inside the header tags, so the user can click through from the table of contents. (There might be a method for replacing inside BeautifulSoup?)

  • Output a nested list of links to the headers in a predefined spot.

It sounds easy when I say it like that, but it's proving to be a bit of a pain in the rear.

Is there something out there that does all this for me in one go so I don't waste the next couple of hours reinventing the wheel?

An example:

<p>This is an introduction</p>

<h2>This is a sub-header</h2>
<p>...</p>

<h3>This is a sub-sub-header</h3>
<p>...</p>

<h2>This is a sub-header</h2>
<p>...</p>
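
For illustration, here is a minimal sketch of the three-step plan applied to a fragment like the one above. It is not an existing one-shot tool, and it swaps BeautifulSoup for the standard library's re module so that it runs without extra installs. The function name build_toc and the "section-N" id scheme are hypothetical, and the pattern assumes bare <h2> and <h3> tags with no attributes, appearing in document order.

```python
import re

# Assumption: headers are written as bare <h2>/<h3> tags without attributes.
HEADING_RE = re.compile(r"<(h[23])>(.*?)</\1>", re.IGNORECASE | re.DOTALL)


def build_toc(html):
    """Return (toc_html, content_html) for an HTML fragment.

    content_html is the input with an id added to every <h2>/<h3>;
    toc_html is a nested <ul> of links pointing at those ids.
    """
    headings = []  # (level, anchor id, title text), in document order

    def add_id(match):
        tag, inner = match.group(1).lower(), match.group(2)
        # Hypothetical id scheme: a counter keeps ids unique even when
        # two headers share the same text, as in the example fragment.
        anchor = "section-%d" % (len(headings) + 1)
        title = re.sub(r"<[^>]+>", "", inner).strip()  # drop inline markup
        headings.append((int(tag[1]), anchor, title))
        return '<%s id="%s">%s</%s>' % (tag, anchor, inner, tag)

    content = HEADING_RE.sub(add_id, html)

    lines = ["<ul>"]
    open_item = False  # an <h2> list item is currently open
    open_sub = False   # a nested <ul> of <h3> links is currently open
    for level, anchor, title in headings:
        link = '<a href="#%s">%s</a>' % (anchor, title)
        if level == 2:
            if open_sub:
                lines.append("</ul>")
                open_sub = False
            if open_item:
                lines.append("</li>")
            lines.append("<li>" + link)
            open_item = True
        else:
            if not open_item:
                # An <h3> before any <h2> gets an empty parent item.
                lines.append("<li>")
                open_item = True
            if not open_sub:
                lines.append("<ul>")
                open_sub = True
            lines.append("<li>" + link + "</li>")
    if open_sub:
        lines.append("</ul>")
    if open_item:
        lines.append("</li>")
    lines.append("</ul>")
    return "\n".join(lines), content


if __name__ == "__main__":
    fragment = (
        "<p>This is an introduction</p>\n"
        "<h2>This is a sub-header</h2>\n<p>...</p>\n"
        "<h3>This is a sub-sub-header</h3>\n<p>...</p>\n"
        "<h2>This is a sub-header</h2>\n<p>...</p>\n"
    )
    toc, content = build_toc(fragment)
    print(toc)
    print(content)
```

The returned toc string can then be placed in whatever predefined spot the page uses. A regex is fragile on arbitrary HTML (attributes, nested headers, comments), so a real parser is likely the safer choice for content that is not under your control.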

© Stack Overflow or respective owner
