In Python BeautifulSoup, how to move tags

Posted by JJ on Stack Overflow
Published on 2010-04-28T19:04:53Z Indexed on 2010/04/28 22:57 UTC

I have an HTML document in a soup that has been partially converted to XML. After some replacement and editing in the soup, the body is essentially:

<Text...></Text>   # This replaces <a href..> tags but automatically creates the </Text>
<p class=norm ...</p>
<p class=norm ...</p>
<Text...></Text>
<p class=norm ...</p> and so forth.  

I need to "move" the <p> tags so that they become children of the preceding <Text>, or else find a way to suppress the automatically created </Text>. I want:

<Text...> 
<p class=norm ...</p>
<p class=norm ...</p>
</Text>
<Text...>
<p class=norm ...</p>
</Text>  

I've tried using item.insert and item.append, but I think there must be a more elegant solution.

currentnode = None  # most recent <Text> tag; assumed to start unset
for item in soup.findAll(['p', 'span']):
    if item.name == 'span' and item.has_key('class') and item['class'] == 'section':
        # A section marker: replace the <span> with a new <Text> tag.
        xBCV = short_2_long(item._getAttrMap().get('value', ''))
        if currentnode:
            pass
        currentnode = Tag(soup, 'Text', attrs=[('TypeOf', 'Section'),... ])
        item.replaceWith(currentnode)  # works but creates end tag
    elif item.name == 'p' and item.has_key('class') and item['class'] == 'norm':
        # A normal paragraph: swap each <a> for a tag built by filter_hrefs.
        childcdatanode = None
        for ahref in item.findAll('a'):
            if childcdatanode:
                pass
            newlink = filter_hrefs(str(ahref))
            childcdatanode = Tag(soup, newlink)
            ahref.replaceWith(childcdatanode)
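
For reference, the move described above could be sketched roughly as follows. This is only a sketch under some assumptions: it uses the newer bs4 package (`find_all`, `new_tag`, `replace_with`) rather than the BeautifulSoup 3 calls in the loop above, the sample markup and the `nest_paragraphs` name are made up for illustration, and the `short_2_long` and `filter_hrefs` steps are left out. It relies on `append` detaching a tag from its old position before attaching it, so appending each <p> to the current <Text> moves it instead of copying it.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: section markers followed by the paragraphs they own.
SAMPLE = (
    '<body>'
    '<span class="section" value="a"></span>'
    '<p class="norm">one</p>'
    '<p class="norm">two</p>'
    '<span class="section" value="b"></span>'
    '<p class="norm">three</p>'
    '</body>'
)


def nest_paragraphs(soup):
    """Replace each span.section with a <Text> tag and move the
    p.norm tags that follow it inside that <Text> tag."""
    containers = []
    current = None
    # find_all returns a list snapshot, so the tree can be edited in the loop.
    for item in soup.find_all(['span', 'p']):
        classes = item.get('class') or []  # bs4 returns class as a list
        if item.name == 'span' and 'section' in classes:
            current = soup.new_tag('Text', TypeOf='Section')
            item.replace_with(current)
            containers.append(current)
        elif item.name == 'p' and 'norm' in classes and current is not None:
            # append() removes the tag from its old parent first: a move.
            current.append(item)
    return containers


soup = BeautifulSoup(SAMPLE, 'html.parser')
containers = nest_paragraphs(soup)
print(soup)
```

Paragraphs that appear before the first section marker are left where they are, since there is no <Text> to receive them yet.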

Thanks

© Stack Overflow or respective owner