BeautifulSoup can't find an existing href in a file

Posted by young001 on Stack Overflow
Published on 2012-06-17T01:59:41Z

I have an HTML file like the following:

<form action="/2811457/follow?gsid=3_5bce9b871484d3af90c89f37" method="post">
<div>
<a href="/2811457/follow?page=2&amp;gsid=3_5bce9b871484d3af90c89f37">next_page</a>
&nbsp;<input name="mp" type="hidden" value="3" />
<input type="text" name="page" size="2" style='-wap-input-format: "*N"' />
<input type="submit" value="jump" />&nbsp;1/3
</div>
</form>

How can I extract the "1/3" from the file?

This is only part of the HTML; I trimmed it to make the question clear. I'm new to BeautifulSoup, and I have read the documentation, but I'm still confused.

When I use BeautifulSoup, this doesn't work:

total_urls_num = soup.find(re.compile('.*/d\//d.*'))

As JBernardo said, a digit should be written as \d, not /d. When I change the pattern to .*\d/\d.*, it still doesn't work.
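
To make the difference between the two patterns concrete, here is a minimal sketch that uses only the standard `re` module on the last line of the HTML fragment. The variable names are made up for illustration, and the pattern `\d+/\d+` is an assumed tightening of `.*\d/\d.*`, not something from the original post.

```python
import re

# Last line inside the <div> from the fragment above, entity left undecoded.
line = '<input type="submit" value="jump" />&nbsp;1/3'

# "/d" is a literal slash followed by the letter d, so this looks for "/d//d".
broken = re.compile(r'.*/d\//d.*')
# "\d" matches a single digit; "+" allows page numbers with several digits.
fixed = re.compile(r'\d+/\d+')

broken_match = broken.search(line)
fixed_match = fixed.search(line)

print(broken_match)          # None
print(fixed_match.group(0))  # 1/3
```

The corrected pattern matches the raw text, so if it still finds nothing when passed to BeautifulSoup, the issue is probably in how the pattern is handed to `find`, not in the pattern itself.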

My code:

from BeautifulSoup import BeautifulSoup
import re

with open("html.txt","r") as f:
    response = f.read()
    print response
    soup = BeautifulSoup(response)
    delete_urls = soup.findAll('a', href=re.compile('follow\?page'))   #works
    print delete_urls
    #total_urls_num = soup.find(re.compile('.*\d/\d.*'))
    total_urls_num = soup.find('input',style='submit')   # doesn't work
    print total_urls_num
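
One possible explanation for both failures, sketched below under some assumptions: the first positional argument of `find` is matched against tag names, so a regex passed there never sees the text "1/3", and `style='submit'` filters on the `style` attribute while the button is identified by `type="submit"`. The sketch assumes Python 3 and the newer `bs4` package (version 4.4 or later for the `string=` argument) instead of the `BeautifulSoup` 3 import used above. The names `page_pattern`, `text_node`, `current_page`, `total_pages`, and `submit_button` are hypothetical.

```python
import re
from bs4 import BeautifulSoup  # assumption: bs4, not the BeautifulSoup 3 module

html = """<form action="/2811457/follow?gsid=3_5bce9b871484d3af90c89f37" method="post">
<div>
<a href="/2811457/follow?page=2&amp;gsid=3_5bce9b871484d3af90c89f37">next_page</a>
&nbsp;<input name="mp" type="hidden" value="3" />
<input type="text" name="page" size="2" style='-wap-input-format: "*N"' />
<input type="submit" value="jump" />&nbsp;1/3
</div>
</form>"""

soup = BeautifulSoup(html, "html.parser")
page_pattern = re.compile(r"(\d+)/(\d+)")

# A regex passed positionally is compared with tag names (form, div, a, input),
# none of which look like "1/3", so nothing is found.
by_tag_name = soup.find(page_pattern)

# string= searches the text nodes instead and returns the first matching one.
text_node = soup.find(string=page_pattern)
match = page_pattern.search(text_node)
current_page, total_pages = int(match.group(1)), int(match.group(2))

# The submit button is selected by its type attribute, not by style.
submit_button = soup.find("input", type="submit")

print(by_tag_name)                 # None
print(current_page, total_pages)   # 1 3
print(submit_button["value"])      # jump
```

In the older BeautifulSoup 3 API the equivalent keyword for searching text nodes is `text=`; the rest of the approach should carry over, but that has not been checked against the exact version used in the question.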
