Problem with re.findall (duplicates)

Posted by user559385 on Stack Overflow
Published on 2010-12-31T17:40:49Z


Hello,

I'm trying to fetch the source of a 4chan board page and extract the links to its threads.

I have a problem with my regular expression: it isn't working the way I expect. Here is my code:

import urllib2, re

req = urllib2.Request('http://boards.4chan.org/wg/')
resp = urllib2.urlopen(req)
html = resp.read()

print re.findall("res/[0-9]+", html)
#print re.findall("^res/[0-9]+$", html)
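For comparison, here is a minimal Python 3 sketch of the same script (urllib2 became urllib.request in Python 3). The helper names fetch_html and unique_thread_links are hypothetical, and decoding the page as UTF-8 is an assumption. It keeps the original pattern but collapses repeated matches, preserving the order in which each one first appears.

```python
import re
import urllib.request

# Same pattern as in the question: "res/" followed by one or more digits.
THREAD_LINK = re.compile(r"res/[0-9]+")

def fetch_html(url):
    """Download a page and decode it as text (assumes UTF-8; hypothetical helper)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")

def unique_thread_links(html):
    """Return each distinct 'res/<digits>' match once, in first-seen order."""
    # findall keeps every occurrence; dict keys are unique and, since
    # Python 3.7, keep insertion order, so this drops repeats in order.
    return list(dict.fromkeys(THREAD_LINK.findall(html)))
```

Usage would be `unique_thread_links(fetch_html('http://boards.4chan.org/wg/'))`. The board's markup may well have changed since this was posted, so the pattern itself might need adjusting.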

The problem is that

print re.findall("res/[0-9]+", html)

returns duplicate matches.
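`re.findall` returns every non-overlapping match in the string, in order; it has no notion of uniqueness. So if a page links to the same thread more than once, the same string comes back more than once. A small illustration, using hypothetical markup rather than real 4chan HTML:

```python
import re
from collections import Counter

# Hypothetical page fragment: thread 111 is linked twice.
sample = ('<a href="res/111">thread</a> <a href="res/111">reply</a> '
          '<a href="res/222">thread</a>')

matches = re.findall(r"res/[0-9]+", sample)
print(matches)               # ['res/111', 'res/111', 'res/222']

# Counting shows which links repeat and how often.
print(Counter(matches))      # Counter({'res/111': 2, 'res/222': 1})

# A set drops the repeats; sets are unordered, so sort for stable output.
print(sorted(set(matches)))  # ['res/111', 'res/222']
```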

And the anchored version doesn't work for me either:

print re.findall("^res/[0-9]+$", html)
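The anchored pattern fails for a different reason. Without flags, `^` and `$` match only at the start and end of the whole string (`$` also just before a trailing newline); with `re.MULTILINE` they match at line boundaries, so a link would have to sit alone on its own line. The sample string below is hypothetical:

```python
import re

# Hypothetical input: one link inside an attribute, one alone on a line.
sample = '<a href="res/111">thread</a>\nres/222\n'

# No flags: the entire string would have to be "res/<digits>".
whole = re.findall(r"^res/[0-9]+$", sample)

# MULTILINE: ^ and $ match at each line, but the match must fill a line.
per_line = re.findall(r"^res/[0-9]+$", sample, re.MULTILINE)

print(whole)     # []
print(per_line)  # ['res/222']
```

Since links in HTML are typically embedded in attributes rather than standing on lines of their own, anchors don't address the duplicates; deduplicating the unanchored `findall` result is the simpler route.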

I have read the Python docs, but they didn't help.

© Stack Overflow or respective owner
