Problem with re.findall (duplicates)
Posted by user559385 on Stack Overflow
Published on 2010-12-31T17:40:49Z
Indexed on 2010/12/31 17:54 UTC
Hello,
I am trying to fetch the source of a 4chan board and extract the links to its threads.
I have a problem with my regexp (it isn't working as I expect). Source:
import urllib2, re
req = urllib2.Request('http://boards.4chan.org/wg/')
resp = urllib2.urlopen(req)
html = resp.read()
print re.findall("res/[0-9]+", html)
#print re.findall("^res/[0-9]+$", html)
The problem is that:
print re.findall("res/[0-9]+", html)
returns duplicates.
I can't use:
print re.findall("^res/[0-9]+$", html)
I have read the Python docs, but they didn't help.
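A minimal sketch of one way to drop the repeats, assuming the duplicates come from each thread link occurring more than once in the page. The `html` string below is a made-up sample, not real 4chan markup, and `unique_in_order` is a hypothetical helper rather than anything provided by the `re` module:

```python
import re

# Hypothetical stand-in for the downloaded page; the real markup may differ.
html = (
    '<a href="res/1001">Reply</a>\n'
    '<a href="res/1001#q1001">1001</a>\n'
    '<a href="res/1002">Reply</a>\n'
    '<a href="res/1001">Reply</a>\n'
)


def unique_in_order(items):
    """Return items with repeats dropped, keeping first-seen order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


# findall reports every occurrence, so repeated links show up repeatedly.
matches = re.findall(r"res/[0-9]+", html)

# Filter the repeats afterwards instead of changing the pattern.
threads = unique_in_order(matches)

# The anchored pattern, for comparison, applied to the same text.
anchored = re.findall(r"^res/[0-9]+$", html)

print(matches)
print(threads)
print(anchored)
```

If the order of the links doesn't matter, wrapping the result in `set(...)` is shorter. The anchored pattern finds nothing in this sample because, without `re.MULTILINE`, `^` matches only at the start of the whole string and `$` only at its end (or just before a trailing newline), so it would match only if the entire page were a single `res/<number>` link.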
© Stack Overflow or respective owner