Problem with re.findall (duplicates)
- by user559385
Hello,
I tried to fetch source of 4chan site, and get links to threads.
I have problem with regexp (isn't working). Source:
import urllib2, re
req = urllib2.Request('http://boards.4chan.org/wg/')
resp = urllib2.urlopen(req)
html = resp.read()
print re.findall("res/[0-9]+", html)
#print re.findall("^res/[0-9]+$", html)
The problem is that:
print re.findall("res/[0-9]+", html)
is giving duplicates.
I can't use:
print re.findall("^res/[0-9]+$", html)
I have read python docs but they didn't help.