Hey guys, I'm not trying to do anything malicious here, I just need to do some homework. I'm a fairly new programmer using Python 3.0, and I'm having difficulty using recursion for problem-solving. I've been stuck on this question for quite a while. Here's the assignment:
Write a recursive method spam(url, n) that takes a url of a web page as input and a non-negative integer n, collects all the email addresses contained in the web page and adds them to a global dictionary variable spam_dict, and then recursively calls itself on every http hyperlink contained in the web page. You will use a dictionary so only one copy of every email address is saved; your dictionary will store (key, value) pairs (email, email). The recursive call should use the parameter n-1 instead of n. If n = 0, you should collect the email addresses but no recursive calls should be made. The parameter n is used to limit the recursion to at most depth n. You will need to use the solutions of the two above problems; your method spam() will call the methods links2() and emails() and possibly other functions as well.
Notes:
1. Running spam() directly will produce no output on the screen; to find your spam_dict, you will need to read the value of spam_dict, and you will also need to reset it to the empty dictionary before every run of spam.
2. Recall how global variables are used.
Usage:
spam_dict = {}
spam('http://reed.cs.depaul.edu/lperkovic/csc242/test1.html',0)
spam_dict.keys()
dict_keys([])
spam_dict = {}
spam('http://reed.cs.depaul.edu/lperkovic/csc242/test1.html',1)
spam_dict.keys()
dict_keys(['[email protected]', '[email protected]'])
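The depth-limited recursion the assignment describes can be sketched roughly as below. This is only a sketch under assumptions, not the expected solution: the real links2() and emails() fetch pages over the network, so here they are replaced by stand-ins that read from a tiny in-memory "web" (FAKE_WEB and its example.com-style addresses are invented for illustration) so the example runs offline.

```python
# Hypothetical in-memory "web" so the sketch runs without network access:
# url -> (email addresses on the page, links on the page)
FAKE_WEB = {
    "http://a.example/": (["ann@a.example"], ["http://b.example/"]),
    "http://b.example/": (["bob@b.example", "ann@a.example"],
                          ["http://a.example/", "mailto:bob@b.example"]),
}

def emails(url):
    """Stand-in for the real emails(): addresses found on the page."""
    return FAKE_WEB.get(url, ([], []))[0]

def links2(url):
    """Stand-in for the real links2(): absolute links found on the page."""
    return FAKE_WEB.get(url, ([], []))[1]

spam_dict = {}

def spam(url, n):
    # The global statement is only strictly required when rebinding the name;
    # it is harmless here and makes the intent explicit.
    global spam_dict
    # Collect this page's addresses; a repeated key simply overwrites itself,
    # which is why the dictionary keeps one copy of each address.
    for address in emails(url):
        spam_dict[address] = address
    # Base case: at depth 0, collect but do not follow any links.
    if n == 0:
        return
    # Recursive case: visit each http(s) link with one less level of depth.
    for link in links2(url):
        if link.startswith("http"):
            spam(link, n - 1)
```

The recursion is over pages, not over the dictionary: the dictionary is just where results accumulate. Each call handles one page and hands its links to further calls with n - 1, so n counts how many more "hops" are allowed. Note that pages linking back to each other are revisited; the sketch terminates because n shrinks on every call, not because it detects cycles.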
So far, I've written a function that traverses a web page and puts all the links in a nice little list, and what I wanted to do was call that function. But why would I use recursion on a dictionary? And how? I don't understand how n ties into all of this.
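from urllib.request import urlopen
from urllib.parse import urljoin

def links2(url):
    # Download the page and decode the raw bytes to text
    content = urlopen(url).read().decode('utf-8', 'replace')
    # MyHTMLParser is defined elsewhere; get() returns the links it collected
    myparser = MyHTMLParser()
    myparser.feed(content)
    lst = myparser.get()
    mergelst = []
    for link in lst:
        # Resolve relative links against the page URL (not against lst[0])
        mergelst.append(urljoin(url, link))
    # Return the list (rather than printing it) so other functions can use it
    return mergelst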
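The MyHTMLParser class that links2() relies on isn't shown here. A minimal version compatible with how it is used (feed() followed by get()) might look like the sketch below; the attribute name links and the decision to collect only the href values of <a> tags are assumptions, not the actual class.

```python
from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []  # hrefs in the order they appear on the page

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute names
        # arrive lowercased from HTMLParser.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def get(self):
        # Return the collected links (possibly relative URLs)
        return self.links
```

The links come back exactly as written in the page, so relative ones such as test2.html still need to be resolved against the page URL with urljoin.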
Any input (except why spam is bad) would be greatly appreciated. Also, I realize that the above function could probably look better; if you have a way to do it, I'm all ears. However, all I really need is for the program to produce the proper output.