Python 3, urllib ... Reset Connection Possible?

Posted by Rhys on Stack Overflow
Published on 2011-01-09T10:30:57Z Indexed on 2011/01/09 10:53 UTC


In the larger context of my program, the goal of the code below is to filter out all dynamic HTML from a web page's source code.

code snippet:

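import urllib.request

try:
    deepreq3 = urllib.request.Request(deepurl3)
    deepreq3.add_header("User-Agent", "etc......")
    # Pass the Request object, not the bare URL, so the header is actually sent
    deepdata3 = urllib.request.urlopen(deepreq3).read().decode("utf8", "ignore")
except urllib.error.URLError:
    raise  # error handling not shown in the original snippet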

The above code is looped 3 times to determine whether the target web page is dynamic (its source code changes between requests) or not.

If the page IS dynamic, the above code loops another 15 times and attempts to filter out the dynamic content.
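The filtering step itself is not shown in the post, so the following is only a minimal sketch of one way such a filter could work, assuming "dynamic content" means lines that differ between fetches. The names `stable_lines`, `is_dynamic`, and the sample `pages` are hypothetical and stand in for the repeated `urlopen` results.

```python
def is_dynamic(snapshots):
    """Treat a page as dynamic if any two snapshots of it differ."""
    return len(set(snapshots)) > 1


def stable_lines(snapshots):
    """Return the lines of the first snapshot that occur in every snapshot."""
    if not snapshots:
        return []
    common = set(snapshots[0].splitlines())
    for page in snapshots[1:]:
        common &= set(page.splitlines())
    # Keep the order in which the lines appear in the first snapshot
    return [line for line in snapshots[0].splitlines() if line in common]


# Hypothetical snapshots standing in for repeated fetches of one page
pages = [
    "<html>\n<p>Hello</p>\n<span>token=111</span>\n</html>",
    "<html>\n<p>Hello</p>\n<span>token=222</span>\n</html>",
    "<html>\n<p>Hello</p>\n<span>token=333</span>\n</html>",
]

print(is_dynamic(pages))
print(stable_lines(pages))
```

A line-based comparison like this only removes content that actually changes between the fetches in one run. Anything the server keeps constant for the whole run survives the filter, which would match the behaviour described in the question.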

QUESTION:

While this filtering method works about 80% of the time, some pages still contain dynamic code after ALL 15 reloads. However, if I manually close the Python shell and re-run my program, the dynamic HTML that my 'refresh-page method' could not shake off is gone. It has been replaced by new dynamic HTML that my 'refresh-page method' also cannot shake off.

So what is going on here? How does re-running my program cause the dynamic content of a page to change? And is there any way, any 'reset connection' command, I can use to reproduce this without manually restarting my app?
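For reference, `urllib.request` sends `Connection: close` with each request, so every `urlopen` call should already use a new connection and there is no persistent connection to reset. The sketch below shows one thing that could be tried instead: building each request from scratch through a newly created opener and asking any caches along the way for an uncached copy. The helper names `build_fresh_request` and `fetch_fresh` are hypothetical, and whether the no-cache headers have any effect on a given site is an assumption, not something the post confirms.

```python
import urllib.request


def build_fresh_request(url, user_agent):
    """Build a Request that asks caches for an uncached copy of the page."""
    req = urllib.request.Request(url)
    req.add_header("User-Agent", user_agent)
    req.add_header("Cache-Control", "no-cache")
    req.add_header("Pragma", "no-cache")
    return req


def fetch_fresh(url, user_agent, timeout=10):
    """Fetch url through a newly built opener and return the decoded body."""
    opener = urllib.request.build_opener()  # new opener for every call
    request = build_fresh_request(url, user_agent)
    with opener.open(request, timeout=timeout) as response:
        return response.read().decode("utf8", "ignore")
```

Calling `fetch_fresh(deepurl3, "etc......")` inside the existing loop would replace the `urlopen` line. If the dynamic content still only changes after a restart, the cause is more likely something the server derives per run than the connection itself.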

Thanks for your response.

© Stack Overflow or respective owner
