Python: find <title>
- by Peter
I have this:
response = urllib2.urlopen(url)
html = response.read()
begin = html.find('<title>')
end = html.find('</title>',begin)
title = html[begin+len('<title>'):end].strip()
if the url = http://www.google.com then the title have no problem as "Google",
but if the url = "http://www.britishcouncil.org/learning-english-gateway" then the title become
"<!doctype html public "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<base href="http://www.britishcouncil.org/" />
<META http-equiv="Content-Type" Content="text/html;charset=utf-8">
<meta name="WT.sp" content="Learning;Home Page Smart View" />
<meta name="WT.cg_n" content="Learn English Gateway" />
<META NAME="DCS.dcsuri" CONTENT="/learning-english-gateway.htm">..."
What is actually happening, why I couldn't return the "title"?