Python lxml - returns null list
Posted by Chris Finlayson on Stack Overflow
Published on 2014-08-18T15:55:11Z
I cannot figure out what is wrong with my XPath when trying to extract a value from a table on a web page. The approach seems correct, since I can extract the page title and the link attributes, but the third query always returns an empty list.
from lxml import html
import requests

test_url = 'SC312226'  # company number appended to the base URL
page = 'https://www.opencompany.co.uk/company/' + test_url
print('Now searching URL: ' + page)

data = requests.get(page)
tree = html.fromstring(data.text)

print(tree.xpath('//title/text()'))  # Get page title
print(tree.xpath('//a/@href'))  # Get href attribute of all links
# This is the query that returns an empty list:
print(tree.xpath('//*[@id="financial"]/table/tbody/tr/td[1]/table/tbody/tr[2]/td[1]/div[2]/text()'))
Unless I'm missing something, the XPath appears to be correct. I checked it in the Chrome console and it returns the expected value there, so I'm at a loss:
$x ('//*[@id="financial"]/table/tbody/tr/td[1]/table/tbody/tr[2]/td[1]/div[2]/text()')
[
"£432,272"
]
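One possible explanation, offered as a guess because the page source is not shown here: browsers insert a `<tbody>` element into every table when they build the DOM, so an XPath copied from Chrome can contain `tbody` steps that do not exist in the raw HTML returned by `requests`. The sketch below illustrates the effect on a small made-up fragment (the markup, the "Cash" label and the variable names are assumptions, not the real page). It uses the standard library's `xml.etree.ElementTree` so that it runs without network access or extra packages. lxml likewise does not add `tbody` while parsing, so the same two relative paths should give the same results when passed to `tree.xpath(...)`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the HTML the server sends.
# Note that the raw markup contains no <tbody> element.
RAW_HTML = """
<html><body>
  <div id="financial">
    <table>
      <tr>
        <td>
          <div>Cash</div>
          <div>£432,272</div>
        </td>
      </tr>
    </table>
  </div>
</body></html>
""".strip()

root = ET.fromstring(RAW_HTML)

# Path in the style copied from browser dev tools: it includes tbody,
# which exists in the browser's DOM but not in this raw markup.
with_tbody = root.findall(".//*[@id='financial']/table/tbody/tr/td[1]/div[2]")

# The same path with the tbody step removed.
without_tbody = root.findall(".//*[@id='financial']/table/tr/td[1]/div[2]")

tbody_texts = [div.text for div in with_tbody]
plain_texts = [div.text for div in without_tbody]

print(tbody_texts)   # []
print(plain_texts)   # ['£432,272']
```

If this is the cause, removing each `/tbody` step from the original query, or replacing `/table/tbody/tr` with `/table//tr` so the path matches whether or not `tbody` is present, may be enough to get the value back. Printing `data.text` and searching for `tbody` would confirm whether the server actually sends it.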