Retrieving the first picture with a HTML parser
Posted
by justin01
on Stack Overflow
See other posts from Stack Overflow
or by justin01
Published on 2010-06-03T19:02:37Z
Indexed on
2010/06/03
19:04 UTC
Read the original article
Hit count: 141
Hey guys,
(Not a native english speaker)
I'm doing a personal project in PHP in which I use the Simple HTML Parser to parse the HTML of a given URL and retrieve the first image in a DIV that have a specific ID or class (maincontent, content, main, wrapper, etc. - it's all in an array) and ignore ads. The goal is to take this image and make a thumbnail with it, pretty much like on Digg and others.
I thought everything was working fine until I tried my script with the website Snopes ("http://www.snopes.com/photos/animals/luckycoyote.asp" <- this page more exactly).
The source of the first image it gets is: " graphics/luckycoyote1.jpg ". So far, to correct this problem I created a little function that gets the domain name of the given URL and insert it before the IMG's source attribute. So for sites like Snopes.com, it gives me: "http://www.snopes.com/graphics/luckycoyote1.jpg" ... while the real URL for this image is "http://www.snopes.com*/photos/animals/graphics/luckycoyote1.jpg*" (or, more precisely: " http://graphics1.snopes.com/photos/animals/graphics/luckycoyote1.jpg " - note the subdomain here).
So my main question is: how can I externally/dynamically retrieve the full URL address of an image ("absolute path") when I am only given the "relative path"? I'm pretty sure this is possible, since when I paste the link in Facebook's "What are you doing?" field for example, it gives me the correct path to the image while on the website, the source of the image is only (example) "image/photo/example.jpg".
Thank you for your time.
© Stack Overflow or respective owner