input URL, output contents of "view page source", i.e. after javascript / etc, library or command-li
Posted by Ryan Berckmans on Stack Overflow
Published on 2010-05-26T13:45:56Z
I need a scalable, automated method of dumping the contents of "view page source" (the DOM as it stands after scripts have run) to a file. Programs such as wget or curl will non-interactively retrieve a set of URLs, but they do not execute JavaScript or any of that 'fancy stuff'.
My ideal solution looks like any of the following (fantasy solutions):
cat urls.txt | google-chrome --quiet --no-gui \
--output-sources-directory=~/urls-source
(fantasy command line, no idea if flags like these exist)
or
cat urls.txt | python -c "import some_library; \
... use some_library to process urls.txt ; output sources to ~/urls-source"
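A minimal sketch of what the second fantasy could look like, assuming a Chromium-based browser is installed whose headless mode accepts `--headless` and `--dump-dom` (flags that postdate this question). The binary name `google-chrome`, the helper names, and the file-naming scheme are all assumptions made up for illustration, not an established tool.

```python
import hashlib
import subprocess
import sys
from pathlib import Path
from urllib.parse import urlparse

# Assumption: a Chromium-based browser is on PATH under this name.
BROWSER = "google-chrome"


def output_name(url: str) -> str:
    """Build a filesystem-safe file name: host plus a short hash of the URL."""
    host = urlparse(url).netloc.replace(":", "_") or "unknown"
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:10]
    return f"{host}-{digest}.html"


def dump_dom_command(url: str, browser: str = BROWSER) -> list:
    """Command that prints the serialized DOM (after scripts run) to stdout."""
    return [browser, "--headless", "--dump-dom", url]


def dump_all(urls, out_dir: Path, timeout: int = 60) -> None:
    """Run the browser once per URL and save each DOM dump under out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for url in urls:
        try:
            result = subprocess.run(
                dump_dom_command(url),
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            print(f"timed out: {url}", file=sys.stderr)
            continue
        if result.returncode != 0:
            print(f"failed: {url}", file=sys.stderr)
            continue
        (out_dir / output_name(url)).write_text(result.stdout, encoding="utf-8")
```

Feeding it from a pipe would then be something like `dump_all((l.strip() for l in sys.stdin if l.strip()), Path.home() / "urls-source")`. Launching one browser process per URL is simple but slow; for large URL lists a single long-lived browser driven over an automation protocol would probably scale better.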
As a secondary concern, I also need to:
- dump all included JavaScript source to file (à la Firebug)
- dump a PDF/image of the page to file (print to file)
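For the two secondary items, a rough sketch under stated assumptions: the script sources are recovered by parsing an already-saved HTML/DOM dump with Python's standard `html.parser`, and the PDF comes from a Chromium-based browser's headless `--print-to-pdf` flag. The class name `ScriptCollector`, the helper `pdf_command`, and the binary name `google-chrome` are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScriptCollector(HTMLParser):
    """Collect external script URLs and inline script bodies from HTML."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.external = []  # absolute URLs taken from <script src=...>
        self.inline = []    # text of <script> elements that have no src
        self._in_inline = False

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the page URL.
            self.external.append(urljoin(self.base_url, src))
            self._in_inline = False
        else:
            self._in_inline = True
            self.inline.append("")

    def handle_data(self, data):
        # Script text may arrive in several chunks, so accumulate it.
        if self._in_inline:
            self.inline[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_inline = False


def pdf_command(url: str, pdf_path: str, browser: str = "google-chrome") -> list:
    """Command that asks a headless browser to print the page to a PDF file."""
    return [browser, "--headless", f"--print-to-pdf={pdf_path}", url]
```

After `collector.feed(html)`, each URL in `collector.external` could be fetched with `urllib.request.urlopen` and written to disk next to the inline bodies. This only sees scripts present in the markup it is given, so scripts loaded by other means would be missed unless the input is the post-JavaScript DOM dump.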
© Stack Overflow or respective owner