Input URL, output contents of "view page source" (i.e. after JavaScript has run): library or command-line tool?

Posted by Ryan Berckmans on Stack Overflow
Published on 2010-05-26T13:45:56Z Indexed on 2010/05/26 13:51 UTC

Filed under: JavaScript | web

I need a scalable, automated method of dumping the contents of "view page source" (the DOM after JavaScript has run) to a file. Programs such as wget or curl will non-interactively retrieve a set of URLs, but they do not execute JavaScript or any of that "fancy stuff".

My ideal solution looks like any of the following (fantasy solutions):

cat urls.txt | google-chrome --quiet --no-gui \
--output-sources-directory=~/urls-source  
(fantasy command line, no idea if flags like these exist)
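A close real-world analogue of the flags imagined above does exist: headless Chrome can dump the post-JavaScript DOM with `--headless --dump-dom`. A minimal sketch (the filename scheme is hypothetical, and `google-chrome` must be on your PATH; the exact flags imagined above, `--quiet` and `--output-sources-directory`, do not exist):

```shell
# Dump the rendered DOM of each URL in urls.txt to ~/urls-source/.
mkdir -p ~/urls-source
if [ -f urls.txt ]; then
    while IFS= read -r url; do
        # Hypothetical filename scheme: strip the scheme, map separators to '_'.
        name=$(printf '%s' "$url" | sed 's|^https\?://||; s|[/?#&]|_|g')
        # --dump-dom prints the serialized DOM after JavaScript has executed.
        google-chrome --headless --disable-gpu --dump-dom "$url" > ~/urls-source/"$name".html
    done < urls.txt
fi
```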

or

cat urls.txt | python -c "import some-library; \
... use some-library to process urls.txt ; output sources to ~/urls-source"    
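The Python fantasy can be realized with a headless-browser library such as Playwright (an anachronism relative to the original 2010 question, but a reasonable modern stand-in). A minimal sketch, assuming `pip install playwright` and `playwright install chromium`; the function names and filename scheme are my own, not from any library:

```python
import pathlib
import re

def url_to_filename(url: str) -> str:
    """Hypothetical scheme: strip the scheme, map unsafe characters to '_'."""
    name = re.sub(r"^https?://", "", url)
    return re.sub(r"[^A-Za-z0-9._-]", "_", name) + ".html"

def dump_sources(url_file: str, out_dir: str) -> None:
    # Playwright is assumed installed; imported lazily so the helpers above
    # remain usable without it.
    from playwright.sync_api import sync_playwright
    out = pathlib.Path(out_dir).expanduser()
    out.mkdir(parents=True, exist_ok=True)
    urls = pathlib.Path(url_file).read_text().split()
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        for url in urls:
            # "networkidle" waits for JavaScript-driven requests to settle.
            page.goto(url, wait_until="networkidle")
            # page.content() serializes the live, post-JavaScript DOM.
            (out / url_to_filename(url)).write_text(page.content())
        browser.close()

# Usage (requires network access and an installed Chromium):
#   dump_sources("urls.txt", "~/urls-source")
```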

As a secondary concern, I also need:

  • dump all included javascript source to file (a la firebug)
  • dump pdf/image of page to file (print to file)
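Both secondary concerns are also covered by a headless browser: Chromium can "print" a page to PDF and capture a full-page screenshot. A sketch of the PDF/image half, again assuming Playwright (the helper name and output scheme are hypothetical; `page.pdf()` works only in headless Chromium):

```python
import pathlib
import re

def url_to_stem(url: str) -> str:
    """Hypothetical scheme shared with the HTML dump: strip scheme, sanitize."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", re.sub(r"^https?://", "", url))

def dump_pdf_and_screenshot(url: str, out_dir: str) -> None:
    # Assumes Playwright is installed (pip install playwright).
    from playwright.sync_api import sync_playwright
    out = pathlib.Path(out_dir).expanduser()
    out.mkdir(parents=True, exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        page.pdf(path=out / f"{url_to_stem(url)}.pdf")  # "print to file"
        page.screenshot(path=out / f"{url_to_stem(url)}.png", full_page=True)
        browser.close()
```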

© Stack Overflow or respective owner
