Hello, we are tasked with basically emulating a browser to fetch webpages, looking to automate tests on different web pages. This will be used for (ideally) console-ish applications that run in the background and generate reports.
We tried going with .NET and the WatiN library, but it was built on a Marshalled IE, and so it lacked many features that we hacked in with calls to unmanaged native code, but at the end of the day IE is not thread safe nor process safe, and many of the needed features could only be implemented by changing registry values and it was just terribly unflexible.
Proxy support
JavaScript support- we have to be able to parse the actual DOM after any javascript has executed (and hopefully an event is raised to handle any ajax calls)
Ability to save entire contents of page including images FROM THE loaded page's CACHE to a separate location
ability to clear cookies/cache, get the cookies/cache, etc.
Ability to set headers and alter post data for any browser call
And for the love of drogs, an API that isn't completely cryptic
Languages acceptable C++, C#, Python, anything that can be a simple little console application that doesn't have a retarded syntax like Ruby.
From my own research, and believe me I am terrible at google searches, I have heard good things about WebKit... would the Qt module QtWebKit handle all these features?