A question on webpage representation in Java

Posted by Gemma on Stack Overflow See other posts from Stack Overflow or by Gemma
Published on 2010-04-11T05:19:09Z Indexed on 2010/04/11 5:23 UTC
Read the original article Hit count: 264

Filed under:

java

|

html

Hello there. I've followed a tutorial and came up with the following method to read the webpage content into a CharSequence

 public static CharSequence getURLContent(URL url) throws IOException {
       URLConnection conn = url.openConnection();
       String encoding = conn.getContentEncoding();
       if (encoding == null) {
         encoding = "ISO-8859-1";
       }
       BufferedReader br = new BufferedReader(new
           InputStreamReader(conn.getInputStream(),encoding));
       StringBuilder sb = new StringBuilder(16384);
       try {
         String line;
         while ((line = br.readLine()) != null) {
           sb.append(line);
           sb.append('\n');
         }
       } finally {
         br.close();
       }
       return sb;
     }

It will return a representation of the webpage specified by the url. However,this representation is hugely different from what I use "view page source" in my Firefox,and since I need to scrape data from the original webpage(some data segement in the original "view page source" file),it will always fail to find required text on this Java representation. Did I go wrong somewhere?I need your advice guys,thanks a lot for helping!

© Stack Overflow or respective owner

Related posts about java

Tomcat 6: Access Control Exception?

as seen on Server Fault - Search for 'Server Fault'
I'm trying to setup a tomcat6 server, and I'm trying to match another setup someone else established. However, my deployment (default Ubuntu install) uses a policy.d/ directory structure, and the established server just uses a catalina.policy file. I've tried setting every entry in policy.d to match… >>> More
Problem in creation MDB Queue connection at Jboss StartUp

as seen on Stack Overflow - Search for 'Stack Overflow'
I am not able to create a Queue connection in JBOSS4.2.3GA Version & Java1.5, as I am using MDB as per the below details. I am putting this MDB in a jar file(named utsJar.jar) and copied it in deploy folder of JBOSS, In the test env. this MDB works well but in another env. [ env settings and… >>> More
failing to establish connection between Postgres db and gwt

as seen on Stack Overflow - Search for 'Stack Overflow'
Hi, I am using Postgres and gwt 2.0 for one of my applications. I am facing problem connecting to the database. When I try to connect it gives "ClassNotFoundException". Here is what I get when I try to connect to database: java.lang.ClassNotFoundException: org.postgresql.Driver at java.net… >>> More
failing to establish connection between postgre db and gwt

as seen on Stack Overflow - Search for 'Stack Overflow'
Hi, For i am using postgre and gwt 2.0 for one of my applications. I am facing problem connecting to the database. When i try to connect it gives "ClassNotFoundException". Here is what i get when i try to connect to database: java.lang.ClassNotFoundException: org.postgresql.Driver at java.net… >>> More
Migration and deployement problems JBoss 4.2.2.GA to JBoss 6.0.0.M2

as seen on Stack Overflow - Search for 'Stack Overflow'
Hi, I'm trying to migrate an application running on JBoss 4.2.2.GA to JBoss 6.0.0.M2 I give you some log to explain my problem : boot.log : 2010-03-16 09:59:29,406 ERROR [org.jboss.system.server.profileservice.ProfileServiceBootstrap] (Thread-2) Failed to load profile: Summary of incomplete deployments… >>> More

Related posts about html

Install usblib package - Ubuntu

as seen on Ask Ubuntu - Search for 'Ask Ubuntu'
I need the package libusb for another package I am installing. I tried the following which seemed to install the package, sudo apt-get install libusb-dev but when I try to install the other package I get, configure: error: Package requirements (libusb-1.0 >= 0.9.1) were not met: No package… >>> More
Prevent malicious vulnerability scan increasing load on a server

as seen on Server Fault - Search for 'Server Fault'
Hi all, this week we have been suffering some malicious vulnerability scans to our servers, increasing the load on them, making them nearly unusable. The attack is easy to defend, just blocking the offending ip, but only after discovering it. Is there any form of prevent it? Is it normal that… >>> More
can't install psycopg2 in my env on mac os x lion

as seen on Server Fault - Search for 'Server Fault'
I tried install psycopg2 via pip in my virtual env, but got this error: ld: library not found for -lpq (full log here: http://pastebin.com/XdmGyJ4u ) I tried install postgres 9.1 from .dmg and via port, (gksks)iMac-Alexander:~ lorddaedra$ locate libpq /Developer/SDKs/MacOSX10.7.sdk/usr/include/libpq /Developer/SDKs/MacOSX10… >>> More
Bitnami redmine error SVN

as seen on Server Fault - Search for 'Server Fault'
I'm installing the Bitnami Redmine stack (redmine + subversion). Firstly I install configure and test it locally (Ubuntu 14.04 LTS). And everything is OK. I install Bitnami stack on server (Red Hat 4.4.7-4) and configure SVN. I commit files into SVN and connect project into Redmine with SVN repository… >>> More
Can the .htaccess file slow down a website to a crawl? If so, are there better ways to solve these problems with different rewrite rules and such?

as seen on Pro Webmasters - Search for 'Pro Webmasters'
here is my htaccess file...... RewriteCond %{REQUEST_URI} ^/patients/billing/FAQ_billing\.html$ [OR] RewriteCond %{REQUEST_URI} ^/patients/billing/getintouch\.html$ RewriteRule ^patients/billing/(.*)\.html$ $1.php [L,NC] RewriteCond %{REQUEST_URI} ^/patients/findadoctor/a\.html$ [OR] RewriteCond… >>> More