Problem getting the page content of a URL in Java
- by tiendv
Hi all!
I have some code that gets the page content from a URL.
Here is the code:
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLConnection;
    import java.nio.charset.StandardCharsets;

    public class GetPageFromURLAction extends Thread {
        private String stringPageContent;
        private String targerURL;

        // Downloads the page at targetURL and returns it as one string.
        public String getPageContent(String targetURL) throws IOException {
            URL url = new URL(targetURL);
            URLConnection openConnection = url.openConnection();
            // StringBuilder avoids building a new String on every += inside the loop.
            StringBuilder sb = new StringBuilder();
            // try-with-resources closes the reader even if reading throws.
            // Assumes UTF-8; strictly, the charset should come from the Content-Type header.
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(openConnection.getInputStream(), StandardCharsets.UTF_8))) {
                String temp;
                while ((temp = in.readLine()) != null) {
                    sb.append(temp).append('\n');
                }
            }
            // String nohtml = sb.toString().replaceAll("\\<.*?>", "");
            return sb.toString();
        }

        public String getStringPageContent() {
            return stringPageContent;
        }

        public void setStringPageContent(String stringPageContent) {
            this.stringPageContent = stringPageContent;
        }

        public String getTargerURL() {
            return targerURL;
        }

        public void setTargerURL(String targerURL) {
            this.targerURL = targerURL;
        }

        // Runs on the new thread after start(); read the result only after join().
        @Override
        public void run() {
            try {
                this.stringPageContent = this.getPageContent(targerURL);
            } catch (IOException e) {
                // On failure (e.g. HTTP 403 or 405) stringPageContent stays null.
                e.printStackTrace();
            }
        }
    }
The problems are:

1. Sometimes I receive an HTTP error such as 405 or 403, and the result string is null.
To fix this, I check whether I have permission to connect to the URL, but the call usually returns null:

    URLConnection openConnection = urlString.openConnection();
    openConnection.getPermission();

Does this mean that I don't have permission to access the link?
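As far as I can tell from the `URLConnection` Javadoc, `getPermission()` describes the Java security permission (the kind checked by a `SecurityManager`) needed to open the connection, and returns null when none is required; it says nothing about whether the web server will let you in. The 403 (Forbidden) and 405 (Method Not Allowed) come from the server itself. In the code above, `getInputStream()` throws an `IOException` for such a status, `run()` catches it, and `stringPageContent` is never assigned, which is why the result is null. Reading `getStringPageContent()` before the thread has finished (i.e. without calling `join()` first) also gives null. Below is a minimal sketch, assuming plain HTTP(S) URLs, that checks the status with `HttpURLConnection.getResponseCode()` and reads the server's error page from `getErrorStream()`. The class name `StatusAwareFetcher` and the browser-like `User-Agent` value are illustrative assumptions: some servers do reject Java's default agent, but nothing here confirms that is the cause in your case.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class StatusAwareFetcher {

    /** Reads a whole stream as UTF-8 (an assumption); a null stream gives "". */
    public static String readAll(InputStream in) throws IOException {
        if (in == null) {
            return ""; // getErrorStream() returns null when the server sent no error body
        }
        try (InputStream input = in) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = input.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /** Fetches a URL, turning 4xx/5xx responses into an IOException that names the status. */
    public static String fetch(String targetURL) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(targetURL).openConnection();
        // Assumption: the server may refuse Java's default User-Agent, so send a browser-like one.
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        int status = conn.getResponseCode(); // e.g. 200, 403 or 405; does not throw for 4xx
        if (status >= 400) {
            // getInputStream() would throw here; the server's error page is in the error stream.
            String errorPage = readAll(conn.getErrorStream());
            throw new IOException("HTTP " + status + " " + conn.getResponseMessage()
                    + " for " + targetURL + " (error page: " + errorPage.length() + " chars)");
        }
        return readAll(conn.getInputStream());
    }
}
```

Calling `StatusAwareFetcher.fetch(url)` then either returns the page or fails with a message such as "HTTP 403 Forbidden for ...", which makes it clear the refusal came from the server rather than from a Java permission check.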
2. To get the result string without HTML tags, I do this:

    String nohtml = sb.toString().replaceAll("\\<.*?>", "");

where sb is a StringBuilder, but it does not remove all the HTML tags from the returned string. Why?
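A likely reason the regex misses tags: by default `.` in a Java regex does not match line terminators, so `\\<.*?>` leaves any tag that is split across lines. Also, the text inside `<script>` and `<style>` elements is not itself a tag, so it survives. Below is a minimal regex-based sketch that handles those two cases plus HTML comments; the class name `HtmlStripper` is hypothetical. Regexes remain fragile on real-world HTML (for example a `>` inside an attribute value), and entities such as `&amp;` are left as-is, so a proper HTML parser such as the jsoup library is the more robust choice.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class HtmlStripper {
    // <script>/<style> bodies are text, not tags, so drop each element with its content first.
    // DOTALL lets '.' cross line breaks; CASE_INSENSITIVE also catches <SCRIPT>...</script>.
    private static final Pattern SCRIPT_OR_STYLE = Pattern.compile(
            "<(script|style)\\b.*?</\\1\\s*>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
    // HTML comments may span several lines as well.
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);
    // [^>] also matches '\n', so a tag broken across lines is still removed.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    public static String stripTags(String html) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll("");
        text = COMMENT.matcher(text).replaceAll("");
        return TAG.matcher(text).replaceAll("");
    }
}
```

For example, `HtmlStripper.stripTags("<p>Hello\n<b\nclass=\"x\">world</b></p>")` returns `"Hello\nworld"`, whereas the original pattern would leave the `<b ... >` tag in place because it spans a line break.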
3. I use a thread here because I have to fetch pages from a lot of URLs.
How can I create multiple threads to improve the speed of the program?
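Rather than starting one `Thread` subclass per URL, a common approach is a fixed-size pool from `java.util.concurrent`, which caps how many downloads run at once and hands each result back through a `Future`. Below is a minimal sketch; `ParallelPageFetcher`, the `PageFetcher` callback interface and the pool size are hypothetical choices for illustration, not part of the original code.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper class, for illustration only.
public class ParallelPageFetcher {

    /** Hypothetical callback: anything that turns a URL into page content. */
    public interface PageFetcher {
        String fetch(String url) throws IOException;
    }

    /**
     * Fetches every URL on a pool of `threads` workers and returns url -> content
     * (null where that URL failed), in input order. Duplicate URLs share one entry.
     */
    public static Map<String, String> fetchAll(List<String> urls, int threads, PageFetcher fetcher)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every task first so the downloads overlap.
            Map<String, Future<String>> futures = new LinkedHashMap<>();
            for (String url : urls) {
                futures.put(url, pool.submit(() -> fetcher.fetch(url)));
            }
            // Then collect; get() blocks until that particular task has finished.
            Map<String, String> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> e : futures.entrySet()) {
                try {
                    results.put(e.getKey(), e.getValue().get());
                } catch (ExecutionException ex) {
                    // The task threw (e.g. on an HTTP 403); record the failure and keep going.
                    System.err.println(e.getKey() + " failed: " + ex.getCause());
                    results.put(e.getKey(), null);
                }
            }
            return results;
        } finally {
            pool.shutdown(); // let the worker threads exit once the queued tasks are done
        }
    }
}
```

With the method from the question this could be called as `ParallelPageFetcher.fetchAll(urls, 8, url -> new GetPageFromURLAction().getPageContent(url))`. Because the work is mostly waiting on the network, a pool larger than the number of CPU cores may help, but many parallel requests to one host can get you throttled or blocked, so the pool size is something to measure rather than guess.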
Thanks