Date: 2014-05-20  Views: 20636

Getting a web page's HTML with Java, need help
Java code
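import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

URL url = new URL("http://bbs.ustc.edu.cn/main.html");
StringBuilder sb = new StringBuilder();
// Decode the response as GB2312 and read it line by line;
// try-with-resources closes the reader even if reading fails.
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(url.openStream(), "GB2312"))) {
    String strRead;
    while ((strRead = br.readLine()) != null) {
        sb.append(strRead).append('\n');
    }
}
String res = sb.toString();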

In the end, res doesn't show the page's content; all I get is something about a DTD...
Please help~~

------Solution--------------------
Personally, I think you should read it with a byte stream.
A rough sketch, for example:
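import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.URL;

URL url = new URL("http://bbs.ustc.edu.cn/main.html");
ByteArrayOutputStream output = new ByteArrayOutputStream();
// Copy the raw response bytes; the stream is closed automatically.
try (InputStream input = url.openStream()) {
    byte[] b = new byte[1024];
    int len;
    while ((len = input.read(b)) != -1) {
        output.write(b, 0, len);
    }
}
// Decode with an explicit charset (GB2312, as in the question)
// rather than the platform default.
String body = output.toString("GB2312");
System.out.println(body);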
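Once you have the raw bytes, you still have to pick a charset to decode them. Instead of hard-coding one, you could try reading it from the response's Content-Type header. Here is a minimal sketch of that idea; the class name CharsetSniffer and the method charsetFrom are made up for illustration, and it assumes the server actually sends a charset parameter (many don't, which is why there is a fallback).

```java
import java.nio.charset.Charset;

class CharsetSniffer {
    // Hypothetical helper: pulls the charset parameter out of a Content-Type
    // value such as "text/html; charset=GB2312". Returns the fallback when the
    // header is missing, has no charset parameter, or names an unknown charset.
    static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check for a leading "charset="
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name
                    return fallback;
                }
            }
        }
        return fallback;
    }
}
```

With a URLConnection you would pass in conn.getContentType() and then decode the bytes with new String(output.toByteArray(), charset).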
------Solution--------------------
To read a web page, you'd generally use Ajax or the HTTP protocol, something like that.
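For the HTTP route in Java, one option is the standard java.net.HttpURLConnection. The following is only a rough sketch: the class name PageFetcher, the 5-second timeouts, and the User-Agent value are my own assumptions, not anything from the posts above, and I haven't checked how this particular site responds.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;

class PageFetcher {
    // Reads a stream to the end and decodes the bytes with the given charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int len;
        while ((len = in.read(buf)) != -1) {
            out.write(buf, 0, len);
        }
        return new String(out.toByteArray(), charset);
    }

    // Fetches a page over HTTP. Timeouts and User-Agent are illustrative
    // assumptions; adjust them for the site you are reading.
    static String fetch(String address, Charset charset) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(address).openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status: " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                return readAll(in, charset);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

You would call it like PageFetcher.fetch("http://bbs.ustc.edu.cn/main.html", Charset.forName("GB2312")). Checking the status code first makes it easier to tell an error page from the real content.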