Fetching a web page's HTML with Java, help needed
Java code
// Requires: java.net.URL, java.io.BufferedReader, java.io.InputStreamReader
URL url = new URL("http://bbs.ustc.edu.cn/main.html");
// Decode the response as GB2312 while reading it line by line
InputStreamReader isr = new InputStreamReader(url.openStream(), "GB2312");
BufferedReader br = new BufferedReader(isr);
String strRead;
StringBuilder sb = new StringBuilder();
while ((strRead = br.readLine()) != null) {
    sb.append(strRead).append('\n');
}
br.close();
String res = sb.toString();
The resulting res does not show the page content; all I get is something about a DTD...
Please help~~
------Solution--------------------
In my opinion, you should read it as a byte stream.
Here is a rough sketch:
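URL url = new URL("http://bbs.ustc.edu.cn/main.html");
InputStream input = url.openStream();
ByteArrayOutputStream output = new ByteArrayOutputStream();
int len;
byte[] b = new byte[1024];
// Copy the raw bytes first; decode only once everything has been read
while ((len = input.read(b)) != -1) {
    output.write(b, 0, len);
}
input.close();
// Decode explicitly; GB2312 is the charset the question assumed for this page
String body = output.toString("GB2312");
System.out.println(body);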
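For what it's worth, the byte-stream idea can be wrapped in a small helper that takes the charset as a parameter, so the decoding never depends on the platform default. This is only a minimal sketch: the class name PageReader and the method readAll are made up for illustration, GB2312 is assumed only because the question used it, and an in-memory stream stands in for url.openStream() so it runs without a network.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper, not part of the original thread.
class PageReader {
    // Reads every byte from the stream, then decodes once with the given charset.
    static String readAll(InputStream input, Charset charset) {
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int len;
        try {
            while ((len = input.read(buffer)) != -1) {
                output.write(buffer, 0, len);
            }
        } catch (IOException e) {
            // Rethrow unchecked so callers are not forced to declare IOException.
            throw new UncheckedIOException(e);
        }
        return new String(output.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the page is GB2312-encoded, as in the question.
        Charset gb2312 = Charset.forName("GB2312");
        // "\u4f60\u597d" is a two-character Chinese sample string.
        byte[] page = "<html>\u4f60\u597d</html>".getBytes(gb2312);
        try (InputStream in = new ByteArrayInputStream(page)) {
            System.out.println(readAll(in, gb2312));
        }
    }
}
```

With a real page, the call would be something like PageReader.readAll(url.openStream(), Charset.forName("GB2312")), with the stream closed afterwards.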
------Solution--------------------
Reading a web page is generally done with AJAX, over the HTTP protocol, or something along those lines.
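If going through HTTP directly, one possible shape in plain Java is HttpURLConnection, taking the charset from the Content-Type response header instead of hard-coding it. This is a rough sketch under assumptions, not a confirmed fix for the DTD problem: HttpPageFetcher, charsetOf and fetch are hypothetical names, the 5-second timeouts are arbitrary, and the fallback charset is whatever the caller believes the page uses.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;

// Hypothetical sketch, not part of the original thread.
class HttpPageFetcher {
    // Extracts the charset from a Content-Type value such as "text/html; charset=GB2312".
    // Returns the fallback when the header is missing, has no charset, or names an unknown one.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Fetches the page body as bytes and decodes it with the detected (or fallback) charset.
    static String fetch(String address, Charset fallback) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setConnectTimeout(5000); // arbitrary timeouts, in milliseconds
        conn.setReadTimeout(5000);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int len;
            while ((len = in.read(buffer)) != -1) {
                out.write(buffer, 0, len);
            }
            Charset cs = charsetOf(conn.getContentType(), fallback);
            return new String(out.toByteArray(), cs);
        } finally {
            conn.disconnect();
        }
    }
}
```

For the page in the question, the call would look like HttpPageFetcher.fetch("http://bbs.ustc.edu.cn/main.html", Charset.forName("GB2312")). Note that this only sees the charset announced in the HTTP header; a charset declared solely in an HTML meta tag would still need the fallback.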