Date: 2014-05-17

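Parsing a web page
Hi everyone, I need to read a page that is entirely in English. I'm using BufferedReader, but everything comes out garbled. I've tried several encodings (utf-8, iso8859-1, gbk, gb2312), and the output is garbled every time. What could be causing this? Any pointers would be much appreciated. My code is below: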
URL u = new URL(url);
HttpURLConnection conn = (HttpURLConnection)u.openConnection();
BufferedReader br = new BufferedReader(new 
                         InputStreamReader(conn.getInputStream(),"utf-8"));
StringBuffer sb= new StringBuffer();
String line = null;
while((line = br.readLine())!=null){
    sb.append(line+"\n");
}//end while
conn.disconnect();
page=sb.toString();
The page is here: http://statutes.agc.gov.sg/aol/browse/yearResults.w3p;type=actsSup;year=2006

------Solution--------------------
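import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

String strURL = "http://statutes.agc.gov.sg/aol/browse/yearResults.w3p;type=actsSup;year=2006";
URL url = new URL(strURL);
HttpURLConnection httpConn = (HttpURLConnection) url.openConnection();
InputStream inStream = httpConn.getInputStream();
// This assumes the server sends the body gzip-compressed, so decompress it
// before decoding; reading the compressed bytes as text gives garbage in any charset.
GZIPInputStream gzipStream = new GZIPInputStream(inStream);
ByteArrayOutputStream outStream = new ByteArrayOutputStream();
byte[] buffer = new byte[1024];
int len = -1;
// Copy the decompressed bytes into memory.
while ((len = gzipStream.read(buffer)) != -1) {
    outStream.write(buffer, 0, len);
}
byte[] data = outStream.toByteArray();
outStream.close();
gzipStream.close();
inStream.close();
// Decode the decompressed bytes as UTF-8 text.
System.out.println(new String(data, "utf-8"));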
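
The code above always wraps the response in a GZIPInputStream, which fails if the server ever sends an uncompressed body. A slightly more defensive variant is sketched below, assuming the server reports compression through the Content-Encoding response header (the thread does not confirm this). The class name GzipAwareFetcher and the helper names decodeBody and fetch are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class; not part of the original answer.
final class GzipAwareFetcher {

    // Reads the whole body and decodes it with the given charset.
    // The stream is decompressed first only when contentEncoding is "gzip"
    // (case-insensitive); a null or different value means "read as-is".
    static String decodeBody(InputStream raw, String contentEncoding, Charset charset)
            throws IOException {
        InputStream in = "gzip".equalsIgnoreCase(contentEncoding)
                ? new GZIPInputStream(raw)
                : raw;
        try (InputStream body = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int len;
            while ((len = body.read(buffer)) != -1) {
                out.write(buffer, 0, len);
            }
            return new String(out.toByteArray(), charset);
        }
    }

    // Fetches a page over HTTP and decodes it as UTF-8 (assumed charset).
    static String fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        try {
            return decodeBody(conn.getInputStream(),
                    conn.getContentEncoding(),   // value of the Content-Encoding header, or null
                    StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

With this sketch, calling GzipAwareFetcher.fetch(strURL) would replace the manual stream handling. If the server compresses without declaring it in the header, the unconditional GZIPInputStream from the answer above is still needed.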