Date: 2014-05-17  Views: 20664

java.net.ProtocolException: Server redirected too many times (20)
Looking for advice from Java network programming experts!

I wrote a web crawler, and it throws an exception when it fetches a Google results page. How can I fix this?

Java code
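    // Fetches the body of `url` (a field of the enclosing class) as raw bytes.
    // Needs java.io.ByteArrayOutputStream, java.io.InputStream,
    // java.net.HttpURLConnection and java.net.URL.
    private byte[] queryData() throws Exception {
        URL connUrl = new URL(url);
        HttpURLConnection conn = (HttpURLConnection) connUrl.openConnection();
        conn.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; Maxthon 2.0)");
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try {
            InputStream input = conn.getInputStream();
            try {
                byte[] data = new byte[1024];
                int length;
                // read() returns -1 at end of stream, so test for that rather than > 0
                while ((length = input.read(data)) != -1) {
                    baos.write(data, 0, length);
                }
            } finally {
                input.close();
            }
        } finally {
            conn.disconnect();
        }
        return baos.toByteArray();
    }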




The URL is: http://www.google.com.hk/search?q=%E5%A6%87%E5%A5%B3&hl=zh-CN
The exception is as follows:

java.net.ProtocolException: Server redirected too many times (20)
  at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1315)
  at com.xdtech.platform.util.source.SourceFetch.queryData(SourceFetch.java:41)
  at com.xdtech.platform.util.source.SourceFetch.queryUrl(SourceFetch.java:29)
  at com.xdtech.platform.util.source.inter.AbstractSource.queryUrl(AbstractSource.java:72)
  at com.xdtech.platform.util.source.Template.SearchFilteByTemplateChange.filterByPages(SearchFilteByTemplateChange.java:187)
  at com.xdtech.platform.service.source.IndexSourceDataService.collectDataByPage(IndexSourceDataService.java:147)
  at com.xdtech.platform.core.service.SourceFetchExecutorPool$CategoryFetch.run(SourceFetchExecutorPool.java:107)

The frame "at com.xdtech.platform.util.source.SourceFetch.queryData(SourceFetch.java:41)" refers to this line in the code:
Java code
java.io.InputStream input = conn.getInputStream();



Any help would be greatly appreciated.

If I remove "&hl=zh-CN" from the URL, the exception goes away, but the page comes back in Traditional Chinese!





------Solution--------------------
Java code
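        // Follow redirects by hand so that cookies set along the way are sent back.
        // Needs java.io.IOException, java.net.HttpURLConnection and java.net.URL.
        String target = "http://www.google.com.hk/search?q=%E5%A6%87%E5%A5%B3&hl=zh-CN";
        String cookie = "";
        HttpURLConnection conn = null;
        for (int attempt = 0; attempt < 20; attempt++) { // bounded, unlike while(true)
            conn = (HttpURLConnection) new URL(target).openConnection();
            if (cookie.length() != 0)
                conn.setRequestProperty("Cookie", cookie);
            conn.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 8.0)");
            conn.setInstanceFollowRedirects(false);
            int code = conn.getResponseCode();
            if (code == HttpURLConnection.HTTP_OK)
                break; // conn is now ready for getInputStream()
            if (code != HttpURLConnection.HTTP_MOVED_TEMP && code != HttpURLConnection.HTTP_MOVED_PERM)
                throw new IOException("Unexpected HTTP status: " + code);
            // Collect every Set-Cookie header, keeping only its name=value part.
            for (int i = 0; conn.getHeaderField(i) != null; i++)
                if ("Set-Cookie".equalsIgnoreCase(conn.getHeaderFieldKey(i)))
                    cookie += conn.getHeaderField(i).split(";", 2)[0] + "; ";
            // Continue at the redirect target; Location may be relative.
            String location = conn.getHeaderField("Location");
            if (location != null)
                target = new URL(new URL(target), location).toString();
            conn.disconnect();
        }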
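
A likely reason this helps: the server seems to set a cookie and keep redirecting until that cookie comes back, and HttpURLConnection does not send cookies on its own unless a CookieHandler is installed, so the automatic redirects never end. The part that is easy to get wrong is turning the Set-Cookie response headers into a Cookie request header, because only the leading name=value pair should be sent back, not attributes such as path or expires. Below is a minimal sketch of just that step. The class name CookieHeader, the method fromSetCookie and the header values in main are made up for illustration and are not from the original post. It uses String.join, so it assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original answer.
public class CookieHeader {

    /**
     * Builds a Cookie request-header value from raw Set-Cookie response values.
     * Only the leading name=value pair of each entry is kept; attributes such as
     * path, expires and HttpOnly are dropped.
     */
    public static String fromSetCookie(List<String> setCookieValues) {
        List<String> pairs = new ArrayList<String>();
        if (setCookieValues != null) {
            for (String value : setCookieValues) {
                if (value == null) {
                    continue;
                }
                // Everything before the first ';' is the name=value pair.
                String pair = value.split(";", 2)[0].trim();
                if (pair.length() > 0) {
                    pairs.add(pair);
                }
            }
        }
        return String.join("; ", pairs);
    }

    public static void main(String[] args) {
        // Made-up header values, for illustration only.
        List<String> headers = Arrays.asList(
                "NID=abc123; expires=Sun, 16-Nov-2014 00:00:00 GMT; path=/; HttpOnly",
                "PREF=ID=42; path=/");
        System.out.println(fromSetCookie(headers)); // prints: NID=abc123; PREF=ID=42
    }
}
```

The result would be passed to conn.setRequestProperty("Cookie", ...) on the next request. This sketch does no domain, path or expiry matching, so it only suits a short redirect chain against a single host.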