Problems encountered when scraping with serverXMLHTTP

Date: 2014-05-17

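I have been writing a scraper recently. Everything else works, but I cannot get Sohu pages to scrape no matter what I try. Could someone take a look at what is wrong? The code is very simple: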

VBScript code


dim url:url = "http://news.sohu.com/20100807/n274046449.shtml"
set xmlHttp = server.CreateObject("msxml2.serverXMLHTTP.3.0")
xmlHttp.setTimeouts 10000,10000,100000,100000
xmlHttp.open "GET",url,false
xmlHttp.setRequestHeader "User-Agent","Mozilla/5.0 (Windows; U; Windows NT 6.0; zh-CN; rv:1.9.0.3)Gecko/2008092417 Firefox/3.0.3 (.NET CLR 3.5.30729)"
xmlHttp.setRequestHeader "Pragma","no-cache"
xmlHttp.setRequestHeader "Cache-Control","no-cache"
xmlHttp.setRequestHeader "Accept-Encoding","none"
xmlHttp.send()
if xmlHttp.readystate = 4 then
    if xmlHttp.status = 200 then
        Response.Write xmlHttp.responseText
    end if
end if
set xmlHttp = nothing

What comes back is a very short string of garbled text, yet the response headers report a Content-Length of more than 10,000.

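------Solution--------------------
Ah, this is a responseText problem. The .responseText property decodes the returned source as UTF-8, but the source page is GB2312, so naturally the result comes out wrong.
The usual approach is to take the raw binary stream from responseBody and then use an ADODB.Stream object to decode that stream as GB2312.
There is plenty of material on this online; here is one I found with a quick Baidu search as a pointer:
http://blog.sina.com.cn/s/blog_48c1f4d0010003jy.html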
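------Solution--------------------
It turns out the response is gzip-compressed.
I don't know whether there is a function that can handle it directly.
That said, some posts online suggest saving the response to a zip file first and then calling a component to decompress it.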
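
One way to confirm the gzip diagnosis before decoding anything might look like the sketch below. It is only a minimal illustration, not a decompressor: the function name IsGzipResponse is hypothetical, it assumes getResponseHeader yields an empty value when the header is absent, and it relies on the common VBScript idiom of applying LenB/MidB/AscB to the responseBody byte array. The two bytes checked (&H1F, &H8B) are the gzip magic number.

```vbscript
' Hypothetical helper (not part of MSXML): returns True when the response
' held by an already-sent XMLHTTP/ServerXMLHTTP object looks gzip-compressed.
Function IsGzipResponse(http)
    Dim enc, body
    IsGzipResponse = False

    ' 1) Check the Content-Encoding response header.
    '    Assumption: a missing header comes back as an empty value.
    enc = LCase(http.getResponseHeader("Content-Encoding") & "")
    If InStr(enc, "gzip") > 0 Then
        IsGzipResponse = True
        Exit Function
    End If

    ' 2) Fall back to the gzip magic number in the first two body bytes.
    body = http.responseBody
    If LenB(body) >= 2 Then
        If AscB(MidB(body, 1, 1)) = &H1F And AscB(MidB(body, 2, 1)) = &H8B Then
            IsGzipResponse = True
        End If
    End If
End Function
```

Called right after xmlHttp.send() in the question's code, for example as "If IsGzipResponse(xmlHttp) Then Response.Write "gzip body"", this would distinguish a compressed body from a plain charset mismatch.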
------Solution--------------------
This is what I use, and it works without problems.
rContent = getHTTPPage(rUrl, "gb2312")

' Fetch a URL and return its content decoded with the given charset
Function getHTTPPage(url, cset)
    Dim Http
    Set Http = Server.CreateObject("MSXML2.XMLHTTP")
    Http.open "GET", url, False
    Http.send()
    If Http.readyState <> 4 Then Exit Function
    ' Decode the raw bytes explicitly instead of relying on responseText
    getHTTPPage = BytesToBstr(Http.responseBody, cset)
    Set Http = Nothing
    If Err.Number <> 0 Then Err.Clear
End Function

' Read URL content, 2/2: convert a byte array to a string in charset Cset
Function BytesToBstr(body, Cset)
    Dim objstream
    Set objstream = Server.CreateObject("adodb.stream")
    objstream.Type = 1      ' adTypeBinary
    objstream.Mode = 3      ' adModeReadWrite
    objstream.Open
    objstream.Write body
    objstream.Position = 0
    objstream.Type = 2      ' adTypeText
    objstream.Charset = Cset
    BytesToBstr = objstream.ReadText
    objstream.Close
    Set objstream = Nothing
End Function
