A hard question about scraping data from a web page - C# Tutorial - 爱易网


Date: 2014-05-18  Views: 20757

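I have a hard problem with scraping data from a web page and would appreciate some advice.
The main code of the HttpHelper class is as follows: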

C# code


        private CookieContainer cc;
        private string contentType = "application/x-www-form-urlencoded";
        private string accept = "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/x-silverlight, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-ms-application, application/x-ms-xbap, application/vnd.ms-xpsdocument, application/xaml+xml, application/x-silverlight-2-b1, */*";
        private string userAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)";
        private Encoding encoding = Encoding.GetEncoding("gb2312");

        public string GetHtml(string url, CookieContainer cookieContainer)
        {
            HttpWebRequest httpWebRequest;

            httpWebRequest = (HttpWebRequest)HttpWebRequest.Create(url);
            httpWebRequest.CookieContainer = cookieContainer;
            httpWebRequest.ContentType = contentType;
            httpWebRequest.Referer = url;
            httpWebRequest.Accept = accept;
            httpWebRequest.UserAgent = userAgent;
            httpWebRequest.Method = "GET";

            HttpWebResponse httpWebResponse;
            httpWebResponse = (HttpWebResponse)httpWebRequest.GetResponse();
            Stream responseStream = httpWebResponse.GetResponseStream();
            StreamReader streamReader = new StreamReader(responseStream, encoding);
            string html = streamReader.ReadToEnd();
            streamReader.Close();
            responseStream.Close();

            return html;
        }

The code that calls this method is as follows:

C# code


            HttpHelper helper = new HttpHelper();
            // GetHtml takes a CookieContainer as its second argument.
            string ss = helper.GetHtml("http://bill.finance.sina.com.cn/bill/detail.php?stock_code=sh600550&bill_size=40000", new CookieContainer());

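The page I want to scrape is http://bill.finance.sina.com.cn/bill/detail.php?stock_code=sh600550&bill_size=40000
If I scrape http://www.sina.com.cn instead, there is no problem at all.
Scraping the page above does not work, though, so that page presumably applies some restriction or check. Could someone with more experience take a look?
Thanks!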

------Solution--------------------
This method of mine does the job. I have tested it.

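public string gethtml(string url)
{
    string text2 = "";
    // WebClient is IDisposable, so release it when done.
    using (WebClient client1 = new WebClient())
    {
        try
        {
            // Download the raw bytes, then decode them with the system default encoding.
            byte[] buffer1 = client1.DownloadData(url);
            text2 = Encoding.Default.GetString(buffer1);
        }
        catch
        {
            // Any failure (network error, bad URL, ...) yields null.
            text2 = null;
        }
    }
    return text2;
}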

------Solution--------------------
http://blog.csdn.net/jiang_jiajia10/archive/2008/11/18/3325407.aspx
------Solution--------------------
The page is deflate-compressed, so wrap the response stream in a DeflateStream before reading it:

System.IO.Compression.DeflateStream responseStream = new System.IO.Compression.DeflateStream(httpWebResponse.GetResponseStream(), System.IO.Compression.CompressionMode.Decompress);
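
To make the suggestion above a bit more concrete, here is a minimal sketch that picks the decompressing stream from the Content-Encoding value instead of hard-coding deflate. The class and method names (ResponseDecoder, Wrap, ReadText) are made up for illustration and are not from this thread. It assumes the server labels the body correctly and sends raw deflate data; some servers send zlib-wrapped data under "deflate", which this sketch does not handle.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, not part of the original HttpHelper class.
public static class ResponseDecoder
{
    // Wraps the raw response stream in a decompressing stream chosen from the
    // Content-Encoding header value; returns the stream unchanged otherwise.
    public static Stream Wrap(Stream raw, string contentEncoding)
    {
        string enc = (contentEncoding ?? "").Trim().ToLowerInvariant();
        if (enc == "gzip")
            return new GZipStream(raw, CompressionMode.Decompress);
        if (enc == "deflate")
            return new DeflateStream(raw, CompressionMode.Decompress);
        return raw;
    }

    // Decompresses if needed, then reads the whole body as text.
    public static string ReadText(Stream raw, string contentEncoding, Encoding textEncoding)
    {
        using (Stream decoded = Wrap(raw, contentEncoding))
        using (StreamReader reader = new StreamReader(decoded, textEncoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Inside GetHtml this would probably be called as ResponseDecoder.ReadText(httpWebResponse.GetResponseStream(), httpWebResponse.ContentEncoding, encoding), replacing the direct StreamReader over the response stream.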

------Solution--------------------
See here for the method root_ gave me: http://topic.csdn.net/u/20081215/23/28f9ae30-2fa4-4b8d-8f84-710b4b5ddb6e.html
------Solution--------------------
Right, that is exactly it: decompress the stream first, then hand it to the StreamReader.

Quoting:
The page is deflate-compressed.

System.IO.Compression.DeflateStream responseStream =new System.IO.Compression.DeflateStream( h
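
As an alternative to wrapping the stream by hand, HttpWebRequest can do the decompression itself through its AutomaticDecompression property. The sketch below is only an illustration: the class and method names (RequestFactory, CreateDecompressingGet) are hypothetical, and nobody in this thread has confirmed that it fixes this particular page.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the original HttpHelper class.
public static class RequestFactory
{
    // Builds a GET request that asks HttpWebRequest to decompress
    // gzip and deflate response bodies transparently.
    public static HttpWebRequest CreateDecompressingGet(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.AutomaticDecompression =
            DecompressionMethods.GZip | DecompressionMethods.Deflate;
        return request;
    }
}
```

With this set, the stream returned by GetResponseStream() should already be decompressed, so the existing StreamReader code in GetHtml can stay as it is and the manual DeflateStream wrapper is not needed.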
