Date: 2014-05-18

Help: scraping page source ---- a weird URL I just can't fetch
URL whose source I can't fetch:
http://www1.macys.com/catalog/product/index.ognc?ID=596761

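With HttpWebRequest I simply cannot get the source: it fails with a "too many redirects" error. I monitored the cookies and added a whole bunch of them, but that didn't fix it. Can anyone fetch it?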


-------------------------
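ps: Other product pages on the same site, e.g. http://www1.macys.com/catalog/product/index.ognc?ID=603770, download without any problem. The exact same code fails only on the URL above. I tried every snippet I could find online, one by one, and not one of them could fetch the source of that URL.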
-------------------------

Test method:
C# code

// Requires: using System; using System.IO; using System.Net;
private static string getContent(string Url)
        {
            string content = "";
            try
            {
                HttpWebRequest wreq = (HttpWebRequest)WebRequest.Create(Url);
                // Default is 50; with a limit of 4, any chain of more than 4 redirects throws.
                wreq.MaximumAutomaticRedirections = 4;
                // Measured in kilobytes (default 64); 4 KB may be too small if the server sets many cookies.
                wreq.MaximumResponseHeadersLength = 4;
                //wreq.Credentials = System.Net.CredentialCache.DefaultCredentials;
                //wreq.Referer = "http://www.macys.com";  
                //wreq.Headers.Add(HttpRequestHeader.Cookie, "macys_online=4416704358; shippingCountry=US; currency=USD;");
                wreq.Method = "GET"; // HTTP method names are case-sensitive
                wreq.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
                wreq.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.122 Safari/534.30";
                CookieContainer cookieCon = new CookieContainer();
                //CookieCollection cc = new CookieCollection();
                //cc.Add(new System.Net.Cookie("currency", "USD", "/", "macys.com"));
                //cc.Add(new System.Net.Cookie("PPP", "24", "/", "macys.com"));
                //cc.Add(new System.Net.Cookie("SignedIn", "0", "/", "macys.com"));
                //cc.Add(new System.Net.Cookie("shippingCountry", "US", "/", "macys.com"));
                //cookieCon.Add(cc);
                wreq.CookieContainer = cookieCon;
                // Dispose the response and the reader so the connection is released.
                using (HttpWebResponse wresp = (HttpWebResponse)wreq.GetResponse())
                using (StreamReader sr = new StreamReader(wresp.GetResponseStream()))
                {
                    content = sr.ReadToEnd();
                }
            }
            catch (Exception ex)
            {
                // Returns the error message in place of the page source.
                content = ex.Message;
            }
            return content;
        }
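When HttpWebRequest only reports "too many redirects", it can help to look at the redirect chain itself. The following is a diagnostic sketch, not something proposed in the thread: it turns off AllowAutoRedirect, follows Location headers by hand with one shared CookieContainer, and prints every hop. If the same URLs keep repeating, the server is redirecting in a loop, which can happen when it expects a cookie or header the client never sends. The class RedirectTracer and its method names are hypothetical.

```csharp
using System;
using System.Net;

// Hypothetical diagnostic helper: follows redirects by hand so every hop
// (status code, Location header, cookies) can be printed.
public static class RedirectTracer
{
    // A Location header may be absolute or relative to the URL that returned it.
    public static Uri ResolveLocation(Uri current, string location)
    {
        return new Uri(current, location);
    }

    // Follows at most maxHops redirects and returns the last URL requested.
    public static Uri FollowRedirects(string url, int maxHops)
    {
        CookieContainer cookies = new CookieContainer(); // shared by every hop, like a browser
        Uri current = new Uri(url);
        for (int hop = 0; hop < maxHops; hop++)
        {
            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(current);
            req.AllowAutoRedirect = false; // hand back 3xx responses instead of following them
            req.CookieContainer = cookies;
            req.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.122 Safari/534.30";

            HttpWebResponse resp;
            try
            {
                resp = (HttpWebResponse)req.GetResponse();
            }
            catch (WebException ex)
            {
                // 4xx/5xx arrive as exceptions, but their headers are still worth printing.
                if (ex.Response == null) throw;
                resp = (HttpWebResponse)ex.Response;
            }

            using (resp)
            {
                int code = (int)resp.StatusCode;
                string location = resp.Headers["Location"];
                Console.WriteLine("{0} {1}", code, current);
                Console.WriteLine("    Location: {0}", location ?? "(none)");
                // Cookies the container now holds for this URL, including any just set.
                Console.WriteLine("    Cookies:  {0}", cookies.GetCookieHeader(current));

                if (code < 300 || code > 399 || string.IsNullOrEmpty(location))
                {
                    return current; // not a redirect, so the chain ends here
                }
                current = ResolveLocation(current, location);
            }
        }
        Console.WriteLine("Gave up after {0} hops; the server may be redirecting in a loop.", maxHops);
        return current;
    }
}
```

Calling `RedirectTracer.FollowRedirects("http://www1.macys.com/catalog/product/index.ognc?ID=596761", 10)` would print the chain for the failing URL; comparing it with the chain for ID=603770 may show where the two diverge.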



------Solution--------------------
Can't you just fetch the URL
it redirects to?
http://www1.macys.com/shop/product/treasured-hearts-diamond-ring-sterling-silver-black-white-diamond-heart-ring-1-4-ct.-t.w.?ID=596761&intnl=true&intnl=true
------Solution--------------------
If this page redirects, there is probably a Referer URL involved. Try adding that URL as the Referer and see!
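For reference, HttpWebRequest exposes the Referer header as a property; in .NET Framework it is a restricted header, so adding it through Headers.Add throws an exception and the property has to be used instead. A minimal sketch, assuming the referring page is known; the helper name CreateWithReferer is hypothetical, and the referring URL in the test is only borrowed from the commented-out line in the question, not a confirmed entry page.

```csharp
using System;
using System.Net;

public static class RefererExample
{
    // Hypothetical helper: builds (but does not send) a GET request that
    // reports refererUrl as the page the visitor came from.
    public static HttpWebRequest CreateWithReferer(string url, string refererUrl, CookieContainer cookies)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
        req.Method = "GET";
        req.Referer = refererUrl;      // set via the property, not Headers.Add
        req.CookieContainer = cookies; // keep cookies from earlier requests
        return req;
    }
}
```

The caller then calls `GetResponse()` on the returned request exactly as in `getContent`.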
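------Solution--------------------
First, if you type
http://www1.macys.com/catalog/product/index.ognc?ID=596761
directly into the browser, do you get the source? Is it what you expected?

Note: type it in directly.
Some pages only work when you reach them by clicking through from the parent page; otherwise they won't load.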
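One way to imitate "clicking through from the parent page" in code, sketched here under assumptions and not taken from the thread, is to keep one CookieContainer for a whole session and send the previously visited page as the Referer, the way a browser does. The class BrowserLikeSession and its members are hypothetical, and whether www.macys.com is the right entry page, or whether the site checks any of this at all, is not established here.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper: reuses one CookieContainer across requests and sends
// the last visited page as the Referer, roughly like a browser click-through.
public class BrowserLikeSession
{
    public readonly CookieContainer Cookies = new CookieContainer();
    public string LastUrl { get; set; } // page we "came from"; null before the first request

    public HttpWebRequest BuildRequest(string url)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
        req.Method = "GET";
        req.CookieContainer = Cookies;
        req.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.122 Safari/534.30";
        if (LastUrl != null)
        {
            req.Referer = LastUrl;
        }
        return req;
    }

    public string Get(string url)
    {
        HttpWebRequest req = BuildRequest(url);
        using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
        using (StreamReader reader = new StreamReader(resp.GetResponseStream()))
        {
            LastUrl = resp.ResponseUri.AbsoluteUri; // final URL after any redirects
            return reader.ReadToEnd();
        }
    }
}
```

Usage would be to call `Get` on the (assumed) entry page first and then on the product URL with the same session object, e.g. `session.Get("http://www.macys.com/")` followed by `session.Get("http://www1.macys.com/catalog/product/index.ognc?ID=596761")`.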
------Solution--------------------
Apparently a site can block people from viewing its source!
------Solution--------------------
Try this class; pass the URL you want to fetch as the parameter when you initialize it: http://blog.csdn.net/yysyangyangyangshan/article/details/6661886
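------Solution--------------------
That URL must have an entry page; you can't go to it directly, it always redirects. If you know which page leads to it, post it...
Or try it yourself: add that entry page's URL to your code as the Referer and see whether it works.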
------Solution--------------------
If the site builds the page in Flash and you still insist on scraping the HTML, it's all in vain.