Scraping page content and extracting part of it (my regex is weak, asking for help) - ASP.NET Tutorials - 爱易网


Date: 2014-05-17  Views: 20625

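Scraping page content and extracting part of it: my regex is weak, please help!
I have scraped the content of a page, and its data list looks like the sample below. I only need the first 10 entries from this block. I also do not need the red "N条相同新闻" (N identical news items) link that follows each entry.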


<a href="http://www.yangtse.com/system/2013/02/26/016368991.shtml" mon="a=5&pn=1" target="_blank">海地首都附近发生3.5级地震 居民楼摇晃(图) </a> 
扬子晚报网 2013-02-26 10:11:55 
<a href="/ns?word=%B5%D8%D5%F0+cont:4172744296|255977833|3388618498|1142884381&same=34&cl=1&tn=newstitle&rn=30" class="more_link">
34条相同新闻>></a> 

<a href="http://www.chinadaily.com.cn/hqgj/jryw/2013-02-26/content_8349553.html" mon="a=5&pn=2" target="_blank">内蒙古自治区实现4级以上地震3分钟内速报</a> 中国日报 2013-02-26 08:57:00 
<a href="/ns?word=%B5%D8%D5%F0+cont:3043244100&same=6&cl=1&tn=newstitle&rn=30" class="more_link">6条相同新闻>></a> 

......
......
......

------Solution--------------------

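 // GetHtml is not shown in the post; it presumably downloads the page source for url.
 string str = GetHtml(url);

 // Each repetition of the outer group is intended to capture one entry (headline link plus the trailing source/date text) into group "t".
 var list = Regex.Match(str, @"(?is)<p class=""res"">(.*?(?<t><a[^>]*>.*?</a>.*?)\s*<a)*").Groups["t"].Captures.OfType<Capture>().Select(t => t.Value).ToList();

 // Keep only the first 10 entries.
 if (list.Count > 10)
     list = list.Take(10).ToList();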

------Solution--------------------
How about this?



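  // Read the saved page source, decoding it as GB2312.
  string html = File.ReadAllText(@"C:\1.txt", Encoding.GetEncoding("GB2312"));
  List<string> list = new List<string>();
  int i = 0;

  // Group 1 runs from the headline link (the one with mon="a=..&pn=..") through the <font> source/date text,
  // so the later more_link anchor is not part of the captured value.
  foreach (Match m in Regex.Matches(html, @"(?is)<span[^>]*?>.*?(<a\s*href=([""']?)[^""]*?\2\s*mon=""a=\d+&pn=\d+""[^>]*?>.*?</a>.*?<font\s*class=g\s*size=\d+[^>]*?>.*?</font>).*?</span>"))
  {
      if (i == 10)  // stop after the first 10 entries
      {
          break;
      }
      list.Add(m.Groups[1].Value);
      i++;
  }

  // Write each entry followed by a line break.
  list.ForEach(x => Response.Write(x + "<br />"));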
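
For anyone who wants to try the idea without the full page source, here is a minimal, self-contained sketch run against the two sample entries from the question. It assumes that every headline link carries a mon="a=..&pn=.." attribute directly after href, as in the sample, and that the "相同新闻" links never do (they use class="more_link" instead). The names NewsLinkExtractor, Extract and HeadlineLink are made up for this illustration and do not come from the posts above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class NewsLinkExtractor
{
    // Assumption: headline anchors look like <a href="..." mon="a=N&pn=N" ...>title</a>.
    // The more_link anchors have no mon attribute, so this pattern skips them.
    private static readonly Regex HeadlineLink = new Regex(
        @"<a\s+href=""(?<url>[^""]*)""\s+mon=""a=\d+&pn=\d+""[^>]*>(?<title>.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns at most `max` (url, title) pairs in document order.
    public static List<KeyValuePair<string, string>> Extract(string html, int max)
    {
        return HeadlineLink.Matches(html)
            .Cast<Match>()
            .Take(max)
            .Select(m => new KeyValuePair<string, string>(
                m.Groups["url"].Value,
                m.Groups["title"].Value.Trim()))
            .ToList();
    }

    static void Main()
    {
        // The two sample entries from the question (quotes doubled for the verbatim string).
        string html = @"
<a href=""http://www.yangtse.com/system/2013/02/26/016368991.shtml"" mon=""a=5&pn=1"" target=""_blank"">海地首都附近发生3.5级地震 居民楼摇晃(图) </a>
扬子晚报网 2013-02-26 10:11:55
<a href=""/ns?word=%B5%D8%D5%F0+cont:4172744296|255977833|3388618498|1142884381&same=34&cl=1&tn=newstitle&rn=30"" class=""more_link"">
34条相同新闻>></a>
<a href=""http://www.chinadaily.com.cn/hqgj/jryw/2013-02-26/content_8349553.html"" mon=""a=5&pn=2"" target=""_blank"">内蒙古自治区实现4级以上地震3分钟内速报</a> 中国日报 2013-02-26 08:57:00
<a href=""/ns?word=%B5%D8%D5%F0+cont:3043244100&same=6&cl=1&tn=newstitle&rn=30"" class=""more_link"">6条相同新闻>></a>";

        // Print the first 10 headline links (the sample only contains two).
        foreach (var item in Extract(html, 10))
            Console.WriteLine(item.Key + " | " + item.Value);
    }
}
```

On the sample this yields the two headline links and skips both more_link anchors. If the real page orders the attributes differently, the pattern would need adjusting, and for anything beyond a quick scrape an HTML parser is usually more robust than a regex.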
