
Extracting links with regular expressions

Date: 2014-05-17  Views: 20562

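Scraping with regular expressions
Hi everyone, I need to build a simple scraper.
Its main job is to match the hyperlinks in a page's source code.
This one is fairly fiddly.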

HTML code


<a class="costdown" href="http://order.xiaomi.com/static/re" onclick="_gaq.push(['_trackEvent', '首页广告点击', '官翻版购买通道']);">官翻版购买通道</a>

----- Here I only need to capture the link text "官翻版购买通道" and the link "http://order.xiaomi.com/static/re"

HTML code


<a style="margin-left:20px" href="http://www.xiaomi.com/about" >关于小米</a>

Expected match: 关于小米 -- http://www.xiaomi.com/about

The goal is to capture each link's text together with its address. How can this be done?

------Solution--------------------
(?i)<a\b[^>]*?href=(['"]?)(?<href>[^'"]+)\1[^>]*?>(?<txt>[^<>]+)</a>

Read Groups["href"] and Groups["txt"] from each match; those hold the address and the link text you want.
------Solution--------------------
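
A minimal sketch of how the named groups might be read in C#, using the two anchors from the question as input. The class name `NamedGroupDemo`, the method `Extract`, and the variable names are illustrative assumptions, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class NamedGroupDemo
{
    // Returns (text, href) pairs for every anchor the pattern matches.
    public static List<(string Text, string Href)> Extract(string html)
    {
        // In .NET, unnamed groups are numbered before named ones,
        // so \1 refers to the optional opening quote.
        string pattern = @"(?i)<a\b[^>]*?href=(['""]?)(?<href>[^'""]+)\1[^>]*?>(?<txt>[^<>]+)</a>";

        var links = new List<(string Text, string Href)>();
        foreach (Match m in Regex.Matches(html, pattern))
        {
            links.Add((m.Groups["txt"].Value, m.Groups["href"].Value));
        }
        return links;
    }

    static void Main()
    {
        // Sample input: the two anchors from the question.
        string html = @"<a class=""costdown"" href=""http://order.xiaomi.com/static/re"" onclick=""_gaq.push(['_trackEvent', '首页广告点击', '官翻版购买通道']);"">官翻版购买通道</a>
<a style=""margin-left:20px"" href=""http://www.xiaomi.com/about"" >关于小米</a>";

        foreach (var link in Extract(html))
        {
            Console.WriteLine(link.Text + " -- " + link.Href);
        }
    }
}
```

For the two sample anchors this prints the link text followed by its address, in the same "text -- address" form the question asks for.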

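C# code


using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Sample input: the two anchors from the question.
        string input = @"<a class=""costdown"" href=""http://order.xiaomi.com/static/re"" onclick=""_gaq.push(['_trackEvent', '首页广告点击', '官翻版购买通道']);"">官翻版购买通道</a>
<a style=""margin-left:20px"" href=""http://www.xiaomi.com/about"" >关于小米</a>
";

        // Group 1: optional quote, group 2: href value, group 3: link text.
        Dictionary<string, string> dic = new Dictionary<string, string>();
        foreach (Match m in Regex.Matches(input, @"(?is)<a\b[^>]*?href=([""']?)([^""']*?)\1[^>]*?>(.*?)</a>"))
        {
            // Use the indexer rather than Add so a repeated href does not throw.
            dic[m.Groups[2].Value] = m.Groups[3].Value;
        }

        foreach (var m in dic)
        {
            Console.WriteLine(m.Key + "\t" + m.Value);
        }
    }
}
/*
http://order.xiaomi.com/static/re       官翻版购买通道
http://www.xiaomi.com/about            关于小米
*/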
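
The two patterns above differ in how they capture the link text: `[^<>]+` only matches anchors whose content is plain text, while `(.*?)` with `(?is)` also matches anchors that contain nested tags, and then returns those tags as part of the text. The sketch below shows one way the captured text might be cleaned up in that case. The nested-markup anchor and the helper `StripTags` are hypothetical, added for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

class NestedTextDemo
{
    // Hypothetical helper: removes any tags from the captured inner HTML and trims whitespace.
    public static string StripTags(string innerHtml)
    {
        return Regex.Replace(innerHtml, @"<[^>]+>", "").Trim();
    }

    // Returns "href<TAB>text" for the first anchor, or null when nothing matches.
    public static string FirstLink(string html)
    {
        Match m = Regex.Match(html, @"(?is)<a\b[^>]*?href=([""']?)([^""']*?)\1[^>]*?>(.*?)</a>");
        if (!m.Success) return null;
        return m.Groups[2].Value + "\t" + StripTags(m.Groups[3].Value);
    }

    static void Main()
    {
        // Made-up anchor with nested markup; the [^<>]+ pattern would not match it.
        string html = @"<a href=""http://www.xiaomi.com/about""><b>关于</b>小米</a>";
        Console.WriteLine(FirstLink(html));
    }
}
```

Regular expressions like these are adequate for small, predictable snippets; for arbitrary pages an HTML parser is generally the more robust choice.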
