Question about using a regex to get the links in a page - C# Tutorials - 爱易网


Date: 2014-05-18  Views: 21023

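Looking for a regex that gets the links in a page!
I want to extract the link addresses from a page's source code.
For example, extracting the link addresses from the source of www.baidu.com: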

<a onClick="this.style.behavior='url(#default#homepage)';this.setHomePage('http://www.baidu.com')" href=http://utility.baidu.com/traf/click.php?id=215&url=http://www.baidu.com>把百度设为首页</a><a href=http://jingjia.baidu.com>企业推广</a> | <a href=http://top.baidu.com>搜索风云榜</a> | <a href=/home.html>关于百度</a> | <a href=http://ir.baidu.com>About Baidu</a>©2008 Baidu <a href=http://www.baidu.com/duty>使用百度前必读</a> <a href=http://www.miibeian.gov.cn target=_blank>京ICP证030173号</a> <a href=http://www.hd315.gov.cn/beian/view.asp?bianhao=010202001092500412><img src=http://gimg.baidu.com/img/gs.gif></a>

The result I want is:
http://utility.baidu.com/traf/click.php?id=215&url=http://www.baidu.com
http://jingjia.baidu.com
http://top.baidu.com
/home.html
and so on, that is, whatever follows each href=.

I am using:

C# code


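// Requires: using System.Text.RegularExpressions;
private string getUrlCode(string StrContent)
{
    string urlCode = null;
    // Matches <a href=...> with href as the first attribute and a <u>...</u> element inside the link.
    Regex re = new Regex(@"<a\s+href\s*=\s*('(?<href>[^']*)'|""(?<href>[^""]*)""|(?<href>[\S>]*))[^>]*>.*?<u>(?<link>[^<]+)</u>.*?</a>", RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Append the href of every match, one per line.
    foreach (Match m in re.Matches(StrContent))
    {
        urlCode = urlCode + m.Groups["href"].Value + "\r\n";
    }

    return urlCode;
}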

It has no effect: nothing is returned!

------Solution--------------------

C# code

<a.+?href=(?<href>[^>]+)>
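
A minimal sketch of how this pattern might be applied, assuming the page source is already held in a string. The names `AnchorHrefDemo` and `ExtractHrefs` are hypothetical, and `RegexOptions.IgnoreCase` is an added assumption, not part of the answer. Because `[^>]+` runs up to the closing `>`, a tag such as `<a href=http://www.miibeian.gov.cn target=_blank>` would also capture ` target=_blank`, so the sketch cuts each value at the first whitespace. That cut only suits unquoted values like those in the sample.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AnchorHrefDemo
{
    // Hypothetical helper: returns the href value of every <a ...> tag found.
    public static List<string> ExtractHrefs(string html)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(html, @"<a.+?href=(?<href>[^>]+)>", RegexOptions.IgnoreCase))
        {
            string raw = m.Groups["href"].Value;
            // [^>]+ also swallows trailing attributes (e.g. " target=_blank"),
            // so keep only the text before the first whitespace character.
            int cut = raw.IndexOfAny(new[] { ' ', '\t', '\r', '\n' });
            results.Add(cut >= 0 ? raw.Substring(0, cut) : raw);
        }
        return results;
    }
}
```

Calling `AnchorHrefDemo.ExtractHrefs(pageSource)` on the Baidu snippet from the question should return one entry per link, including the relative `/home.html`.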

------Solution--------------------
C# code
(?<=href=).*?(?=>|\s)
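
A short sketch of how the lookaround pattern could be used, under the assumption that the attribute values are unquoted as in the sample. The names `LookaroundHrefDemo` and `ExtractHrefs` are made up for illustration, and `RegexOptions.IgnoreCase` is an added assumption. Since the lookbehind and lookahead consume nothing, the whole match is already the URL and no named group is needed.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaroundHrefDemo
{
    // Hypothetical helper: collects whatever follows each "href=".
    public static List<string> ExtractHrefs(string html)
    {
        var results = new List<string>();
        // The lookbehind starts the match right after "href=", and the lookahead
        // stops it before the first '>' or whitespace, so m.Value is the bare value.
        foreach (Match m in Regex.Matches(html, @"(?<=href=).*?(?=>|\s)", RegexOptions.IgnoreCase))
        {
            results.Add(m.Value);
        }
        return results;
    }
}
```

Two caveats with this approach: a quoted value such as `href="x.html"` comes back with its quotes, and any tag with an `href` attribute (for example `<link>`) is matched, not only `<a>`.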

------Solution--------------------
Try:
C# code
// input is the string to search
MatchCollection mc = Regex.Matches(input, "href=(?<href>[^>]*)", RegexOptions.IgnoreCase);
foreach (Match match in mc)
    Response.Write(match.Groups["href"].Value + "<br />");
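
For comparison, a hedged sketch that keeps the quoted/unquoted alternation from the question's pattern but drops two of its requirements: that `href` be the first attribute, and that every link contain a `<u>...</u>` element. Neither holds for the Baidu sample, which would explain the empty result. The names `LinkExtractor` and `GetLinks` are hypothetical, and the pattern is only an illustration, not a full HTML parser.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // href may appear anywhere inside the <a ...> tag. The value is tried as
    // double-quoted, then single-quoted, then unquoted (ends at whitespace or '>').
    private static readonly Regex HrefPattern = new Regex(
        @"<a\b[^>]*?\shref\s*=\s*(?:""(?<href>[^""]*)""|'(?<href>[^']*)'|(?<href>[^\s>]+))",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the href values in document order.
    public static List<string> GetLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            // All three alternatives share the group name "href";
            // only the one that matched holds a value.
            links.Add(m.Groups["href"].Value);
        }
        return links;
    }
}
```

`string.Join("\r\n", LinkExtractor.GetLinks(html))` would give the same newline-separated form that `getUrlCode` in the question was meant to return.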
