日期:2014-05-16  浏览次数:21175 次

查找HTML代码“超级链接”中含有域名的函数

需求:在一段HTML里寻找超级链接中(正文文本)域名。

?

例如HTML内容如下:

<div id="bdfs0" class="EC_im EC_fr EC_PP  EC_idea1017 "><a id="dfs0" class="EC_t EC_BL" onmousedown="return c({'fm':'im','title':this.innerHTML,'url':this.href,'p1':1})" href="http://www.baidu.com/baidu.php?url=zx6K00KyXs40Q86kRjYK5CXfuZ-Yhk3pujbXdk6f0Ex7QN3XNuAjwnSYSbeIYBOzZw2ezRQDFiGIKgR5iDXyeTCNEPl_Mr-LmZ3fzxORWyM-pF2xYZEQkz-2sFQ3htQuFy5nr9Y.0b_a7hQwfHjHKUQYI3LHRrgDk3q5iGyAp7BEI3ded7.U1Yk0ZDqknLw8UQBzVHiksKY5THPYUhz3_oRY_T0pyYqnW0k0ATqTZPbIj00IybqTv7VgLN-gvPEUAqz0ZKGujYkrfKWpyfqPHR0UgfqnH0kPsKopHYs0ZFY5iYk0ANGujY1PHTsg1cLP1IxnHf1P7tzrH6sg1ndPjm0mhbqnHmk0AdW5Hn3n10YPHDv0Z7spyfqn0Kkmv-b5H00ThIYmyTqn0KEIhsq0A7B5HKxn0K-ThTqn0KsTjYs0A4vTjYsQW0snj0snj0s0AdYTjYs0ZFJ5H00uANv5gKW0ZwdT1YkPj6YPHcsnWnsPHcznjTvPWRsrfKBUjYs0APzm1Y1PjT3rf" target="_blank"><font color=#CC0000>耐压测试仪</font>-上海精密仪器仪..</a><br><a class="EC_BL EC_desc" href="http://www.baidu.com/baidu.php?url=zx6K00KyXs40Q86kRjYK5CXfuZ-Yhk3pujbXdk6f0Ex7QN3XNuAjwnSYSbeIYBOzZw2ezRQDFiGIKgR5iDXyeTCNEPl_Mr-LmZ3fzxORWyM-pF2xYZEQkz-2sFQ3htQuFy5nr9Y.0b_a7hQwfHjHKUQYI3LHRrgDk3q5iGyAp7BEI3ded7.U1Yk0ZDqknLw8UQBzVHiksKY5THPYUhz3_oRY_T0pyYqnW0k0ATqTZPbIj00IybqTv7VgLN-gvPEUAqz0ZKGujYkrfKWpyfqPHR0UgfqnH0kPsKopHYs0ZFY5iYk0ANGujY1PHTsg1cLP1IxnHf1P7tzrH6sg1ndPjm0mhbqnHmk0AdW5Hn3n10YPHDv0Z7spyfqn0Kkmv-b5H00ThIYmyTqn0KEIhsq0A7B5HKxn0K-ThTqn0KsTjYs0A4vTjYsQW0snj0snj0s0AdYTjYs0ZFJ5H00uANv5gKW0ZwdT1YkPj6YPHcsnWnsPHcznjTvPWRsrfKBUjYs0APzm1Y1PjT3rf" target="_blank" id="bdfs0" onmousedown="return c({'fm':'im','title':this.innerHTML,'url':this.href,'p1':1,'rsv_ct':'d'})"><font size="-1" >咨询热线:021-65730171.接地电阻<font color=#CC0000>测试仪</font>/接地电阻表,绝缘电阻<font color=#CC0000>测试仪</font>/绝缘电阻表,</font><br><font size="-1" class="EC_url">www.canytec.com.cn</font></a><span class="icons EC_PP"><a style="cursor:pointer;text-decoration:none;" href="http://trust.baidu.com/vcard/?id=ef6b817356c74744c2c8142179f321ba97010019&url=zx6K00KyXs40Q86kRjYK5CXfuZ-Yhk3pujbXdk6f0Ex7QN3XNuAjwnSYSbeIYBOzZw2ezRQDFiGIKgR5iDXyeTCNEPl_Mr-LmZ3fzxORWyM-pF2xYZEQkz-2sFQ3htQuFy5nr9Y.7b_a7hQwfHjHKUQYI3LHRrgDk3q5iGyAp7BEI3ded0.U1Yk0ZDqknLw8UQBzVHiksKY5THPYUhz3_oRY_T0pyYqnW0k0ATqTZPbIj00IybqTv7VgLN-gvPEUAqz0ZKGujYkrfKWpyfqPHR0UgfqnH0kPsKopHYs0ZFY5iYk0ANGujY1PHTsg1cLP1IxnHf1P7tzrH6sg1ndPjm0mhbqnHmk0AdW5Hn3n10YPHDv0Z7spyfqn0Kkmv-b5H00ThIYmyTqn0KEIhsq0A7B5HKxn0K-ThTqn0KsTjYs0A4vTjYsQW0snj0snj0s0AdYTjYs0ZFJ5H00uANv5gKW0ZwdT1YkPj6YPHcsnWnsPHcznjTvPWRsrfKBUjYs0APzm1Y1PjT3rf&dataTime=51" target="_blank" onmousedown="return c({'title':this.innerHTML,'url':this.href,'fm':'im','rsvMt':'1017','p1':'1'});" class="c-icon icon-certify c-icon-v efc-cert" data-renzheng="{title: '上海精密仪器仪表有限公司:',favorite: {fm: 'im',rsvMt: '',p1: '1',url: 'http://i.baidu.com/myfavorite/set?'},appraise: {fm: 'im',rsvMt: '',p1: '1',url: 'http://trust.baidu.com/womc/comt/?'},report: {fm: 'im',rsvMt: '',p1: '1',url: 'http://baozhang.baidu.com/guarantee/accu/?'},identity: {a: {fm: 'im',rsvMt: '1017',p1:'1',url: 'http://trust.baidu.com/vcard/?id=ef6b817356c74744c2c8142179f321ba97010019&dataTime=51'}, img: '',text: '',credit: '5'}}" data-tip-limite="true" data-tooltips="bc"></a>
</span></div>

?

C++函数:

?

/*
 功能:查找一段HTML代码中超级链接<a><font>xxx.com</font></a>第一个出现的域名【例:xxx.com等】
      By Dewei 2013-10-22
 用法:
 #include <boost/regex.hpp>
 std::string str_out;
 find_first_domain(".....", &str_out);
 参数:原始HTML代码,输出找到的域名
 返回:找到返回true,否则返回false
 */
bool find_first_domain(const std::string &str_src, std::string &str_out)
{
	boost::regex expression("<a(.*?)href=\"([^\n\t\f\r \"]+?)\"([^>]*?)>(.*?)</a>",  boost::regex::icase|boost::regex::perl);