Date: 2014-05-18

How to extract web page content with regular expressions
The markup is as follows:
<div id="title" class="blog_tit_cont">
<strong>

<span >


<span>[转]</span>
为了练好口语,你敢不敢每天读一遍,坚持一个月? 
</span>


</strong>
<span id="pubTime" class="c_tx3">
<script type="text/javascript">
var pubtime = g_oBlogData.data.pubtime;
var pubDate = new Date(pubtime * 1000);
document.write(pubDate.getFullYear() + "." + (pubDate.getMonth() + 1) + "." + pubDate.getDate());
</script>
</span>
<span id="readNum" class="c_tx3"> </span>
<span id="quoteInfo" class="c_tx3"> </span>
</div>




How can I extract the content of the strong element inside the div? Detailed source code would be appreciated.

------Solution--------------------
(?is)<strong>(?<strong>.*?)</strong>
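
A minimal sketch of how a pattern like this might be called from C#, assuming the page source is already held in a string. It uses the named group with a lazy `.*?` so the match stops at the first closing tag. The class and method names (`NamedGroupDemo`, `ExtractStrong`) are made up for illustration; `Regex.Match` and `Groups["strong"]` are standard .NET calls.

```csharp
using System;
using System.Text.RegularExpressions;

static class NamedGroupDemo
{
    // Hypothetical helper: returns the inner markup of the first <strong> element,
    // or an empty string when nothing matches.
    public static string ExtractStrong(string html)
    {
        // (?i) ignores case, (?s) lets '.' match newlines,
        // and the lazy .*? stops at the first </strong>.
        Match m = Regex.Match(html, @"(?is)<strong>(?<strong>.*?)</strong>");
        return m.Success ? m.Groups["strong"].Value.Trim() : string.Empty;
    }

    static void Main()
    {
        string html = "<div id=\"title\"><strong>\n<span>Hello</span>\n</strong></div>";
        Console.WriteLine(ExtractStrong(html));  // prints <span>Hello</span>
    }
}
```

The captured value still contains any nested tags such as `<span>`; they would have to be removed in a separate step if only the text is wanted.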
------Solution--------------------
Try this:

C# code
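            // Match a <div>, skip ahead without crossing another <div> or </div>,
            // then capture the whole <strong>...</strong> element in group 1.
            Regex reg = new Regex(@"(?is)<div[^>]*>(?:(?!</?div).)*(<strong[^>]*>.*?</strong>)");
            MatchCollection mc = reg.Matches(yourStr);  // yourStr holds the page source
            foreach (Match m in mc)
            {
                // Append each captured element (tags included) to the text box.
                richTextBox2.Text += m.Groups[1].Value + "\n";
            }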

------Solution--------------------

C# code

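 // Requires: using System; using System.Text.RegularExpressions;
 // Place this method inside a class (for example, class Program).
 static void Main(string[] args)
        {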
            string str = @"<div id=""title"" class=""blog_tit_cont"">
<strong>

<span >


<span>[转]</span>    
为了练好口语,你敢不敢每天读一遍,坚持一个月?  
</span>


</strong>
<span id=""pubTime"" class=""c_tx3"">
<script type=""text/javascript"">
var pubtime = g_oBlogData.data.pubtime;
var pubDate = new Date(pubtime * 1000);
document.write(pubDate.getFullYear() + ""."" + (pubDate.getMonth() + 1) + ""."" + pubDate.getDate());
</script>
</span>
<span id=""readNum"" class=""c_tx3""> </span>
<span id=""quoteInfo"" class=""c_tx3""> </span>
</div>
";


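            // The lookbehind anchors the match just after <div id="title" ...><strong>,
            // and the lookahead stops it just before </strong>, so only the inner content is returned.
            Regex re = new Regex(@"(?is)(?<=<div id=""title""[^>]+>\s*<strong>).*?(?=</strong>)", RegexOptions.None);
            Console.WriteLine(re.Match(str).Value);  // re.Match(str).Value is the content you want
            Console.ReadLine();
        }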

------Solution--------------------
Create a file named 1.txt on the C: drive with the following content:
C# code

<div id="title" class="blog_tit_cont">
<strong>

<span >


<span>[转]</span>    
为了练好口语,你敢不敢每天读一遍,坚持一个月?  
</span>


</strong>
<span id="pubTime" class="c_tx3">
<script type="text/javascript">
var pubtime = g_oBlogData.data.pubtime;
var pubDate = new Date(pubtime * 1000);
document.write(pubDate.getFullYear() + "." + (pubDate.getMonth() + 1) + "." + pubDate.getDate());
</script>
</span>
<span id="readNum" class="c_tx3"> </span>
<span id="quoteInfo" class="c_tx3"> </span>
</div>
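
The code that reads this file is not shown in the answer. As a rough sketch only, and not the original poster's code, the file could be read and searched along these lines. The path `C:\1.txt` comes from the answer above; the names `FileExtractDemo`, `ExtractStrongHtml`, and `ToPlainText` are hypothetical, and the tag-stripping step is an extra assumption for the case where only the title text is wanted.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

static class FileExtractDemo
{
    // Hypothetical helper: returns the raw inner markup of <strong> inside <div id="title">.
    public static string ExtractStrongHtml(string html)
    {
        Match m = Regex.Match(
            html,
            @"(?is)<div\s+id=""title""[^>]*>\s*<strong[^>]*>(?<inner>.*?)</strong>");
        return m.Success ? m.Groups["inner"].Value : string.Empty;
    }

    // Hypothetical helper: replaces tags with spaces and collapses whitespace.
    public static string ToPlainText(string fragment)
    {
        string noTags = Regex.Replace(fragment, @"<[^>]+>", " ");
        return Regex.Replace(noTags, @"\s+", " ").Trim();
    }

    static void Main(string[] args)
    {
        // Assumed default path from the answer; another path can be passed as the first argument.
        string path = args.Length > 0 ? args[0] : @"C:\1.txt";
        string html = File.ReadAllText(path);
        Console.WriteLine(ToPlainText(ExtractStrongHtml(html)));
    }
}
```

`File.ReadAllText(path)` tries to detect the encoding and otherwise assumes UTF-8; if 1.txt was saved in another encoding, the overload that takes an `Encoding` argument may be needed so the Chinese text is read correctly.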