Date: 2014-05-17  Views: 20419

Looking for a regular expression to extract the content of a specific HTML tag
<div id="content">  
  <p>ddd</p>
  <div id="cc">ddd</div>
  <img src="yili120x60.gif" />
  <img src="120x60/omj120X60.gif" />
  </div>

I want to extract the content inside the <div id="content"> div, that is:
  <p>ddd</p>
  <div id="cc">ddd</div>
  <img src="yili120x60.gif" />
  <img src="120x60/omj120X60.gif" />

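This is for article scraping. I spent a whole day studying regex balancing groups and still can't figure them out. Could an expert please help?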

------Solution--------------------
C# code
            string str = @"<div class=""info"">aaaaaa<div id=""content"">   
  <p>ddd</p>
  <div id=""cc"">ddd</div>
  <img src=""yili120x60.gif"" />
  <img src=""120x60/omj120X60.gif"" />
  </div>bbbbb</div>";
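            // Requires: using System; using System.Text.RegularExpressions;
            // (?is): ignore case, and let '.' match newlines.
            // (?<Open>...) pushes on each nested <div ...>; (?<-Open>...) pops on each </div>;
            // (?(Open)(?!)) fails the match if any nested <div> is still unclosed.
            Regex reg = new Regex(@"(?is)<div[^>]*?id=""content"">((?:(?<Open><div[^>]*?>)|(?<-Open></div>)|.*?)*)(?(Open)(?!))</div>");
            // Group 1 holds everything between <div id="content"> and its matching </div>.
            Console.WriteLine(reg.Match(str).Groups[1].Value);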
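
Since the question was about balancing groups, it may help to see the three constructs the pattern relies on in isolation. The following is only a minimal sketch on a simpler problem (checking that parentheses are balanced), not part of the original answer; the class name BalancingGroupSketch and the sample strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class BalancingGroupSketch
{
    static void Main()
    {
        // (?<Open>\()    pushes a capture onto the "Open" stack for every '('.
        // (?<-Open>\))   pops one capture for every ')' and fails if the stack is empty.
        // (?(Open)(?!))  fails the match if any '(' is still unclosed at the end.
        var balanced = new Regex(@"^(?:[^()]|(?<Open>\()|(?<-Open>\)))*(?(Open)(?!))$");

        Console.WriteLine(balanced.IsMatch("(a(b)c)")); // True
        Console.WriteLine(balanced.IsMatch("(a(b)c"));  // False: one '(' is never closed
        Console.WriteLine(balanced.IsMatch("a)b("));    // False: ')' appears with nothing to pop
    }
}
```

The div pattern works the same way, with <div ...> in the role of '(' and </div> in the role of ')'.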