Date: 2014-05-18

Using a regular expression to extract data from a web page's HTML
<div class="ry_box">
  <dl class="ry_boxleft">
  <p class="ry_picbd"><a href="http://www.m1905.com/yx/film/c1f509553.html"><img src="http://image11.m1905.cn/uploadfile/2012/0406/thumb_1_98_137_20120406050041947.jpg" alt="我是中国人" title="我是中国人" /></a></p>
  </dl>
  <dl class="line-h24 ry_boxright">
  <p class="f14 color_blue h24px">
  <span class="fl"><a href="http://www.m1905.com/yx/film/c1f509553.html" title="我是中国人">我是中国人</a></span>
  <span class="sm_star1 fl ml05 mt02"><span class="sm_star2" style="width:58%;"></span><span class="star_cont" style="display:none">5.8分</span></span>
  </p>
  <p class="mt02"><span class="color_gray1">主演:</span><a href="http://www.m1905.com/mdb/film/list/starring-2994182/" target="_blank" title="查看该演员参加的影片">李乾铭</a> / <a href="http://www.m1905.com/mdb/film/list/starring-1014/" target="_blank" title="查看该演员参加的影片">颜丹晨</a> / <a href="http://www.m1905.com/mdb/film/list/starring-2991308/" target="_blank" title="查看该演员参加的影片">张岩</a></p>
  <p><span class="color_gray1">类型:</span><a href="http://www.m1905.com/mdb/film/list/mtype-15/" target="_blank">剧情</a> <a href="http://www.m1905.com/mdb/film/list/mtype-30/" target="_blank">战争</a> </p>
  <p><span class="color_gray1">上映时间:</span>2012-04-19</p>
  <p class="time_net mt10 color_blue"><a href="http://www.m1905.com/yx/film/c1f509553.html">放映时间表</a></p>
  </dl>
  </div>

How can I use a regular expression to get, from the a tag inside <dl class="line-h24 ry_boxright">,
the link "http://www.m1905.com/yx/film/c1f509553.html" and its title?


Any advice from the experts here would be much appreciated.

------Solution--------------------
Here 1.txt holds the HTML you posted.
C# code

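    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text;
    using System.Text.RegularExpressions;

    // On .NET Core / .NET 5+, gb2312 needs Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first.
    string input = File.ReadAllText(@"C:\1.txt", Encoding.GetEncoding("gb2312"));
    Dictionary<string, string> dic = new Dictionary<string, string>();
    // Group 1 = href of the <a> in the first <p> of the ry_boxright <dl>; group 2 = its title.
    MatchCollection mc = Regex.Matches(input, @"(?is)<dl\s*class=""line-h24 ry_boxright"">\s*<p [^>]*>\s*<span[^>]*><a\s*href=""([^""]*)""\s*title=""([^""]*)"">.*?</a></span>.*?\s*</p>");
    foreach (Match mx in mc)
    {
        Console.WriteLine(mx.Groups[1].Value); // http://www.m1905.com/yx/film/c1f509553.html
        Console.WriteLine(mx.Groups[2].Value); // 我是中国人
        dic[mx.Groups[1].Value] = mx.Groups[2].Value; // indexer, unlike Add, does not throw on a repeated URL
    }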
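The same pattern can be wrapped in a method that takes the HTML as a string, so it can be tried without C:\1.txt or the gb2312 file encoding. This is only a sketch: the class and method names (FilmLinkExtractor, Extract) are hypothetical, and the numbered groups are swapped for named groups url and title; otherwise the pattern is the one in the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: the answer's regex with named groups instead of numbered ones.
public static class FilmLinkExtractor
{
    private static readonly Regex LinkPattern = new Regex(
        @"(?is)<dl\s*class=""line-h24 ry_boxright"">\s*<p [^>]*>\s*<span[^>]*><a\s*href=""(?<url>[^""]*)""\s*title=""(?<title>[^""]*)"">.*?</a></span>.*?\s*</p>");

    // Returns (url, title) pairs in document order; repeated URLs are kept, not merged.
    public static List<(string Url, string Title)> Extract(string html)
    {
        var results = new List<(string Url, string Title)>();
        foreach (Match m in LinkPattern.Matches(html))
        {
            results.Add((m.Groups["url"].Value, m.Groups["title"].Value));
        }
        return results;
    }
}
```

For the HTML posted in the question, FilmLinkExtractor.Extract(html) should return a single pair: ("http://www.m1905.com/yx/film/c1f509553.html", "我是中国人"). Returning a list rather than filling a Dictionary leaves it to the caller to decide how to handle a film that appears twice on the page.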

------Solution--------------------
Quote:

As the subject says:
If I only want the c1f509553 part of the a tag's "http://www.m1905.com/yx/film/c1f509553.html" plus the title, this regular expression @"(?is)<dl\s*class=""line-h24 ry_boxright"">\s*<p [^>]*>\s*<span[^>]*><a\s*href=""([^&quo
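The rest of this follow-up is cut off, so the following is only one possible approach, offered as an assumption: keep the first part of the answer's pattern, but split the href so that a group captures just the last path segment without ".html" (c1f509553 here) alongside the title. The class, method, and group names (FilmIdExtractor, ExtractIdAndTitle, id, title) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: captures the film id from the href instead of the whole URL.
public static class FilmIdExtractor
{
    // [^""]*/ runs up to the last '/' (the id group cannot contain '/'),
    // then (?<id>...) takes the file name before ".html", e.g. "c1f509553".
    private static readonly Regex IdPattern = new Regex(
        @"(?is)<dl\s*class=""line-h24 ry_boxright"">\s*<p [^>]*>\s*<span[^>]*><a\s*href=""[^""]*/(?<id>[^/""]+)\.html""\s*title=""(?<title>[^""]*)""");

    // Returns (id, title) pairs in document order.
    public static List<(string Id, string Title)> ExtractIdAndTitle(string html)
    {
        var results = new List<(string Id, string Title)>();
        foreach (Match m in IdPattern.Matches(html))
        {
            results.Add((m.Groups["id"].Value, m.Groups["title"].Value));
        }
        return results;
    }
}
```

For the HTML in the question this should yield ("c1f509553", "我是中国人"). If the full URL is also needed, wrapping the whole href value in its own group keeps both in one match.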