Date: 2014-05-17

How can I extract JavaScript data embedded in an HTML page?
Suppose the page contains this snippet:
<script type="text/javascript">
  var latest_draw_result = { "red": ["14", "03", "01", "04", "08"], "blue": [], "310": [], "extra": [] };
  var latest_draw_phase = '201225727';
  var latest_draw_time = '2012-09-20 15:45:00';
</script>

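How can I extract the following values?
14 03 01 04 08
201225727
2012-09-20 15:45:00
If a regular expression is the way to go, how should I write it? Many thanks in advance; I'm online waiting for a reply.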

------Solution--------------------
C# code
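// Requires: using System.IO; using System.Linq; using System.Text; using System.Text.RegularExpressions;
// Unnamed groups are numbered first (1-4), so the named group "red" does not shift the phase/time indexes.
// The "red" character class excludes whitespace and is greedy, so separators are never captured as items.
string pattern = @"(?i)(['""]?)red\1:\s*\[(?:(['""]?)(?<red>[^'""\],\s]+)\2,?\s*)+[\s\S]*?latest_draw_phase\s*=\s*'([^']+)'[\s\S]*?latest_draw_time\s*=\s*'([^']+)'";
string tempStr = File.ReadAllText(@"C:\Users\M\Desktop\Test.txt", Encoding.GetEncoding("GB2312")); // read the saved page
Match _m = Regex.Match(tempStr, pattern);
if (_m.Success)
{
    var result = new {
        // each repetition of the "red" group is kept in Captures
        red = string.Join(" ", _m.Groups["red"].Captures.Cast<Capture>().Select(a => a.Value)),
        latest_draw_phase = _m.Groups[3].Value,
        latest_draw_time = _m.Groups[4].Value
    };
    /*
     * latest_draw_phase    "201225727"    string
     * latest_draw_time     "2012-09-20 15:45:00"    string
     * red                  "14 03 01 04 08"    string
     */
}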
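
A single large pattern like the one above can be hard to maintain if the page layout shifts. As an alternative, here is a minimal sketch that pulls each value out with its own small pattern. It assumes the page text is already in a string, that the array items are double-quoted, and that the two variables are single-quoted, as in the question. The names DrawParser, ExtractRed and ExtractVar are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class DrawParser
{
    // Hypothetical helper: returns the double-quoted items of the "red" array.
    public static string[] ExtractRed(string html)
    {
        // Step 1: isolate the text between the brackets that follow "red":
        Match list = Regex.Match(html, @"""red""\s*:\s*\[([^\]]*)\]");
        if (!list.Success) return new string[0];

        // Step 2: collect every double-quoted item inside that text.
        return Regex.Matches(list.Groups[1].Value, @"""([^""]*)""")
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToArray();
    }

    // Hypothetical helper: returns the single-quoted value assigned to a
    // JavaScript variable, or an empty string when the variable is not found.
    public static string ExtractVar(string html, string name)
    {
        Match m = Regex.Match(html, @"\b" + Regex.Escape(name) + @"\s*=\s*'([^']*)'");
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    static void Main()
    {
        // Sample input copied from the question.
        string html = @"<script type=""text/javascript"">
  var latest_draw_result = { ""red"": [""14"", ""03"", ""01"", ""04"", ""08""], ""blue"": [], ""310"": [], ""extra"": [] };
  var latest_draw_phase = '201225727';
  var latest_draw_time = '2012-09-20 15:45:00';
</script>";

        Console.WriteLine(string.Join(" ", ExtractRed(html)));     // 14 03 01 04 08
        Console.WriteLine(ExtractVar(html, "latest_draw_phase"));  // 201225727
        Console.WriteLine(ExtractVar(html, "latest_draw_time"));   // 2012-09-20 15:45:00
    }
}
```

Splitting the work this way means a change to one variable's format only affects one pattern, at the cost of scanning the text more than once.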