Date: 2014-05-17

100 points for a solution: extracting content with a .NET regex, for a scraper.

The HTML content is as follows:

HTML code

  <head>
 <meta http-equiv="Content-Type" content="text/html; charset=GB18030">
  .
  .
  .
  .
</head>
<body>
 <script>
   var imgdata = {
                "queryEnc": "%D7%C0%C3%E6%B1%DA%D6%BD",
                "displayNum": 9921457,
                "bdIsClustered": "1",
                "listNum": 2000,
                "bdFmtDispNum": "9921457",
                "thumbURL": "http://t1.baidu.com/it/u=3917541605,2488848911&fm=0&gp=0.jpg",
                "bdSearchTime": "0.043"
                 },
                 {
                "queryEnc": "%D7%C0%C3%E6%B1%DA%D6%BD",
                "displayNum": 9921457,
                "bdIsClustered": "1",
                "listNum": 2000,
                "bdFmtDispNum": "9921457",
                "thumbURL": "http://t1.baidu.com/it/u=3917541605,2488848911&fm=0&gp=0.jpg",
                "bdSearchTime": "0.043"
                 }
      </script>
</body>



I want to extract the thumbURL values. How do I do that? Please help; my regex skills are terrible.

------Solution--------------------
C# code

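            // Read the saved page source; dispose the reader when done.
            using (StreamReader reader = new StreamReader("c:\\1.txt", Encoding.Default))
            {
                string source = reader.ReadToEnd();
                // The lookbehind anchors on "thumbURL" and skips the colon and whitespace; the match is the quoted URL.
                Regex reg = new Regex(@"(?is)(?<=""thumbURL""[^""]*?)""http:.*?""");
                foreach (Match m in reg.Matches(source))
                {
                    // m.Value still includes the surrounding quotes, so trim them off.
                    MessageBox.Show(m.Value.Trim('"'));
                }
            }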
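
For comparison, here is a minimal sketch of the same extraction using a named capture group instead of a lookbehind, so the quotes never end up in the result. The class name `ThumbUrlExtractor` and the method `Extract` are hypothetical names chosen for illustration; they do not come from the original post.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original thread).
public static class ThumbUrlExtractor
{
    // Matches  "thumbURL" : "<url>"  and captures the text between the quotes as "url".
    private static readonly Regex ThumbUrl = new Regex(
        @"""thumbURL""\s*:\s*""(?<url>[^""]*)""",
        RegexOptions.IgnoreCase);

    // Returns every thumbURL value found in the page source, in document order.
    public static List<string> Extract(string source)
    {
        var urls = new List<string>();
        foreach (Match m in ThumbUrl.Matches(source))
        {
            urls.Add(m.Groups["url"].Value);
        }
        return urls;
    }
}
```

Calling `ThumbUrlExtractor.Extract(source)` on the page source would return one entry per `thumbURL` key, duplicates included; reading `m.Groups["url"]` rather than `m.Value` is what keeps the quotes out.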

------Solution--------------------
string source = "{\r\n \"queryEnc\": \"%D7%C0%C3%E6%B1%DA%D6%BD\",\r\n \"displayNum\": 9921457,\r\n \"bdIsClustered\": \"1\",\r\n \"listNum\": 2000,\r\n \"bdFmtDispNum\": \"9921457\",\r\n \"thumbURL\": \"http://t1.baidu.com/it/u=3917541605,2488848911&fm=0&gp=0.jpg\",\r\n \"bdSearchTime\": \"0.043\"\r\n },\r\n {\r\n \"queryEnc\": \"%D7%C0%C3%E6%B1%DA%D6%BD\",\r\n \"displayNum\": 9921457,\r\n \"bdIsClustered\": \"1\",\r\n \"listNum\": 2000,\r\n \"bdFmtDispNum\": \"9921457\",\r\n \"thumbURL\": \"http://t1.baidu.com/it/u=3917541605,2488848911&fm=0&gp=0.jpg\",\r\n \"bdSearchTime\": \"0.043\"\r\n }\r\n";





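// In a verbatim string, \"" is a regex-escaped quote; the lookarounds keep the quotes out of the match.
var matches = Regex.Matches(source, @"(?is)(?<=thumbURL\"":\s*\"")[^""]+(?="")");
foreach (Match m in matches)
{
    // HttpResponse has Write but no WriteLine; append a line break for HTML output.
    Response.Write(m.Value + "<br />");
}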
------Solution--------------------
(?i)(?<=thumbURL['""]?\s*:\s*['""])[^'""]+
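
The pattern above accepts either quote character around the value. As a rough sketch, assuming the page might mix `'` and `"`, a backreference can additionally require that the closing quote match the opening one. The class name `QuoteTolerantExtractor` and its `Extract` method are hypothetical, introduced only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original thread).
public static class QuoteTolerantExtractor
{
    // (?<q>['"]) remembers the opening quote; \k<q> demands the same character to close.
    // The key itself may be double-quoted, single-quoted, or bare.
    private static readonly Regex ThumbUrl = new Regex(
        @"thumbURL['""]?\s*:\s*(?<q>['""])(?<url>.*?)\k<q>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the captured URL of every match, without the surrounding quotes.
    public static List<string> Extract(string source)
    {
        var urls = new List<string>();
        foreach (Match m in ThumbUrl.Matches(source))
        {
            urls.Add(m.Groups["url"].Value);
        }
        return urls;
    }
}
```

Because the closing quote is tied to the opening one, a value wrapped in double quotes may itself contain a single quote (and vice versa) without cutting the match short.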
------Solution--------------------

C# code
Regex reg = new Regex(@"(?i)(?<=""thumbURL""\s*:\s*"")[^""]*(?="")");
MatchCollection mc = reg.Matches(yourStr);
foreach (Match m in mc)
{
    richTextBox2.Text += m.Value + "\n";
}