Date: 2014-05-17

Extracting specific data from a web page
HTML code

 <tr id="tr_domains_16969543" style="cursor:auto;" onclick="selRow(this);" onmouseover="tr_Mouseover(this)" onmouseout="tr_Mouseout(this)">
      <td class="domainname" >
               <div class="domainurl">
                 <a href="http://whois.chinaz.com/30n.net" id="domain_1" target="_blank" title="查看">a</a>
               </div>
      </td>
                                                <td>b</td>
                                                <td>c</td>
                                                <td>d</td>
                                                <td>d</td>
</tr>
 <tr id="tr_domains_16969543" style="cursor:auto;" onclick="selRow(this);" onmouseover="tr_Mouseover(this)" onmouseout="tr_Mouseout(this)">
      <td class="domainname" >
               <div class="domainurl">
                 <a href="http://whois.chinaz.com/30n.net" id="domain_1" target="_blank" title="查看">a</a>
               </div>
      </td>
                                                <td>b</td>
                                                <td>c</td>
                                                <td>d</td>
                                                <td>d</td>
</tr>
 <tr id="tr_domains_16969543" style="cursor:auto;" onclick="selRow(this);" onmouseover="tr_Mouseover(this)" onmouseout="tr_Mouseout(this)">
      <td class="domainname" >
               <div class="domainurl">
                 <a href="http://whois.chinaz.com/30n.net" id="domain_1" target="_blank" title="查看">a</a>
               </div>
      </td>
                                                <td>b</td>
                                                <td>c</td>
                                                <td>d</td>
                                                <td>e</td>
</tr>
 <tr id="tr_domains_16969543" style="cursor:auto;" onclick="selRow(this);" onmouseover="tr_Mouseover(this)" onmouseout="tr_Mouseout(this)">
      <td class="domainname" >
               <div class="domainurl">
                 <a href="http://whois.chinaz.com/30n.net" id="domain_1" target="_blank" title="查看">a</a>
               </div>
      </td>
                                                <td>b</td>
                                                <td>c</td>
                                                <td>d</td>
                                                <td>e</td>
</tr>



How can I extract each group of data inside the td cells, row by row?

------Solution--------------------
C# code
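            // Requires: using System.IO; using System.Net; using System.Text;
            string url = "http://del.chinaz.com/";
            string tempStr;

            WebRequest request = WebRequest.Create(url); // build the request for the URL
            using (WebResponse response = request.GetResponse()) // send it and get the response
            using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
            {
                tempStr = reader.ReadToEnd(); // read the whole page into a string
            }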

            string pattern = @"(?i)<tr[^>]*?id=(['""]?)tr_domains[^'""]*?\1[^>]*?>[\s\S]*?<a[^>]*?id=(['"&
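
As an alternative sketch for the question above, the rows can be handled in two passes: first match every `<tr>` whose id starts with `tr_domains`, then pull each `<td>` out of that row and strip the remaining tags. The class name `DomainRowParser`, the method `ExtractRows`, and the shortened sample string are assumptions made for illustration; they are not from the original post. Regex matching on HTML is fragile, so this only holds as long as the page keeps the structure shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class DomainRowParser
{
    // Matches each <tr> whose id starts with "tr_domains" and captures its inner HTML.
    private static readonly Regex RowPattern = new Regex(
        @"<tr\b[^>]*\bid\s*=\s*[""']?tr_domains[^>]*>(?<body>[\s\S]*?)</tr>",
        RegexOptions.IgnoreCase);

    // Matches each <td> inside a row and captures its inner HTML.
    private static readonly Regex CellPattern = new Regex(
        @"<td\b[^>]*>(?<cell>[\s\S]*?)</td>",
        RegexOptions.IgnoreCase);

    // Matches any tag left inside a cell, such as the <div> and <a> wrappers.
    private static readonly Regex TagPattern = new Regex(@"<[^>]+>");

    // Returns one string array per matching row, one element per <td>.
    public static List<string[]> ExtractRows(string html)
    {
        var rows = new List<string[]>();
        foreach (Match row in RowPattern.Matches(html))
        {
            var cells = new List<string>();
            foreach (Match cell in CellPattern.Matches(row.Groups["body"].Value))
            {
                // Drop nested tags and surrounding whitespace, keep the text.
                cells.Add(TagPattern.Replace(cell.Groups["cell"].Value, "").Trim());
            }
            rows.Add(cells.ToArray());
        }
        return rows;
    }

    public static void Main()
    {
        // Shortened copy of one row from the question (assumed sample input).
        string sample = @"<tr id=""tr_domains_16969543"" style=""cursor:auto;"">
  <td class=""domainname"" ><div class=""domainurl"">
    <a href=""http://whois.chinaz.com/30n.net"" id=""domain_1"">a</a>
  </div></td>
  <td>b</td><td>c</td><td>d</td><td>e</td>
</tr>";

        foreach (string[] cells in ExtractRows(sample))
        {
            Console.WriteLine(string.Join(" | ", cells)); // prints "a | b | c | d | e"
        }
    }
}
```

To use it on the live page, pass the downloaded `tempStr` to `ExtractRows` instead of the sample string. If the markup is irregular (unclosed tags, nested tables), a real HTML parser is likely a safer choice than regular expressions.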