Date: 2014-05-20  Views: 20899

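Finding a target link in a page
Hi all. Below is a piece of HTML. I need to find the link whose text is "下一页" (next page), i.e. http://www.tianya.cn/publicforum/content/free/1/2634838.shtml

How can I find it automatically in Java?
With a regular expression?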

<div class="pages" id="pageDivTop">
 <em class="current">1</em>
 <a href="http://www.tianya.cn/publicforum/content/free/1/2634838.shtml">2</a>
 <a href="http://www.tianya.cn/publicforum/content/free/1/2634898.shtml">3</a>
 <a href="http://www.tianya.cn/publicforum/content/free/1/2634952.shtml">4</a>
 <a href="http://www.tianya.cn/publicforum/content/free/1/2635106.shtml">5</a>
 <a href="http://www.tianya.cn/publicforum/content/free/1/2635431.shtml">6</a>
 <a href="http://www.tianya.cn/publicforum/content/free/1/2635681.shtml">7</a>
 <a href="http://www.tianya.cn/publicforum/content/free/1/2635760.shtml">8</a>
 <a href="http://www.tianya.cn/publicforum/content/free/1/2635860.shtml">9</a>
 <a href="http://www.tianya.cn/publicforum/content/free/1/2636064.shtml">10</a>
 <a href="http://www.tianya.cn/publicforum/content/free/1/2634838.shtml">下一页</a>
 <a href="http://www.tianya.cn/publicforum/content/free/1/2644117.shtml">末页</a>
 <a href="#adsp_content_replybox_frame_1">回复此贴</a>
 <span>共41页</span>
 <form method="post" action="http://www.tianya.cn/new/publicforum/content.asp?stritem=free&#8706;=0&amp;flag=1&amp;idarticle=270643">
  <table>
  <tbody>
  <tr>
  <td><span>直接到</span><input name="idarticlekey" type="text" id="idarticlekey" class="pagenum" value="" size="4" /><span>页</span><input type="submit" name="button2" id="button2" value="确定" class="pagego" /><input type="hidden" name="idArticleslist" value="2634640,2634838,2634898,2634952,2635106,2635431,2635681,2635760,2635860,2636064,2636246,2636396,2636599,2636646,2636764,2637151,2637363,2637438,2637460,2637561,2637594,2637634,2637674,2637790,2637867,2637911,2638027,2638089,2638183,2638235,2638305,2638355,2638416,2638506,2638679,2638846,2639128,2639688,2640333,2642212,2644117," /><input type="hidden" id="firstidarticle" name="firstidarticle" value="2634640" /><input type="hidden" name="pageflag" value="1" /> </td>
  </tr>
  </tbody>
  </table>
 </form> 
</div>

------Solution--------------------
With a regular expression:
Java code

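    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public static void getNext() {
        // Sample input: the two "下一页" (next page) links place extra attributes before and after href.
        String html = "<a href=\"http://www.tianya.cn/publicforum/content/free/1/2636064.shtml\">10</a>\n"
                + "<a id =\"1\" href=\"http://www.tianya.cn/publicforum/content/free/1/2634838.shtml\">下一页</a>\n"
                + "<a href=\"http://www.tianya.cn/publicforum/content/free/1/2634838.shtml\" name=\"a\">下一页</a>\n"
                + "<a href=\"http://www.tianya.cn/publicforum/content/free/1/2644117.shtml\">末页</a>\n"
                + "<a href=\"#adsp_content_replybox_frame_1\">回复此贴</a>\n" + "<span>共41页</span>\n";
        // Capture the href of every <a> tag whose text is exactly "下一页".
        // [^>]*? (rather than .*?) keeps the match inside one tag, even if several links share a line.
        Pattern pattern = Pattern.compile("<a\\s+[^>]*?href=\"([^\"]+)\"[^>]*>下一页</a>");
        Matcher matcher = pattern.matcher(html);
        // Print the URL of each matching link (twice for this sample input).
        while (matcher.find()) {
            System.out.println(matcher.group(1));
        }
    }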
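
The same idea can be wrapped in a reusable helper. The following is only a sketch: the class name `NextPageLinkFinder` and the method `findLinkByText` are made up for illustration, and it assumes the markup looks like the sample above (double-quoted `href` values, plain text inside the link). It uses `Pattern.quote` so the link text is matched literally, and returns an `Optional` so the caller can tell when no such link exists. For messier real-world HTML a proper HTML parser would probably be more robust than a regular expression.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class NextPageLinkFinder {

    /**
     * Returns the href of the first <a> element whose text equals linkText
     * (ignoring surrounding whitespace), or an empty Optional if none is found.
     * Assumes href values are enclosed in double quotes.
     */
    static Optional<String> findLinkByText(String html, String linkText) {
        Pattern pattern = Pattern.compile(
                "<a\\s[^>]*?href\\s*=\\s*\"([^\"]*)\"[^>]*>\\s*"
                        + Pattern.quote(linkText) + "\\s*</a>",
                Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(html);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://www.tianya.cn/publicforum/content/free/1/2636064.shtml\">10</a>\n"
                + "<a href=\"http://www.tianya.cn/publicforum/content/free/1/2634838.shtml\">下一页</a>\n"
                + "<a href=\"http://www.tianya.cn/publicforum/content/free/1/2644117.shtml\">末页</a>\n";
        // Look up the "next page" link; fall back to a marker string if it is missing.
        System.out.println(findLinkByText(html, "下一页").orElse("(not found)"));
    }
}
```

Calling `findLinkByText(html, "末页")` on the same input would look up the "last page" link instead, and a text that does not occur in the page yields `Optional.empty()`.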