Date: 2014-05-18  Views: 20707

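Extracting HTML code with Java
Write a Java class that fetches a website's HTML code and extracts part of it (Java + JSP).
I don't know much about this area, so I'd appreciate help from anyone passing by who has experience with it. A complete class would be ideal; I'm rather embarrassed to ask...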

------Solution--------------------
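import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlFetcher {
    // Downloads the page at urlPath and returns its HTML (empty string on error).
    public static String readHtmlFile(String urlPath) {
        StringBuilder html = new StringBuilder();
        try {
            URL url = new URL(urlPath);
            URLConnection urlConnection = url.openConnection();
            urlConnection.setAllowUserInteraction(false);
            // The target page is GBK-encoded; change the charset for other sites.
            try (InputStream urlStream = urlConnection.getInputStream();
                 InputStreamReader sr = new InputStreamReader(urlStream, "GBK")) {
                char[] buffer = new char[8192];
                int charsRead;
                while ((charsRead = sr.read(buffer, 0, buffer.length)) != -1) {
                    html.append(buffer, 0, charsRead);
                }
            }
        } catch (IOException e) {
            System.out.println("error : " + e.getMessage());
        }
        return html.toString();
    }

    public static void main(String[] args) {
        String urlGk = "http://gaokao.h-edu.com/yx/yxjj.asp?schoolid=85";
        String htmlContent = readHtmlFile(urlGk);
        // Capture the text inside <span class="fb14">...</span>.
        String r = "<span class=\"fb14\">([^\"]+)</span>";
        Pattern s = Pattern.compile(r);
        Matcher m = s.matcher(htmlContent);
        // find() must succeed before group() can be called.
        if (m.find()) {
            System.out.println("Name: " + m.group(1));
        }
    }
}
This is my code; adapt it to your needs. The page is accumulated in a StringBuilder (a StringBuffer works too) rather than by repeated string concatenation.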
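If the fragment you want always sits between two known markers, plain string searching may be enough and avoids regex escaping. The following is only a minimal sketch: the class name HtmlCutter, the method between, and the sample HTML string are hypothetical and not part of the answer above, and it assumes the markers appear literally in the page source.

```java
public class HtmlCutter {
    // Hypothetical helper: returns the trimmed text between the first
    // occurrence of startMarker and the next endMarker after it,
    // or null if either marker is missing.
    public static String between(String html, String startMarker, String endMarker) {
        int start = html.indexOf(startMarker);
        if (start < 0) {
            return null;
        }
        start += startMarker.length();
        int end = html.indexOf(endMarker, start);
        if (end < 0) {
            return null;
        }
        return html.substring(start, end).trim();
    }

    public static void main(String[] args) {
        // Made-up sample input; in practice pass in the downloaded page.
        String html = "<div><span class=\"fb14\"> Example University </span></div>";
        System.out.println(between(html, "<span class=\"fb14\">", "</span>"));
    }
}
```

The helper works on any string, so it can be fed the page text returned by a download method. It only finds the first match; pages with nested or repeated tags are better handled by a real HTML parser.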
------Solution--------------------
This can be done with HTML Parser (the org.htmlparser library).
import org.htmlparser.Parser;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

public class HtmlParserDemo {
    // Parses the page at urlStr and returns its full HTML.
    public static String getCode(String urlStr) throws ParserException {
        Parser p = new Parser(urlStr);
        // A null filter keeps every node.
        NodeList list = p.parse(null);
        String codeStr = list.toHtml();
        System.out.println(codeStr);
        return codeStr;
    }
}