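Extracting HTML code with Java
Write a Java class that extracts part of a website's HTML code (Java + JSP).
I don't know much about this area, so I'm hoping someone with experience who passes by can help. A complete class would be ideal. Honestly, I'm embarrassed to ask...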
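------Solution--------------------
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLConnection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlFetcher {

    // Downloads the page at urlPath and returns its HTML ("" if the request fails).
    public static String readHtmlFile(String urlPath) {
        StringBuilder html = new StringBuilder();
        try {
            URL url = new URL(urlPath);
            URLConnection urlConnection = url.openConnection();
            urlConnection.setAllowUserInteraction(false);
            // This code assumes the page is GBK-encoded; change the charset for other sites.
            try (InputStream urlStream = urlConnection.getInputStream();
                 Reader sr = new InputStreamReader(urlStream, "GBK")) {
                char[] buffer = new char[8192];
                int charsRead;
                while ((charsRead = sr.read(buffer, 0, buffer.length)) != -1) {
                    html.append(buffer, 0, charsRead);
                }
            }
        } catch (IOException e) {
            System.out.println("error: " + e.getMessage());
        }
        return html.toString();
    }

    public static void main(String[] args) {
        String urlGk = "http://gaokao.h-edu.com/yx/yxjj.asp?schoolid=85";
        String htmlContent = readHtmlFile(urlGk);
        // Capture the text inside <span class="fb14">...</span>.
        String r = "<span class=\"fb14\">([^\"]+)</span>";
        Pattern s = Pattern.compile(r);
        Matcher m = s.matcher(htmlContent);
        // find() must succeed before group(1) can be read.
        if (m.find()) {
            System.out.println("Name: " + m.group(1));
        } else {
            System.out.println("No match found");
        }
    }
}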
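This is my version; adapt it to your needs. Accumulating the text in a StringBuffer (or StringBuilder) is better than concatenating Strings in a loop.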
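
The regular-expression step can be tried without a network connection. Below is a minimal sketch that collects every `fb14` span from an in-memory string. The class name `SpanExtractor`, the method `extractAll`, and the sample HTML are made up for illustration; the pattern is also slightly different from the one in the answer (it stops at the next `<` and trims surrounding whitespace), which is an assumption about what the page looks like.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpanExtractor {
    // Matches <span class="fb14">text</span> and captures the text.
    // [^<]+? stops at the next tag; the \s* parts drop surrounding whitespace.
    private static final Pattern FB14_SPAN =
            Pattern.compile("<span class=\"fb14\">\\s*([^<]+?)\\s*</span>");

    // Returns the captured text of every matching span, in document order.
    static List<String> extractAll(String html) {
        List<String> results = new ArrayList<>();
        Matcher m = FB14_SPAN.matcher(html);
        while (m.find()) {          // each find() advances to the next match
            results.add(m.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        // Invented sample markup, not taken from the real page.
        String sample = "<div><span class=\"fb14\">Example University</span>"
                + "<span class=\"other\">ignored</span>"
                + "<span class=\"fb14\"> Second Name </span></div>";
        System.out.println(extractAll(sample));
    }
}
```

Looping on `find()` instead of calling it once returns all matches, and an empty list signals that the pattern did not match, so no exception handling is needed for the "not found" case.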
------Solution--------------------
This can be done with HTML Parser (the org.htmlparser library).
import org.htmlparser.Parser;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

// Fetches the page at urlStr and returns its HTML as re-serialized by the parser.
public static String getCode(String urlStr) throws ParserException {
    Parser p = new Parser(urlStr);
    // A null filter keeps every node in the document.
    NodeList list = p.parse(null);
    String codeStr = list.toHtml();
    System.out.println(codeStr);
    return codeStr;
}
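
Once the page source is available as a String, from either approach, cutting out a fragment between two known markers needs only `indexOf` and `substring`. The following is a rough sketch under that assumption; `HtmlSlice`, `between`, and the sample markup are hypothetical names and data, and the method does not handle nested tags.

```java
class HtmlSlice {
    // Returns the text between the first occurrence of startMarker and the next
    // occurrence of endMarker after it, or null if either marker is missing.
    static String between(String html, String startMarker, String endMarker) {
        int start = html.indexOf(startMarker);
        if (start < 0) {
            return null;
        }
        start += startMarker.length();            // skip past the start marker itself
        int end = html.indexOf(endMarker, start); // search only after the start marker
        if (end < 0) {
            return null;
        }
        return html.substring(start, end);
    }

    public static void main(String[] args) {
        // Invented sample markup for illustration.
        String page = "<html><body><div id=\"intro\">Hello</div><p>rest</p></body></html>";
        System.out.println(between(page, "<div id=\"intro\">", "</div>"));
    }
}
```

This works well when the markers are unique and the fragment contains no nested element of the same kind; for anything more structured, a real HTML parser is the safer choice.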