
How to read the content of an HTML file

Date: 2014-05-20  Views: 20848

How can I read the content of an HTML file?
When reading it, the HTML tags must not be included in the result.
For example: <h1> hello,world! </h1>
should be read as hello,world!

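My idea was to look for < and > on each line and skip whatever lies between them, but skip does not seem well suited to jumping over that text, and this approach has plenty of other flaws as well.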

Could someone outline a general approach for reading the content of an HTML file?

------Solution--------------------
This one is from teacher Sun Xin; see whether it is of any use to you.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

public class GetGoogle {
    public static void main(String[] args) throws Exception {
        System.out.println("Fetching the Japanese page");
        getContentByLanguage("ja");
        System.out.println();
        System.out.println("Fetching the Chinese page");
        getContentByLanguage("zh-cn");
        System.out.println();
    }

    // Requests the page with the given Accept-Language value, then prints the
    // request headers, the response headers, and the raw HTML (tags included).
    public static void getContentByLanguage(String country) throws Exception {
        URL urlGoogle = new URL("http://www.google.cn");
        HttpURLConnection googleConnection = (HttpURLConnection) urlGoogle.openConnection();
        // The header name must not contain stray spaces.
        googleConnection.setRequestProperty("Accept-Language", country);

        // Request properties can only be read before the connection is opened.
        Map<String, List<String>> requests = googleConnection.getRequestProperties();
        for (String field : requests.keySet()) {
            System.out.println(field + ": " + googleConnection.getRequestProperty(field));
        }

        googleConnection.connect();
        // The HTTP status line appears under a null key.
        Map<String, List<String>> responses = googleConnection.getHeaderFields();
        for (String field : responses.keySet()) {
            System.out.println(field + ": " + googleConnection.getHeaderField(field));
        }

        // UTF-8 is an assumption here; use the charset the server actually declares.
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(googleConnection.getInputStream(), StandardCharsets.UTF_8))) {
            String strLine;
            while ((strLine = br.readLine()) != null) {
                System.out.println(strLine);
            }
        } finally {
            googleConnection.disconnect();
        }
    }
}
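
The program above prints the raw HTML, tags and all, so it does not yet give the tag-free text the question asks for. Below is a minimal sketch of the "skip everything between < and >" idea from the question, reading character by character instead of calling skip. The class name TagStripper and the method stripTags are made up for illustration. It shares the flaws the question hints at: it does not handle comments, script or style blocks, character entities such as &amp;, or a > inside an attribute value, so a real HTML parser is likely the safer choice for anything beyond simple pages.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class TagStripper {
    // Copies characters from the reader and drops everything between '<' and '>'.
    // Hypothetical helper for illustration; not a full HTML parser.
    public static String stripTags(Reader in) throws IOException {
        StringBuilder text = new StringBuilder();
        boolean insideTag = false;
        int c;
        while ((c = in.read()) != -1) {
            if (c == '<') {
                insideTag = true;            // a tag starts: stop copying
            } else if (c == '>' && insideTag) {
                insideTag = false;           // the tag ends: resume copying
            } else if (!insideTag) {
                text.append((char) c);       // ordinary text outside any tag
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String html = "<h1> hello,world! </h1>";
        // trim() removes the spaces that surrounded the text inside the tag pair
        System.out.println(stripTags(new StringReader(html)).trim());
    }
}
```

To read from a file rather than a string, the same method can be handed any Reader, for example the one returned by java.nio.file.Files.newBufferedReader(path, charset), with the charset chosen to match the file.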
