Problem with fetching another web page's data and matching it with a regular expression - Java Tutorial - 爱易网页


Date: 2014-05-17

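My approach is as follows:
I read the other site's page through an input stream, accumulate it into a string, and then match it with a regular expression. The match never succeeds, yet the regular expression itself is fine; I tested it with a regex tool. Could it be an encoding problem? The page is also encoded in gb2312. Could someone experienced point me in the right direction? I'll be checking back for replies.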
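On the encoding question: a likely culprit is the read loop rather than the regex. `new String(buf)` decodes every 80-byte chunk on its own with the platform default charset, and always uses all 80 bytes even when `read` returned fewer. In GB2312 each Chinese character takes two bytes, so a chunk boundary can cut a character in half. The sketch below compares decoding chunk by chunk with decoding all bytes at once, using GB2312 explicitly in both cases so that only the chunking differs. The class and method names are hypothetical, and it assumes the JDK provides the GB2312 charset (it is not one of the charsets every Java platform must support, though common JDK builds include it).

```java
import java.nio.charset.Charset;

// Hypothetical demo: why decoding a GB2312 page in fixed-size chunks can garble Chinese text.
class ChunkDecodeDemo {
    // Assumes the running JDK ships the GB2312 charset.
    static final Charset GB2312 = Charset.forName("GB2312");

    // Decodes each chunk separately, like the question's loop
    // (but with an explicit charset and only the bytes actually in the chunk).
    static String decodeInChunks(byte[] data, int chunkSize, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sb.append(new String(data, off, len, cs));
        }
        return sb.toString();
    }

    // Decodes all bytes in one call, so no character is split across chunks.
    static String decodeWhole(byte[] data, Charset cs) {
        return new String(data, cs);
    }

    public static void main(String[] args) {
        // "a" is 1 byte and each Chinese character is 2 bytes in GB2312,
        // so with 2-byte chunks every boundary falls inside a character.
        String text = "a中文";
        byte[] data = text.getBytes(GB2312);
        System.out.println(decodeWhole(data, GB2312));       // a中文
        System.out.println(decodeInChunks(data, 2, GB2312)); // garbled output
    }
}
```

With 2-byte chunks, the lone lead byte at the end of the first chunk and the orphaned trail byte at the end become replacement characters, and the bytes in between are paired up wrongly. The same can happen at any 80-byte boundary of a real page.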
Code:

import java.io.*;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/*
 * A network downloader that downloads a file from the given address.
 * By processing the downloaded string, it can also filter files and scrape web content.
 * http://www.ip138.com/ips.asp?ip=www.google.cn
 * This address returns a page showing the physical location of the specified IP address or domain name.
 */
public class example {

    public static void main(String[] args){
        try{
            showurl("http://www.ip138.com/ips.asp?ip=www.baidu.com");
            //showurl("http://cdn1-87.projectplaylist.com/e1/static3/mp3/850973.mp3");
        }
        catch(Exception e){
            e.printStackTrace();
        }
    }

    public static void showurl(String str) throws MalformedURLException, IOException{
        URL url=new URL(str);
        int size=0;
        String str2=null;
        String result;
        HttpURLConnection httpUrl=(HttpURLConnection)url.openConnection();
        httpUrl.connect();
        //FileOutputStream fos=new FileOutputStream("D://a.html");
        Pattern pattern=Pattern.compile("<ul class=\"ul1\"><li>[\u4e00-\u9fa5]*：[\u4e00-\u9fa5]*");
        BufferedInputStream bis=new BufferedInputStream(httpUrl.getInputStream());
        byte[] buf=new byte[80];
        while((size=bis.read(buf))!=-1){
            // fos.write(buf,0,size);
            String str1=new String(buf);
            str2+=str1;
            // System.out.println(str1);
        }
        bis.close();
        httpUrl.disconnect();
        Matcher matcher=pattern.matcher(str2);
        //System.out.println(str2);
        if(matcher.lookingAt()){
            String str3=matcher.group(0);
            String[] str4=str3.split("：");
            result=str4[str4.length-1];
            System.out.println("Result: "+result);
            //System.out.println("Page "+str+" downloaded to D://a.html");
            //fos.close();
        }
        else{
            System.out.println("Match failed");
        }
    }

}

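Please take a look. The code has no compile errors, so you can copy it and try it directly. Thanks in advance; I'll be checking back for replies.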
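The code does compile, but two runtime problems would defeat the match even with perfect decoding. `str2` starts as `null`, so the first `+=` makes the string begin with the literal text `null`. And `lookingAt()` only succeeds when the pattern matches at the very beginning of the input, which a page starting with `<html>` or a doctype never does; `find()` searches the whole input instead. A minimal sketch of both, using the question's pattern against a hypothetical HTML snippet (the class name, method names and the snippet are made up for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of two pitfalls in the question's code.
class MatchPitfallsDemo {
    // Same pattern as in the question.
    static final Pattern PATTERN =
        Pattern.compile("<ul class=\"ul1\"><li>[\u4e00-\u9fa5]*：[\u4e00-\u9fa5]*");

    // Mimics "String str2 = null; str2 += str1;" from the question.
    static String appendToNull(String chunk) {
        String acc = null;
        acc += chunk; // string concatenation turns null into the text "null"
        return acc;
    }

    // lookingAt() succeeds only if the pattern matches starting at index 0.
    static boolean matchesAtStart(String page) {
        return PATTERN.matcher(page).lookingAt();
    }

    // find() scans the input and returns the first match anywhere, or null if none.
    static String findAnywhere(String page) {
        Matcher m = PATTERN.matcher(page);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String page = "<html><body><ul class=\"ul1\"><li>本站数据：北京市</li></ul>";
        System.out.println(appendToNull("<html>")); // null<html>
        System.out.println(matchesAtStart(page));   // false
        System.out.println(findAnywhere(page));     // <ul class="ul1"><li>本站数据：北京市
    }
}
```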

------Solution--------------------
Ha, don't lose heart. As the saying goes, the milk will come, and so will the bread.
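Putting the fixes together, a corrected sketch of `showurl` could read the whole response into a byte buffer, decode it once as GB2312, and search with `find()`. It assumes the page is still served as GB2312 and still contains the `<ul class="ul1"><li>` structure the question's regex expects; neither is verified here. The class and method names are hypothetical, and a capturing group stands in for the original `split("：")` step.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical corrected version of the question's showurl logic.
class FetchAndMatch {
    // Question's pattern, with group(1) capturing the text after the full-width colon.
    static final Pattern PATTERN =
        Pattern.compile("<ul class=\"ul1\"><li>[\u4e00-\u9fa5]*：([\u4e00-\u9fa5]*)");

    // Reads the whole stream first, then decodes once, so no character is split.
    static String readAll(InputStream in, Charset cs) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n); // write only the bytes actually read
        }
        return new String(out.toByteArray(), cs);
    }

    // Returns the text after the colon, or null if the pattern occurs nowhere.
    static String extract(String html) {
        Matcher m = PATTERN.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    // Network wrapper; assumes the page really is encoded as GB2312.
    static String fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(address).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            return readAll(in, Charset.forName("GB2312"));
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String html = fetch("http://www.ip138.com/ips.asp?ip=www.baidu.com");
        String result = extract(html);
        System.out.println(result != null ? "Result: " + result : "Match failed");
    }
}
```

Only `main` touches the network, so `readAll` and `extract` can be checked offline with a `ByteArrayInputStream`. Because the pattern contains the full-width colon `：` as a literal, the source file should also be compiled with the encoding it was saved in (for example `javac -encoding UTF-8`); otherwise that character in the compiled pattern may not be the one that appears on the page.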
