Date: 2014-05-19  Views: 21234

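How do I extract plain text from an HTML page in C#?
Sorry, I'm a beginner. I found a VB version but I'd like one in C#. Any pointers from the experts would be appreciated, thanks.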

------Solution--------------------
Search for "小偷程序" ("thief program", the Chinese nickname for a page-scraping program).
------Solution--------------------
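// Downloads the page at `url` and saves the raw HTML to disk (no text extraction yet).
// Requires: using System; using System.IO; using System.Net; using System.Text;
// SaveFileName() is a helper from the original post (not shown) that returns the output path.
private void DownFile(string url)
{
    // Must be set before the request is created to affect its ServicePoint.
    ServicePointManager.Expect100Continue = false;
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(new Uri(url));
    HttpWebResponse response = null;
    try
    {
        response = (HttpWebResponse)request.GetResponse();
    }
    catch (WebException exception)
    {
        if (exception.Status == WebExceptionStatus.ProtocolError)
        {
            // HTTP error (404, 500, ...): the server's error page is still readable.
            response = (HttpWebResponse)exception.Response;
        }
        else
        {
            // ConnectFailure or another network error: there is no response to read.
            //MessageBox.Show(exception.ToString());
            return;
        }
    }

    string body;
    // Disposing the reader closes the response stream; disposing the response releases the connection.
    using (response)
    using (StreamReader sr = new StreamReader(response.GetResponseStream(), Encoding.Default))
    {
        body = sr.ReadToEnd();
    }

    using (StreamWriter sw = new StreamWriter(SaveFileName(), false, Encoding.Default))
    {
        sw.Write(body);
    }
}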
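The download code only saves the raw HTML; it does not strip the markup. A minimal sketch of that second step, assuming a rough regex-based approach is acceptable: `HtmlText` and `HtmlToText` are hypothetical names, and regular expressions are not a real HTML parser, so malformed markup or a `>` inside an attribute value can defeat it. `WebUtility.HtmlDecode` is the standard .NET call for turning entities such as `&amp;` back into characters.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Hypothetical helper: rough tag stripper, not a full HTML parser.
    public static string HtmlToText(string html)
    {
        // 1. Drop <script> and <style> blocks together with their contents.
        string text = Regex.Replace(html, @"<(script|style)\b[^>]*>.*?</\1\s*>", " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        // 2. Drop HTML comments.
        text = Regex.Replace(text, @"<!--.*?-->", " ", RegexOptions.Singleline);
        // 3. Turn line-breaking tags into newlines so paragraphs do not run together.
        text = Regex.Replace(text, @"<(br|/p|/div|/li|/h[1-6]|/tr)\b[^>]*>", "\n",
            RegexOptions.IgnoreCase);
        // 4. Remove every remaining tag.
        text = Regex.Replace(text, @"<[^>]+>", " ");
        // 5. Decode entities (&amp; &lt; &nbsp; ...) only after tags are gone.
        text = WebUtility.HtmlDecode(text);
        // 6. Collapse runs of spaces/tabs/non-breaking spaces, then blank lines.
        text = Regex.Replace(text, @"[ \t\u00A0]+", " ");
        text = Regex.Replace(text, @" *\n\s*", "\n");
        return text.Trim();
    }
}
```

Usage: after reading `body` in `DownFile`, call `string text = HtmlText.HtmlToText(body);` and write `text` instead of `body`. Decoding entities last matters: decoding first would turn `&lt;b&gt;` into a fake tag that step 4 would then delete.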
------Solution--------------------
The reply above is basically the right idea.
------Solution--------------------
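If the HTML file is local, you can simply read it directly; the approach above should also work. One more idea: XMLHTTP works too, but each read takes roughly 3 seconds.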
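Reading a local file directly is a one-liner with `File.ReadAllText`; the main pitfall is the encoding. In .NET Framework `Encoding.Default` is the system ANSI code page (GBK on Chinese Windows), while in .NET Core and .NET 5+ it is always UTF-8, so passing the page's real encoding is safer. A minimal sketch, with `LocalHtmlReader.ReadText` as a hypothetical helper and the same rough regex stripping (not a real HTML parser), flattening everything to one line:

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class LocalHtmlReader
{
    // Hypothetical helper: reads a local HTML file and returns its visible text.
    // The caller supplies the file's encoding instead of relying on Encoding.Default.
    public static string ReadText(string path, Encoding encoding)
    {
        string html = File.ReadAllText(path, encoding);
        // Remove <script>/<style> blocks with their contents.
        html = Regex.Replace(html, @"<(script|style)\b[^>]*>.*?</\1\s*>", " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        // Remove all remaining tags, then decode entities such as &lt;.
        string text = Regex.Replace(html, @"<[^>]+>", " ");
        text = WebUtility.HtmlDecode(text);
        // Collapse all whitespace (including newlines) into single spaces.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

For an older GBK page on .NET Core/.NET 5+, code page 936 is only available after calling `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)` (from the System.Text.Encoding.CodePages package); then pass `Encoding.GetEncoding(936)`.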