Date: 2014-05-20

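Looking for regular expressions to clean up the contents of HTML tags
Problem 1:
<table border="2">   convert to:   <table>
Problem 2:
<td colspan="2" align="center">   convert to:   <td colspan="2">
Problem 3:
<col width=72>   convert to: nothing (remove the tag entirely)

30 points for each of the three problems; the remaining 10 go to whoever bumps the thread.
Points will be awarded once the problem is solved.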

------Solution--------------------
Problems 1 and 3 can both be done with a regex (shown here for problem 1):


Regex regex = new Regex("\<table.*?\>", RegexOptions.IgnoreCase);
str = regex.Replace(str, "<table>");
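As the next reply points out, "\<table.*?\>" does not compile as an ordinary C# string literal, because \< is not a recognized escape sequence. Regex.Replace also returns a new string rather than changing str, so its result has to be assigned. A minimal sketch covering problems 1 and 3 together, assuming attribute values never contain '>'; the class TagFixes and its method names are hypothetical:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class; a regex sketch, not an HTML parser.
public static class TagFixes
{
    // Problem 1: any opening <table ...> tag becomes a bare <table>.
    // \b stops "<tablex" from matching; [^>]* assumes no '>' inside attribute values.
    private static readonly Regex TableTag =
        new Regex(@"<table\b[^>]*>", RegexOptions.IgnoreCase);

    // Problem 3: any <col ...> tag is removed; \b leaves <colgroup> alone.
    private static readonly Regex ColTag =
        new Regex(@"<col\b[^>]*>", RegexOptions.IgnoreCase);

    public static string Problem1(string html) => TableTag.Replace(html, "<table>");

    public static string Problem3(string html) => ColTag.Replace(html, "");
}
```

Closing tags such as </table> are untouched, because the patterns require "<" to be followed directly by the tag name.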

------Solution--------------------
Regex regex = new Regex("\<table.*?\>", RegexOptions.IgnoreCase);
==>
Regex regex = new Regex(@"\<table.*?\>", RegexOptions.IgnoreCase);
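A small sketch (the class name VerbatimDemo is hypothetical) showing that the verbatim literal and an ordinary literal with doubled backslashes produce the same pattern string, and that the lazy .*? stops at the first '>' after "<table":

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class.
public static class VerbatimDemo
{
    // Verbatim literal: the backslashes reach the regex engine unchanged.
    public const string Verbatim = @"\<table.*?\>";

    // Ordinary literal: each backslash has to be doubled to produce the same pattern.
    public const string Escaped = "\\<table.*?\\>";

    // .*? is lazy, so each match ends at the first '>' after "<table".
    public static string Apply(string html) =>
        Regex.Replace(html, Verbatim, "<table>", RegexOptions.IgnoreCase);
}
```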

------Solution--------------------
Re: problem 1. If you only want to remove the border attribute, you can do this:

string str = "<table border=\"2\" align=\"center\">";
Match m = Regex.Match(str, "<\\s*table\\s+(border=\"\\d*\")[\\s\\S]*>");
str = str.Replace(m.Groups[1].Value, "");
Output: <table  align="center">   (two spaces: the space that preceded border is left behind)
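The pattern above only matches when border comes directly after "table" and its value is a double-quoted run of digits, so border=2 or border='2' are missed. A sketch that removes border from any position in a <table> tag with a single Regex.Replace, assuming no '>' appears inside attribute values (BorderRemover is a hypothetical name):

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: removes the border attribute from <table> tags only.
public static class BorderRemover
{
    // Group 1 keeps "<table" plus any attributes before border; the whitespace in front
    // of border and the border=value pair are dropped. The value may be "double-quoted",
    // 'single-quoted' or unquoted. Assumes no '>' appears inside attribute values.
    private static readonly Regex Border = new Regex(
        @"(<table\b[^>]*?)\s+border\s*=\s*(?:""[^""]*""|'[^']*'|[^\s>]+)",
        RegexOptions.IgnoreCase);

    public static string Remove(string html) => Border.Replace(html, "$1");
}
```

Because the whitespace before border is consumed too, no double space is left behind, and an attribute such as cellborder is not touched (border must be preceded by whitespace).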


If you want to remove all of the attributes, you can do this:
string str = "<table border=\"2\" align=\"center\">";
Match m = Regex.Match(str, "<\\s*table\\s+(border=\"\\d*\"[\\s\\S]*)>");
str = str.Replace(m.Groups[1].Value, "");

Output: <table >
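Both snippets above pass m.Groups[1].Value straight to String.Replace without checking m.Success. When nothing matches, that value is the empty string, and String.Replace throws ArgumentException for an empty oldValue; when it does match, String.Replace removes every identical substring in str, not just the one inside the tag. A sketch that guards on m.Success and removes only the captured span with String.Remove (SafeStrip and StripAfterBorder are hypothetical names):

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the reply's "remove all attributes" pattern.
public static class SafeStrip
{
    // Same pattern as the reply above: group 1 runs from border= to just before the final '>'.
    private static readonly Regex AllAttrs =
        new Regex("<\\s*table\\s+(border=\"\\d*\"[\\s\\S]*)>");

    public static string StripAfterBorder(string str)
    {
        Match m = AllAttrs.Match(str);
        if (!m.Success)
            return str; // nothing to remove; avoids String.Replace("", ...) throwing
        Group g = m.Groups[1];
        // Remove exactly the captured characters, at the position where they were captured.
        return str.Remove(g.Index, g.Length);
    }
}
```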
------Solution--------------------
Re: problem 2

string str = "<td colspan=\"2\" align=\"center\">";
Match m = Regex.Match(str, "<\\s*td[\\s\\S]*(align=\"\\w*\")[\\s\\S]*>");
str = str.Replace(m.Groups[1].Value, "");

Output: <td colspan="2" >
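Here too, str.Replace removes every copy of align="center" in str, including any outside the <td> tag. A different, swapped-in approach is a whitelist: rebuild each <td> tag keeping only colspan, so align (or any other attribute) goes no matter where it sits. A sketch using a MatchEvaluator, assuming name=value attributes and no '>' inside values; TdWhitelist is a hypothetical name:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: rebuilds every opening <td> tag, keeping only its colspan attribute.
public static class TdWhitelist
{
    // Group 1 holds everything between "<td" and the closing '>'.
    private static readonly Regex TdTag =
        new Regex(@"<td\b([^>]*)>", RegexOptions.IgnoreCase);

    // One name=value pair; the value may be "double-quoted", 'single-quoted' or unquoted.
    private static readonly Regex Attr =
        new Regex(@"([\w-]+)\s*=\s*(""[^""]*""|'[^']*'|[^\s>]+)");

    public static string KeepColspan(string html) =>
        TdTag.Replace(html, tag =>
        {
            // Keep only attributes named colspan (case-insensitive), in their original form.
            var kept = Attr.Matches(tag.Groups[1].Value)
                .Cast<Match>()
                .Where(a => a.Groups[1].Value.Equals("colspan", StringComparison.OrdinalIgnoreCase))
                .Select(a => " " + a.Value);
            return "<td" + string.Concat(kept) + ">";
        });
}
```

To keep more attributes (rowspan, for example), extend the Where check.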
------Solution--------------------
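For problem 1, one more tweak makes these patterns better: border no longer has to be the first attribute after "table".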

Match m = Regex.Match(str, "<\\s*table\\s+(border=\"\\d*\")[\\s\\S]*>");
->
Match m = Regex.Match(str, "<\\s*table[\\s\\S]*(border=\"\\d*\")[\\s\\S]*>");


Match m = Regex.Match(str, "<\\s*table\\s+(border=\"\\d*\"[\\s\\S]*)>");
->
Match m = Regex.Match(str, "<\\s*table[\\s\\S]*(border=\"\\d*\"[\\s\\S]*)>");
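One thing to watch with these patterns: [\s\S]* also matches '>', so on a string holding more than one tag the match runs past the first tag, and the greedy filler before the group backtracks to the last border in the string. A sketch contrasting that with a [^>]* filler, which keeps each match inside one tag (GreedyDemo and its sample HTML are hypothetical):

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class with made-up sample HTML.
public static class GreedyDemo
{
    public const string Html =
        "<table border=\"1\"><tr><td>x</td></tr></table><table border=\"2\">";

    // The pattern from the reply above: [\s\S]* also matches '>', so the match spans the
    // whole string and group 1 ends up holding the LAST border, not the first table's.
    public static string GreedyCapture() =>
        Regex.Match(Html, "<\\s*table[\\s\\S]*(border=\"\\d*\")[\\s\\S]*>").Groups[1].Value;

    // Restricting the filler to [^>]* keeps every match inside a single tag.
    public static string[] PerTagCaptures()
    {
        MatchCollection ms = Regex.Matches(Html, "<\\s*table[^>]*?(border=\"\\d*\")[^>]*>");
        string[] result = new string[ms.Count];
        for (int i = 0; i < ms.Count; i++)
            result[i] = ms[i].Groups[1].Value;
        return result;
    }
}
```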


------Solution--------------------
// Problem 1 -> <table><table>
string a = @"<table border=""2""><table border=""2"">";
a = Regex.Replace(a, @"(?<=<table)\s+?.+?(?=>)", "", RegexOptions.IgnoreCase);
//a = Regex.Replace(a, @"<table[^<]+?>", "<table>", RegexOptions.IgnoreCase);
// Problem 2 -> <td colspan="2"><td colspan="2">
a = @"<td colspan=""2"" align=""center""><td colspan=""2"" align=""center"">";
a = Regex.Replace(a, @"(?<=<td\s+?\S+?)\s+?.+?(?=>)", "", RegexOptions.IgnoreCase);
//a = Regex.Replace(a, @"(?<=<td\s+?colspan=\S+?)\s+?.+?(?=>)", "", RegexOptions.IgnoreCase);
// Problem 3 -> empty string (both <col> tags removed)
a = @"<col width=72><col width=72>";
a = Regex.Replace(a, @"<col[^<]+?>", "", RegexOptions.IgnoreCase);
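The reply above, wrapped into methods so its behaviour can be checked (LookbehindCleaner is a hypothetical name). The <td> lookbehind keeps whatever single attribute follows "<td", so it assumes colspan comes first, and "<col[^<]+?>" also matches <colgroup>; adding \b after "col" restricts it to <col> tags:

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the patterns posted above (.NET allows variable-length lookbehind).
public static class LookbehindCleaner
{
    // Problem 1: the lookbehind keeps "<table"; the lazy body stops at the first '>'.
    public static string Table(string s) =>
        Regex.Replace(s, @"(?<=<table)\s+?.+?(?=>)", "", RegexOptions.IgnoreCase);

    // Problem 2: keeps "<td" plus the FIRST attribute; assumes colspan comes first.
    public static string Td(string s) =>
        Regex.Replace(s, @"(?<=<td\s+?\S+?)\s+?.+?(?=>)", "", RegexOptions.IgnoreCase);

    // Problem 3 as posted: <col[^<]+?> also matches <colgroup ...>.
    public static string ColAsPosted(string s) =>
        Regex.Replace(s, @"<col[^<]+?>", "", RegexOptions.IgnoreCase);

    // With \b after "col", only <col> tags are removed.
    public static string Col(string s) =>
        Regex.Replace(s, @"<col\b[^>]*>", "", RegexOptions.IgnoreCase);
}
```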