Looking for regular expressions to strip attributes from HTML tags
Question 1:
<table border="2"> should become: <table>
Question 2:
<td colspan="2" align="center"> should become: <td colspan="2">
Question 3:
<col width=72> should become: nothing (remove the tag entirely)
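30 points for each of the three questions; the remaining 10 points go to whoever bumps the thread.
Points will be awarded once the problem is solved.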
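------Solution--------------------
Questions 1 and 3 can both be handled with a regular expression. For example, question 1:
Regex regex = new Regex("\<table.*?\>", RegexOptions.IgnoreCase);
str = regex.Replace(str, "<table>");   // Replace returns a new string, so assign the result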
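A minimal sketch covering both questions 1 and 3 with Regex.Replace, assuming well-formed tags with no ">" inside attribute values. StripTableAttributes and RemoveColTags are hypothetical helper names, and the [^>]* class is a swapped-in choice: unlike .*? it also matches line breaks inside a tag, and the \b keeps <col from matching <colgroup>.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for question 1: "<table" + word boundary + everything up to the first '>'
// is replaced by a bare "<table>".
static string StripTableAttributes(string html) =>
    Regex.Replace(html, @"<table\b[^>]*>", "<table>", RegexOptions.IgnoreCase);

// Hypothetical helper for question 3: the whole <col ...> tag is replaced by nothing.
static string RemoveColTags(string html) =>
    Regex.Replace(html, @"<col\b[^>]*>", "", RegexOptions.IgnoreCase);

// Sample markup invented for illustration.
string input = "<table border=\"2\"><col width=72><tr><td>x</td></tr></table>";
string output = RemoveColTags(StripTableAttributes(input));
Console.WriteLine(output); // <table><tr><td>x</td></tr></table>
```

Closing tags such as </table> are left alone because the patterns require "<" to be followed directly by the tag name.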
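------Solution--------------------
Regex regex = new Regex("\<table.*?\>", RegexOptions.IgnoreCase);
==>
Regex regex = new Regex(@"\<table.*?\>", RegexOptions.IgnoreCase);
(Use a verbatim @"..." string: in a normal string literal "\<" is not a recognized escape sequence, so the line does not compile.)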
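To show what the @ prefix changes, a small check (the sample input is invented for illustration): the verbatim literal and the doubled-backslash literal contain identical pattern text, and in .NET regex syntax \< and \> simply match the literal characters "<" and ">".

```csharp
using System;
using System.Text.RegularExpressions;

// Both literals hand the regex engine exactly the same pattern text.
string verbatim = @"\<table.*?\>";
string escaped = "\\<table.*?\\>";
Console.WriteLine(verbatim == escaped); // True

// The lazy .*? stops at the first '>', so only the opening tag is rewritten.
var regex = new Regex(verbatim, RegexOptions.IgnoreCase);
string result = regex.Replace("<TABLE border=\"2\" width=\"100%\"><tr>", "<table>");
Console.WriteLine(result); // <table><tr>
```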
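------Solution--------------------
Re: question 1. If you only want to remove the border attribute, you can do this:
string str = "<table border=\"2\" align=\"center\">";
Match m = Regex.Match(str, "<\\s*table\\s+(border=\"\\d*\")[\\s\\S]*>");
if (m.Success) str = str.Replace(m.Groups[1].Value, "");   // string.Replace throws on an empty search string
Output: <table  align="center">   (the space that preceded border="2" is left behind)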
If you want to remove all the attributes instead:
string str = "<table border=\"2\" align=\"center\">";
Match m = Regex.Match(str, "<\\s*table\\s+(border=\"\\d*\"[\\s\\S]*)>");
if (m.Success) str = str.Replace(m.Groups[1].Value, "");
Output: <table >
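If only border should go, one alternative (not the approach used in this reply) is to let Regex.Replace remove the attribute in place and put back the captured tag prefix with $1. This avoids calling str.Replace on a captured group, which throws ArgumentException when nothing matched (the group value is then empty) and would also remove identical text elsewhere in the string. RemoveTableBorder is a hypothetical name, and the pattern assumes attribute values contain no ">".

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: drops a border attribute from <table> tags wherever it sits among the
// attributes. The value may be double-quoted, single-quoted or unquoted.
static string RemoveTableBorder(string html) =>
    Regex.Replace(
        html,
        // Group 1 = "<table" plus any attributes before border; the whitespace, border=value
        // part that follows is matched but not captured, so "$1" drops it.
        @"(<table\b[^>]*?)\s+border\s*=\s*(?:""[^""]*""|'[^']*'|[^\s>]+)",
        "$1",
        RegexOptions.IgnoreCase);

Console.WriteLine(RemoveTableBorder("<table border=\"2\" align=\"center\">")); // <table align="center">
Console.WriteLine(RemoveTableBorder("<table align=\"center\" border=2>"));     // <table align="center">
```

Because the whitespace before border is consumed together with the attribute, no double space is left behind, and attributes such as bordercolor are not touched since "=" must follow "border" directly (apart from whitespace).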
------Solution--------------------
Re: question 2:
string str = "<td colspan=\"2\" align=\"center\">";
Match m = Regex.Match(str, "<\\s*td[\\s\\S]*(align=\"\\w*\")[\\s\\S]*>");
if (m.Success) str = str.Replace(m.Groups[1].Value, "");
Output: <td colspan="2" >
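A sketch for question 2 that does not depend on which attributes are present or in what order: a MatchEvaluator rebuilds each <td> tag and keeps only colspan. KeepOnlyColspan is a hypothetical helper name, and quoted values containing ">" are not handled.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: rewrites every <td ...> opening tag so that only colspan survives.
static string KeepOnlyColspan(string html)
{
    // Finds colspan=value inside the attribute text of one tag (quoted or unquoted value).
    var colspanAttr = new Regex(@"\bcolspan\s*=\s*(""[^""]*""|'[^']*'|[^\s>]+)", RegexOptions.IgnoreCase);

    // Group 1 holds everything between "<td" and the closing '>' of the opening tag.
    return Regex.Replace(html, @"<td\b([^>]*)>", m =>
    {
        Match colspan = colspanAttr.Match(m.Groups[1].Value);
        return colspan.Success ? "<td " + colspan.Value + ">" : "<td>";
    }, RegexOptions.IgnoreCase);
}

Console.WriteLine(KeepOnlyColspan("<td colspan=\"2\" align=\"center\">")); // <td colspan="2">
Console.WriteLine(KeepOnlyColspan("<td align=\"center\" colspan=2>"));     // <td colspan=2>
```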
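------Solution--------------------
For question 1, a small change makes the patterns better, because border no longer has to be the first attribute:
Match m = Regex.Match(str, "<\\s*table\\s+(border=\"\\d*\")[\\s\\S]*>");
->
Match m = Regex.Match(str, "<\\s*table[\\s\\S]*(border=\"\\d*\")[\\s\\S]*>");

Match m = Regex.Match(str, "<\\s*table\\s+(border=\"\\d*\"[\\s\\S]*)>");
->
Match m = Regex.Match(str, "<\\s*table[\\s\\S]*(border=\"\\d*\"[\\s\\S]*)>");
(With the second pattern, attributes that come before border are kept; only border and everything after it is removed.)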
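Because [\s\S]* also matches ">", the "remove everything from border onward" pattern can run past the end of the <table> tag when more markup follows it in the same string. The sketch below, using an invented sample string, compares it with a variant that uses [^>]* so the captured group cannot leave the opening tag.

```csharp
using System;
using System.Text.RegularExpressions;

string html = "<table border=\"2\"><tr><td>x</td></tr></table>";

// The pattern from the reply: [\s\S]* is greedy and crosses '>', so group 1 runs
// from border="2" all the way to the last '>' in the input.
Match greedy = Regex.Match(html, "<\\s*table[\\s\\S]*(border=\"\\d*\"[\\s\\S]*)>");
string greedyResult = greedy.Success ? html.Replace(greedy.Groups[1].Value, "") : html;

// Same structure, but [^>]* cannot move past the end of the opening tag.
Match confined = Regex.Match(html, "<\\s*table[^>]*?(border=\"\\d*\"[^>]*)>");
string confinedResult = confined.Success ? html.Replace(confined.Groups[1].Value, "") : html;

Console.WriteLine(greedyResult);   // <table >
Console.WriteLine(confinedResult); // <table ><tr><td>x</td></tr></table>
```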
------Solution--------------------
// Question 1: delete everything between "<table" and the next ">" (the commented-out line is an alternative)
string a = @"<table border=""2""><table border=""2"">";
a = Regex.Replace(a, @"(?<=<table)\s+?.+?(?=>)", "", RegexOptions.IgnoreCase);
//a = Regex.Replace(a, @"<table[^<]+?>", "<table>", RegexOptions.IgnoreCase);
// Question 2: keep the first attribute after "<td" and delete the rest (alternative: keep colspan explicitly)
a = @"<td colspan=""2"" align=""center""><td colspan=""2"" align=""center"">";
a = Regex.Replace(a, @"(?<=<td\s+?\S+?)\s+?.+?(?=>)", "", RegexOptions.IgnoreCase);
//a = Regex.Replace(a, @"(?<=<td\s+?colspan=\S+?)\s+?.+?(?=>)", "", RegexOptions.IgnoreCase);
// Question 3: delete the <col ...> tags entirely
a = @"<col width=72><col width=72>";
a = Regex.Replace(a, @"<col[^<]+?>", "", RegexOptions.IgnoreCase);
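The lookbehind patterns in the last reply depend on attribute order. This sketch runs both question-2 variants on a hypothetical input with the attributes swapped: the first variant keeps whichever attribute comes first, and the colspan variant (the commented-out line) does not match at all, because its lookbehind requires colspan to follow "<td" directly.

```csharp
using System;
using System.Text.RegularExpressions;

// Same markup as question 2, but with the attributes in the opposite order (invented input).
string reordered = @"<td align=""center"" colspan=""2"">";

// Variant 1: the lookbehind accepts any first attribute, so align is kept and colspan is removed.
string firstKept = Regex.Replace(reordered, @"(?<=<td\s+?\S+?)\s+?.+?(?=>)", "", RegexOptions.IgnoreCase);

// Variant 2: the lookbehind needs "<td colspan=...", which never occurs here, so nothing changes.
string colspanKept = Regex.Replace(reordered, @"(?<=<td\s+?colspan=\S+?)\s+?.+?(?=>)", "", RegexOptions.IgnoreCase);

Console.WriteLine(firstKept);   // <td align="center">
Console.WriteLine(colspanKept); // <td align="center" colspan="2">
```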