How can I strip JavaScript out of scraped news data?
Function NoHtml(str)
    Dim re
    str = (str)
    Set re = New RegExp
    re.IgnoreCase = True
    re.Global = True
    ' Strip anything that looks like a tag
    re.Pattern = "(\<.[^\<]*\>)"
    str = re.Replace(str, " ")
    ' Strip any closing tags that are left over
    re.Pattern = "(\<\/[^\<]*\>)"
    str = re.Replace(str, " ")
    NoHtml = str
    Set re = Nothing
End Function
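This only removes the HTML tags themselves.
For example, given
<script>alert("I want to remove this");</script>
the result is
alert("I want to remove this"); which has still not been removed.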
There is also an encoding problem:
Function getHTTPPage(url)
    Dim Http
    Set Http = Server.CreateObject("MSXML2.XMLHTTP")
    Http.open "GET", url, False
    Http.send()
    If Http.readystate <> 4 Then Exit Function
    getHTTPPage = bytesToBSTR(Http.responseBody, "gb2312")
    Set Http = Nothing
    If Err.number <> 0 Then Err.Clear
End Function
This can only fetch gb2312-encoded pages; utf-8 pages come out garbled.
How can I solve this?
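One possible direction for the encoding question, sketched here in JavaScript rather than VBScript: instead of hard-coding "gb2312", read the charset from the Content-Type response header, and fall back to a <meta> declaration near the top of the page. The names detectCharset and decodePage are hypothetical and are not part of the code above, and decoding gb2312 with TextDecoder assumes a runtime that ships that encoding. In the ASP function, the same idea would presumably mean passing the detected charset to bytesToBSTR in place of the literal.

```javascript
// Hypothetical helper: choose a charset label for a fetched page.
// contentType is the Content-Type header value, bytes is a Uint8Array body.
function detectCharset(contentType, bytes, fallback) {
  // 1. Prefer the charset announced in the HTTP header.
  const fromHeader = /charset\s*=\s*["']?([\w-]+)/i.exec(contentType || "");
  if (fromHeader) return fromHeader[1].toLowerCase();

  // 2. Otherwise sniff the first 2 KB. Mapping each byte to one character
  //    is enough here because charset declarations are plain ASCII.
  const head = Array.from(bytes.slice(0, 2048), (b) => String.fromCharCode(b)).join("");
  const fromMeta = /<meta[^>]+charset\s*=\s*["']?([\w-]+)/i.exec(head);
  if (fromMeta) return fromMeta[1].toLowerCase();

  // 3. Assumption: fall back to gb2312, as the original function does.
  return fallback || "gb2312";
}

// Hypothetical helper: decode the body with whatever charset was detected.
function decodePage(contentType, bytes) {
  return new TextDecoder(detectCharset(contentType, bytes)).decode(bytes);
}
```

A page that declares neither a header charset nor a meta charset still falls back to gb2312, so this only helps for pages that announce their encoding.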
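------Solution--------------------
VBScript should have some string functions you can use (I'm not too sure about the details):
Treat the whole HTML text as one string. Use the string functions to find the position of the first "<script>" substring (after converting to lowercase or uppercase), then find the position of "</script>", and delete the substring between the two positions. Then save the cleaned string as an HTML file or whatever file format you want (fileObject should be able to do that, though I'm not too sure about that either).
^_^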
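The search-and-delete idea described above can be sketched as follows. This is an illustration in JavaScript, not VBScript; removeScriptBlocks is a hypothetical name, and the loop is one way to handle several script blocks rather than only the first one.

```javascript
// Remove every <script ...>...</script> block using plain string searches.
function removeScriptBlocks(html) {
  // Lowercase only ASCII letters so positions in `lower` match `html` exactly.
  const lower = html.replace(/[A-Z]/g, (c) => c.toLowerCase());
  const closeTag = "</script>";
  let result = "";
  let pos = 0;
  while (true) {
    const start = lower.indexOf("<script", pos);
    if (start === -1) break;            // no more script blocks
    const end = lower.indexOf(closeTag, start);
    if (end === -1) break;              // unclosed block: leave the rest as-is
    result += html.slice(pos, start);   // keep the text before the block
    pos = end + closeTag.length;        // skip past the closing tag
  }
  return result + html.slice(pos);      // keep whatever follows the last block
}
```

Searching for "<script" without the closing bracket also catches tags that carry attributes, such as a type or src attribute.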
------Solution--------------------
<textarea id=textarea1>
wwwww <script>alert("I want to remove this");alert("I want to remove this");alert("I want to remove this");</script>
This doesn't get removed either..
</textarea>
<script>
var str = textarea1.value;
// Lazily match everything from <script> up to the nearest </script>
var re = /<script>[\s\S]*?<\/script>/g;
re.test(str)
str = str.replace(re, " ");
alert(str);
</script>
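The pattern above only matches a bare, lowercase <script> tag. A slightly more tolerant variant, offered as a sketch rather than a definitive fix, also accepts attributes and mixed case, and removes the script blocks before the remaining tags are stripped, so the script body never ends up in the text. stripScriptsAndTags is a hypothetical name. If the VBScript RegExp engine on the server supports lazy quantifiers (an assumption worth checking), the first pattern could presumably be tried as an extra re.Pattern step at the start of NoHtml.

```javascript
// Remove script blocks first, then strip whatever tags remain.
function stripScriptsAndTags(html) {
  return html
    // <script ...> ... </script>, any case, any attributes, shortest match
    .replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, "")
    // every remaining tag, opening or closing
    .replace(/<[^>]+>/g, "");
}
```

The order matters: once the tags are gone, nothing marks where the script body started or ended. Regular expressions like these are fine for cleaning scraped text but are not a safe HTML sanitizer.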
------Solution--------------------
Function RemoveHTML(strText)
    ' Tags that are recognised as HTML; each name is delimited by semicolons
    Dim TAGLIST
    TAGLIST = ";!--;!DOCTYPE;A;ACRONYM;ADDRESS;APPLET;AREA;B;BASE;BASEFONT;" & _
        "BGSOUND;BIG;BLOCKQUOTE;BODY;BR;BUTTON;CAPTION;CENTER;CITE;CODE;" & _
        "COL;COLGROUP;COMMENT;DD;DEL;DFN;DIR;DIV;DL;DT;EM;EMBED;FIELDSET;" & _
        "FONT;FORM;FRAME;FRAMESET;HEAD;H1;H2;H3;H4;H5;H6;HR;HTML;I;IFRAME;IMG;" & _
        "INPUT;INS;ISINDEX;KBD;LABEL;LAYER;LEGEND;LI;LINK;LISTING;MAP;MARQUEE;" & _
        "MENU;META;NOBR;NOFRAMES;NOSCRIPT;OBJECT;OL;OPTION;P;PARAM;PLAINTEXT;" & _
        "PRE;Q;S;SAMP;SCRIPT;SELECT;SMALL;SPAN;STRIKE;STRONG;STYLE;SUB;SUP;" & _
        "TABLE;TBODY;TD;TEXTAREA;TFOOT;TH;THEAD;TITLE;TR;TT;U;UL;VAR;WBR;XMP;"
    ' Tags whose content should be removed together with the tag itself
    Const BLOCKTAGLIST = ";APPLET;EMBED;FRAMESET;HEAD;NOFRAMES;NOSCRIPT;OBJECT;SCRIPT;STYLE;"
    Dim nPos1
    Dim nPos2
    Dim nPos3
    Dim strResult
    Dim strTagName
    Dim bRemove
    Dim bSearchForBlock
    ' Find the first "<", then the ">" that closes that tag
    nPos1 = InStr(strText, "<")
    Do While nPos1 > 0
        nPos2 = InStr(nPos1 + 1, strText, ">")
        If nPos2 > 0 Then
            ' Take the text between "<" and ">" and normalise line breaks to spaces
            strTagName = Mid(strText, nPos1 + 1, nPos2 - nPos1 - 1)
            strTagName = Replace(Replace(strTagName, vbCr, " "), vbLf, " ")
            nPos3 = InStr(strTagName, " ")
If nPos3 > 0 The