Date: 2014-05-17

Java method for filtering out HTML, script, and style code to get a plain string
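
As a compact illustration of the approach described in the title, here is a minimal sketch that strips script blocks, style blocks, and any remaining tags with regular expressions, then shortens the result. The class name `HtmlTextSketch` and the methods `stripTags` and `truncate` are hypothetical names chosen for this example and do not come from the original post. Appending "..." only when the text is actually cut is also an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; all names here are illustrative assumptions.
final class HtmlTextSketch {
    // Patterns are compiled once and reused across calls.
    private static final Pattern SCRIPT = Pattern.compile(
            "<\\s*script[^>]*>[\\s\\S]*?<\\s*/\\s*script\\s*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern STYLE = Pattern.compile(
            "<\\s*style[^>]*>[\\s\\S]*?<\\s*/\\s*style\\s*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    private HtmlTextSketch() {}

    /** Removes script blocks first, then style blocks, then any remaining tags. */
    static String stripTags(String content) {
        if (content == null) {
            return "";
        }
        String text = SCRIPT.matcher(content).replaceAll("");
        text = STYLE.matcher(text).replaceAll("");
        return TAG.matcher(text).replaceAll("");
    }

    /** Keeps at most maxChars characters; appends "..." only when text was cut (assumed behavior). */
    static String truncate(String text, int maxChars) {
        if (text == null || maxChars <= 0) {
            return "";
        }
        return text.length() <= maxChars ? text : text.substring(0, maxChars) + "...";
    }
}
```

Calling `HtmlTextSketch.truncate(HtmlTextSketch.stripTags(html), 5)` chains the two steps. Script and style blocks are removed before the generic tag pattern runs so that their inner text does not survive as plain content.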

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @param content the content string
 * @param p number of characters to keep; must be > 0
 * @return the filtered content
 * @purpose: returns the given number of characters of the content after html,
 *           script, and style tags have been filtered out; the result ends with "..."
 * @author: Simon - 赵振明
 * @CreationTime: Aug 25, 2010 11:07:06 AM
 */
public static String getNoHTMLString(String content, int p) {

    // Nothing to filter for null content or a zero length.
    if (null == content) return "";
    if (0 == p) return "";

    Pattern p_script;
    Matcher m_script;
    Pattern p_style;
    Matcher m_style;
    Pattern p_html;
    Matcher m_html;

    try {
        String regEx_script = "<[\\s]*?script[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?script[\\s]*?>";
        // Regex for script blocks {or <script[^>]*?>[\\s\\S]*?<\\/script> }
        String regEx_style = "<[\\s]*?style[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?style[\\s]*?>";
        // Regex for style blocks {or <style[^>]*?>[\\s\\S]*?<\\/style> }
        String regEx_html = "<[^>]+>"; // Regex for HTML tags

        p_script = Pattern.compile(regEx_script, Pattern.CASE_INSENSITIVE);
        m_script = p_script.matcher(content);
        content = m_script.replaceAll(""); // Filter out script tags
        p_style = Pattern.compile(regEx_style, Pattern.CASE_INSENSITIVE);
        m_style = p_style.matcher(content);
        content = m_style.replaceAll(""); // Filter out style tags

        p_html = Pattern.compile(regEx_html, Pattern.CASE_INSENSITIVE);
        m_html = p_html.matcher(content);
