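How do I extract the content of a Word document with POI?
I want to use the POI package to extract the content of a Word document. The POI download page looks like the listing below. Which one should I download?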
Index of /apache/jakarta/poi/release
Name                Last modified       Size   Description
-----------------------------------------------------------
Parent Directory                        -      Jakarta project
bin/                06-Aug-2004 05:44   -      Jakarta project
src/                06-Aug-2004 05:46   -      Jakarta project
KEYS                25-Jan-2004 15:04   1.5K   Jakarta project
The URL is: http://apache.justdn.org/jakarta/poi/
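------Solution------
HWPF, part of the Jakarta POI open-source project (it is in the scratchpad directory of the download), is the component for working with Word documents. Here is a simple example: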
Download: http://www.apache.org/dist/jakarta/Poi/
<%@page contentType="text/html; charset=GBK" import="java.io.*,org.apache.poi.hwpf.HWPFDocument,org.apache.poi.hwpf.usermodel.*,org.apache.poi.hwpf.model.*" %>
<html>
<head>
<title>
testHWPF
</title>
</head>
<body bgcolor="#ffffff">
<h1>
</h1>
<%
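FileInputStream in = new FileInputStream("g:\\a.doc"); // path of the .doc file to read
HWPFDocument doc = new HWPFDocument(in);
in.close(); // the constructor has already read the whole stream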
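Range r = doc.getRange(); // range covering the whole document
StyleSheet styleSheet = doc.getStyleSheet(); // style sheet (not used below)
int sectionLevel = 0; // not used below
int lenParagraph = r.numParagraphs(); // number of paragraphs
int c = r.numCharacterRuns(); // number of character runs (spans of text with uniform formatting)
int b = r.numSections(); // number of sections
String s = r.text(); // full text of the range
boolean inCode = false; // not used below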
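for (int x = 0; x < lenParagraph; x++)
{
Paragraph p = r.getParagraph(x);
String text = p.text();
// skip empty paragraphs; the check must come before the output to have any effect
if (text.trim().length() == 0)
{
continue;
}
%>
<%=text%> <br>
<%
}
//doc.write(new FileOutputStream("g:\\b.doc")); // uncomment to save a copy of the document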
%>
character runs: <%=c%> <br>
sections: <%=b%> <br>
text: <%=s%> <br>
</body>
</html>
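The page above prints each paragraph with `<%=text%>` without escaping, so a paragraph containing `<` or `&` can break the HTML, and HWPF only reads the binary Word 97-2003 `.doc` format (an OLE2 compound file), not `.docx`. Below is a minimal, standard-library-only sketch of two helpers that could sit next to the JSP. The class and method names (`HwpfTextHelpers`, `looksLikeOle2`, `paragraphsToHtml`) are hypothetical and not part of POI. One checks the 8-byte OLE2 signature before a file is handed to `HWPFDocument`; the other cleans, filters and HTML-escapes paragraph strings such as those returned by `Paragraph.text()`. It assumes that Word paragraph text ends with a `\r` paragraph mark and may contain other control characters (for example the `\u0007` table cell mark); field-code instruction text is not handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration; it is not part of POI.
public class HwpfTextHelpers {

    // The 8-byte signature that starts every OLE2 compound file,
    // the container format used by Word 97-2003 .doc files.
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    // Returns true if the given header bytes start with the OLE2 signature.
    // A .docx file is a ZIP package (it starts with "PK") and fails this check.
    public static boolean looksLikeOle2(byte[] header) {
        if (header == null || header.length < OLE2_SIGNATURE.length) {
            return false;
        }
        byte[] prefix = Arrays.copyOf(header, OLE2_SIGNATURE.length);
        return Arrays.equals(prefix, OLE2_SIGNATURE);
    }

    // Escapes the characters that have special meaning in HTML text.
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char ch : s.toCharArray()) {
            switch (ch) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(ch);
            }
        }
        return sb.toString();
    }

    // Converts paragraph strings (for example, the values of Paragraph.text())
    // into HTML lines. Control characters such as the trailing \r paragraph mark
    // or a table cell mark are replaced by spaces, the result is trimmed, blank
    // paragraphs are skipped, and the rest is HTML-escaped.
    // Field-code instruction text, if present, is NOT removed.
    public static List<String> paragraphsToHtml(List<String> paragraphs) {
        List<String> lines = new ArrayList<>();
        for (String raw : paragraphs) {
            String cleaned = raw.replaceAll("\\p{Cntrl}", " ").trim();
            if (cleaned.isEmpty()) {
                continue; // nothing visible in this paragraph
            }
            lines.add(escapeHtml(cleaned) + " <br>");
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] docHeader = {
            (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
            (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
        };
        byte[] docxHeader = "PK\u0003\u0004".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(looksLikeOle2(docHeader));  // prints true
        System.out.println(looksLikeOle2(docxHeader)); // prints false

        List<String> paragraphs = Arrays.asList("Hello <World>\r", "\r", "A & B\u0007\r");
        for (String line : paragraphsToHtml(paragraphs)) {
            System.out.println(line);
        }
        // prints:
        // Hello &lt;World&gt; <br>
        // A &amp; B <br>
    }
}
```

In the JSP, one could collect `p.text()` for every paragraph into a list and print the returned lines instead of `<%=text%>`; reading the first 8 bytes of the file before constructing `HWPFDocument` lets the page report a clear error when someone passes a `.docx` file.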