How to use Apache HWPF to extract text and images out of a DOC file
Posted by Hamed on Stack Overflow
Published on 2009-03-12T05:06:36Z
Hi!
I downloaded Apache HWPF. I want to use it to read a DOC file and write its text into a plain text file. I don't know HWPF very well.
My very simple program is below.
I have 3 problems now:
1- Some of the packages have errors (they can't find Apache HDF). How can I fix them?
2- How can I use the HWPF methods to find and extract the images?
3- Part of my program is incomplete and incorrect, so please help me complete it.
I have to complete this program in 2 days.
Once again, please help me complete this.
Thank you all a lot for your help!
This is my elementary code:
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.poifs.filesystem.*;
import org.apache.poi.hwpf.*;
import org.apache.poi.hwpf.extractor.*;
import org.apache.poi.hwpf.model.PicturesTable;
import org.apache.poi.hwpf.usermodel.Picture;

public class test {
    public void m1() throws IOException {
        String filesname = "Hello.doc";
        POIFSFileSystem fs = null;
        fs = new POIFSFileSystem(new FileInputStream(filesname));
        HWPFDocument doc = new HWPFDocument(fs);
        WordExtractor we = new WordExtractor(doc);
        String str = we.getText();
        String[] paragraphs = we.getParagraphText();
        // Incomplete from here on: I don't know which arguments to pass where ". . ." appears.
        Picture pic = new Picture(. . .);
        pic.writeImageContent(. . .);
        PicturesTable picTable = new PicturesTable(. . .);
        if (picTable.hasPicture(. . .)) {
            picTable.extractPicture(..., ...);
            picTable.getAllPictures();
        }
    }
}
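One possible way to complete the program is sketched here; it is not the only approach. It assumes Java 7 or later and that both the main POI jar and the POI scratchpad jar are on the classpath. The HWPF classes ship in the scratchpad jar, so a missing scratchpad jar is a likely cause of the unresolved packages in problem 1. The pictures table is obtained from the document with getPicturesTable() rather than constructed by hand. The class name DocExtractor, the helper imageFileName, the output file Hello.txt and the images directory are all hypothetical names chosen for illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.hwpf.usermodel.Picture;

public class DocExtractor {

    // Hypothetical helper: builds a name such as "image-1.png" for the n-th picture.
    // Falls back to "bin" when no extension is suggested.
    static String imageFileName(int index, String extension) {
        String ext = (extension == null || extension.isEmpty()) ? "bin" : extension;
        return "image-" + index + "." + ext;
    }

    // Writes the document text to textFile and every embedded picture into imageDir.
    // Returns the number of pictures written.
    static int extract(File docFile, File textFile, File imageDir) throws IOException {
        try (InputStream in = new FileInputStream(docFile)) {
            HWPFDocument doc = new HWPFDocument(in);

            // Plain text of the whole document, written as UTF-8.
            WordExtractor extractor = new WordExtractor(doc);
            try (Writer out = new OutputStreamWriter(
                    new FileOutputStream(textFile), StandardCharsets.UTF_8)) {
                out.write(extractor.getText());
            }

            // The document supplies its own pictures table.
            List<Picture> pictures = doc.getPicturesTable().getAllPictures();
            int index = 0;
            for (Picture picture : pictures) {
                index++;
                File target = new File(imageDir,
                        imageFileName(index, picture.suggestFileExtension()));
                try (OutputStream out = new FileOutputStream(target)) {
                    picture.writeImageContent(out);
                }
            }
            return pictures.size();
        }
    }

    public static void main(String[] args) throws IOException {
        File imageDir = new File("images");
        imageDir.mkdirs();
        int count = extract(new File("Hello.doc"), new File("Hello.txt"), imageDir);
        System.out.println("Extracted " + count + " picture(s)");
    }
}
```

Running main next to a Hello.doc file should leave the text in Hello.txt and one file per embedded picture under images. Numbering the output files avoids relying on whatever names the document stores internally.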
© Stack Overflow or respective owner