How to use Apache HWPF to extract text and images out of a DOC file

Posted by Hamed on Stack Overflow
Published on 2009-03-12T05:06:36Z; indexed on 2010/04/11 12:03 UTC

Hi!

I downloaded Apache POI's HWPF. I want to use it to read a DOC file and write its text to a plain-text file, but I don't know HWPF very well.
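A minimal sketch of the text half of the task, assuming a POI release in which `WordExtractor` has a constructor taking an `InputStream` (recent releases do). The class name `DocToText`, the helper `toLine`, and the default file names are illustrative, not part of POI. Word ends paragraphs with a `\r` mark, so each paragraph is normalized to a single line ending in `\n` before the result is written out as UTF-8.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.poi.hwpf.extractor.WordExtractor;

public class DocToText {

    // Illustrative helper: turns one paragraph as returned by
    // getParagraphText() into a line without trailing breaks.
    // "\r\n" and lone '\r' (Word's paragraph mark) both become '\n'.
    static String toLine(String paragraph) {
        String s = paragraph.replace("\r\n", "\n").replace('\r', '\n');
        while (s.endsWith("\n")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) throws IOException {
        String inName = args.length > 0 ? args[0] : "Hello.doc";
        String outName = args.length > 1 ? args[1] : "Hello.txt";

        StringBuilder text = new StringBuilder();
        try (InputStream in = new FileInputStream(inName)) {
            WordExtractor we = new WordExtractor(in);
            // One string per paragraph, in document order.
            for (String paragraph : we.getParagraphText()) {
                text.append(toLine(paragraph)).append('\n');
            }
        }
        // Explicit UTF-8 so non-ASCII characters survive on any platform.
        Files.write(Paths.get(outName), text.toString().getBytes(StandardCharsets.UTF_8));
    }
}
```

Run it as `java DocToText Hello.doc Hello.txt` with POI's jars (including the scratchpad jar) on the classpath.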

My very simple program is at the end of this post.

I now have three problems:

1. Some of the package imports have errors (they can't find Apache HDF). How can I fix them?

2. How can I use HWPF's methods to find and extract the images?

3. Parts of my program are incomplete and incorrect, so please help me complete it.
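Regarding problem 1: compile errors on these imports usually mean the HWPF classes are not on the classpath. HWPF is shipped in POI's separate scratchpad jar (named like `poi-scratchpad-<version>.jar`), which has to be added next to the main `poi` jar. HDF (`org.apache.poi.hdf`) is HWPF's older predecessor, so code should import from `org.apache.poi.hwpf` instead. The sketch below uses only the JDK to report which of the classes this program needs can be loaded; `PoiClasspathCheck` is an illustrative name, and the class list is taken from the imports in this question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PoiClasspathCheck {

    // Classes the question's program imports; the hwpf ones live in poi-scratchpad.
    static final String[] REQUIRED = {
        "org.apache.poi.poifs.filesystem.POIFSFileSystem", // main poi jar
        "org.apache.poi.hwpf.HWPFDocument",                // poi-scratchpad
        "org.apache.poi.hwpf.extractor.WordExtractor",     // poi-scratchpad
        "org.apache.poi.hwpf.model.PicturesTable",         // poi-scratchpad
        "org.apache.poi.hwpf.usermodel.Picture"            // poi-scratchpad
    };

    // Returns true if the named class can be found by this class's loader.
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, don't run static initializers.
            Class.forName(className, false, PoiClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Maps each class name to whether it is loadable, keeping the input order.
    static Map<String, Boolean> check(String... classNames) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : classNames) {
            result.put(name, isLoadable(name));
        }
        return result;
    }

    public static void main(String[] args) {
        check(REQUIRED).forEach((name, ok) ->
            System.out.println((ok ? "found    " : "MISSING  ") + name));
    }
}
```

Running it with the same classpath as the failing program shows which jar is absent: a `MISSING` line for an `hwpf` class points at the scratchpad jar.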

I need to finish this program within two days.

Once again, please help me complete it.

Thank you all very much for your help!

This is my elementary code:

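import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.Writer;
import java.util.List;

import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.hwpf.model.PicturesTable;
import org.apache.poi.hwpf.usermodel.Picture;

public class test {

    public void m1() throws IOException {
        String filesname = "Hello.doc";

        // Open the OLE2 container that holds the Word streams.
        try (InputStream in = new FileInputStream(filesname)) {
            POIFSFileSystem fs = new POIFSFileSystem(in);
            HWPFDocument doc = new HWPFDocument(fs);

            // Text: the whole document at once, or one string per paragraph.
            WordExtractor we = new WordExtractor(doc);
            String str = we.getText();
            String[] paragraphs = we.getParagraphText();
            System.out.println(paragraphs.length + " paragraphs");

            // Write the text into a plain-text file ("Hello.txt" is an example name).
            try (Writer out = new FileWriter("Hello.txt")) {
                out.write(str);
            }

            // Images: PicturesTable is not constructed by hand; the document
            // builds it, and getAllPictures() returns every stored picture.
            PicturesTable picTable = doc.getPicturesTable();
            List<Picture> pictures = picTable.getAllPictures();
            for (Picture pic : pictures) {
                // suggestFullFileName() proposes a name with a matching extension.
                try (OutputStream os = new FileOutputStream(pic.suggestFullFileName())) {
                    pic.writeImageContent(os);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        new test().m1();
    }
}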
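Regarding problem 2: besides `getAllPictures()`, `PicturesTable` can be walked run by run with `hasPicture(CharacterRun)` and `extractPicture(CharacterRun, boolean)`, which also tells you where in the text each image sits. A minimal sketch, assuming those HWPF signatures; `DocImageDump`, `outputName`, and the `image-<n>.<ext>` naming are illustrative. It may only see pictures anchored in the main text as character runs, so `getAllPictures()` remains the simpler choice when positions don't matter.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.model.PicturesTable;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Picture;
import org.apache.poi.hwpf.usermodel.Range;

public class DocImageDump {

    // Illustrative naming scheme: image-<n>.<ext>, falling back to .bin
    // when no extension is known (null or empty).
    static String outputName(int index, String extension) {
        String ext = (extension == null || extension.isEmpty()) ? "bin" : extension;
        return "image-" + index + "." + ext;
    }

    public static void main(String[] args) throws IOException {
        String file = args.length > 0 ? args[0] : "Hello.doc";
        try (InputStream in = new FileInputStream(file)) {
            HWPFDocument doc = new HWPFDocument(in);
            PicturesTable pictures = doc.getPicturesTable();
            Range range = doc.getRange(); // the main document text

            int count = 0;
            for (int i = 0; i < range.numCharacterRuns(); i++) {
                CharacterRun run = range.getCharacterRun(i);
                if (!pictures.hasPicture(run)) {
                    continue; // ordinary text run
                }
                // true = load the image bytes now so they can be written out.
                Picture pic = pictures.extractPicture(run, true);
                String name = outputName(++count, pic.suggestFileExtension());
                try (OutputStream out = new FileOutputStream(name)) {
                    pic.writeImageContent(out);
                }
                System.out.println("run " + i + " -> " + name);
            }
        }
    }
}
```

The printed run index gives a rough position of each image relative to the surrounding text runs.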
