Retrieve the contents of a Microsoft Word document using OpenXml and C#

Posted by ybbest on YBBest
Published on Wed, 29 Jun 2011 11:20:04 +0000

One of my tasks required me to retrieve the contents of Microsoft Word documents (Word 2007 and above). I searched for resources online without much luck; most of the examples show how to write content to a Word document using OpenXml. I decided to blog about it as a reference for myself, and I hope readers of this post will find it useful as well.

Retrieving the contents of a Microsoft Word document using OpenXml is extremely simple.
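Part of why this is simple: a .docx file is a ZIP package of XML parts, and the body text normally lives in word/document.xml as w:p (paragraph) elements whose text is split across w:t elements. The sketch below is not the approach this post uses (that relies on the Open XML SDK); it reads that part directly with System.IO.Compression and LINQ to XML, assuming the main part sits at Word's default path word/document.xml. Strictly, the package's _rels/.rels relationships decide where the main part lives. The class name DocxXmlReader is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: reads paragraph text straight from the package XML,
// without the Open XML SDK. Assumes the main part is at word/document.xml.
public static class DocxXmlReader
{
    // WordprocessingML main namespace used by Word 2007 and later.
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static List<string> ReadParagraphs(Stream docxStream)
    {
        // leaveOpen: true so the caller still owns the stream.
        using (var zip = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml")
                ?? throw new InvalidDataException("word/document.xml not found");
            using (Stream xml = entry.Open())
            {
                XDocument doc = XDocument.Load(xml);
                // One string per w:p, concatenating the text of its w:t elements.
                return doc.Descendants(W + "p")
                          .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)))
                          .ToList();
            }
        }
    }
}
```

This ignores headers, footers, footnotes, and tab or break elements, which live in other parts or elements of the package.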

1. First, download and install the Open XML SDK 2.0 for Microsoft Office.

2. Create a console application, then add references to DocumentFormat.OpenXml.dll and WindowsBase.dll; you can find these DLLs on the .NET tab of the Add Reference window.

3. Write code that reads the contents of the Word document and displays them in the console window.
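A minimal sketch of step 3, assuming the Open XML SDK is referenced as described in steps 1 and 2. The original code is not reproduced here, so this is not necessarily the author's version. It opens the document read-only with WordprocessingDocument.Open, walks the Paragraph elements of the main document body (including those inside tables), and returns each paragraph's InnerText. The class name WordTextReader is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper for reading body text with the Open XML SDK.
public static class WordTextReader
{
    // Opens the .docx read-only and returns the text of each paragraph in the main body.
    public static List<string> ReadParagraphs(string path)
    {
        // false = open read-only, so the file is never modified.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;
            // InnerText concatenates the text of all runs in the paragraph.
            return body.Descendants<Paragraph>()
                       .Select(p => p.InnerText)
                       .ToList();
        }
    }
}
```

In the console application's Main you could then print each paragraph, for example: `foreach (string text in WordTextReader.ReadParagraphs(args[0])) Console.WriteLine(text);`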

You can download the complete source code here.


© YBBest or respective owner
