Retrieve the contents of a Microsoft Word document using OpenXml and C#
- by ybbest
One of my tasks required me to retrieve the contents of a Microsoft Word document (Word 2007 and above). I searched for resources online without much luck; most of the examples are for writing content to a Word document using OpenXml. I decided to blog this for my own reference, and hopefully people who read this post will find it useful as well.
Retrieving the contents of a Microsoft Word document using OpenXml is extremely simple.
1. First, download and install the Open XML SDK 2.0 for Microsoft Office. (Download link)
2. Create a Console application, then add references to DocumentFormat.OpenXml.dll and WindowsBase.dll. You can find these DLLs in the .NET tab of the Add Reference window.
3. Write the following code to grab the contents from the Word document and display them in the console window.
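A minimal sketch of such code, assuming the Open XML SDK 2.0 references from step 2 are in place: it opens the document read-only with `WordprocessingDocument.Open`, walks the paragraphs in the main document body, and prints each paragraph's `InnerText`. The file name `sample.docx` and the class name `Program` are placeholders for illustration, not taken from the original source.

```csharp
using System;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

class Program
{
    static void Main(string[] args)
    {
        // Placeholder path: pass the .docx file as the first argument,
        // or fall back to a hypothetical sample.docx next to the executable.
        string path = args.Length > 0 ? args[0] : "sample.docx";

        // Open the package read-only (second argument: isEditable = false).
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            // The main document part holds the body of the Word document.
            Body body = doc.MainDocumentPart.Document.Body;

            // Print the plain text of each top-level paragraph.
            foreach (Paragraph paragraph in body.Elements<Paragraph>())
            {
                Console.WriteLine(paragraph.InnerText);
            }
        }
    }
}
```

Note that `Elements<Paragraph>()` only returns paragraphs that are direct children of the body, so text inside tables is skipped. Switching to `body.Descendants<Paragraph>()` would include those paragraphs as well.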
You can download the complete source code here.