Wikipedia: Java library to remove Wikipedia text markup

Posted by Algorist on Stack Overflow
Published on 2010-05-19T06:15:41Z, indexed on 2010-05-19 06:20 UTC

Hi,

I downloaded a Wikipedia dump and now want to remove the wiki markup from the contents of each page. I tried writing regular expressions, but there are too many cases to handle. I found a Python library, but I need a Java library because I want to integrate it into my code.
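To illustrate the regular-expression approach mentioned above, here is a minimal sketch, not the code I actually wrote. The class name `WikiMarkupSketch` and the method `strip` are hypothetical and do not come from any library. It assumes only a handful of constructs matter (templates, internal links, headings, bold/italic quotes); tables, `<ref>` tags, HTML and many other cases are left untouched, which is why this approach gets out of hand quickly.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of any existing library.
public class WikiMarkupSketch {
    // Innermost template: {{ ... }} with no braces inside.
    private static final Pattern TEMPLATE = Pattern.compile("\\{\\{[^{}]*\\}\\}");
    // Piped link: [[target|label]] -> label
    private static final Pattern PIPED_LINK =
            Pattern.compile("\\[\\[[^\\[\\]|]*\\|([^\\[\\]]*)\\]\\]");
    // Plain link: [[target]] -> target
    private static final Pattern PLAIN_LINK = Pattern.compile("\\[\\[([^\\[\\]|]*)\\]\\]");
    // Heading line: == Title == -> Title
    private static final Pattern HEADING = Pattern.compile("(?m)^=+\\s*(.*?)\\s*=+\\s*$");
    // Bold/italic markers: runs of two or more apostrophes.
    private static final Pattern QUOTES = Pattern.compile("'{2,}");

    public static String strip(String wikiText) {
        String text = wikiText;
        // Remove innermost templates repeatedly so nested templates unwind.
        String previous;
        do {
            previous = text;
            text = TEMPLATE.matcher(text).replaceAll("");
        } while (!text.equals(previous));
        text = PIPED_LINK.matcher(text).replaceAll("$1");
        text = PLAIN_LINK.matcher(text).replaceAll("$1");
        text = HEADING.matcher(text).replaceAll("$1");
        text = QUOTES.matcher(text).replaceAll("");
        return text;
    }

    public static void main(String[] args) {
        System.out.println(strip("'''Java''' is a [[programming language|language]]."));
    }
}
```

Each additional construct needs its own pattern and its own ordering rules, so a real parser library would be preferable to growing this list by hand.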

Thank you.

© Stack Overflow or respective owner
