Extracting Demographic and Contact Information from unstructured text files

Posted by jn29098 on Stack Overflow
Published on 2010-06-01T01:50:48Z

I am looking to extract specific items out of a large pool of unstructured documents. These documents could be 1-5 pages of text formatted in various ways by the user, but in most cases would contain at least:

  • Name
  • Address (physical)
  • Email Address
  • Phone number
  • Website URL
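As a rough baseline, the more regular fields in the list above (email, phone number, website URL) can often be pulled out with plain regular expressions before any semantic parsing is attempted. What follows is a minimal Python sketch, not a complete solution. The patterns and the `extract_contact_fields` helper are hypothetical. The phone pattern assumes North American formatting. Name and physical address are deliberately left out because they rarely follow a fixed pattern.

```python
import re

# Deliberately simple patterns (assumptions, not complete grammars for any field).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Only catches URLs that start with http(s):// or www.
URL_RE = re.compile(r"(?:https?://|www\.)[^\s<>\"']+")
# Assumed North American formats such as (555) 123-4567, 555.123.4567, +1 555 123 4567.
PHONE_RE = re.compile(r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


def extract_contact_fields(text: str) -> dict:
    """Return every email-, URL-, and phone-like string found in text."""
    return {
        "emails": EMAIL_RE.findall(text),
        # Drop trailing punctuation that usually belongs to the sentence, not the URL.
        "urls": [u.rstrip(".,;)") for u in URL_RE.findall(text)],
        "phones": PHONE_RE.findall(text),
    }


if __name__ == "__main__":
    sample = (
        "Jane Doe\n12 Main St\n"
        "Email: jane.doe@example.com\n"
        "Phone: (555) 123-4567\n"
        "Web: https://example.com/contact."
    )
    print(extract_contact_fields(sample))
```

Anything these patterns miss, or matches ambiguously, is where a trained entity extractor would have to take over.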

I'm looking for a semantic parser that can attempt to extract these elements from the documents so that I can load that information into a relational database and work with these records as contacts.
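For the loading step, one possible shape is a single `contacts` table with nullable columns, since any given document may yield only some of the fields. This is a minimal sketch using Python's built-in `sqlite3` module. The table name, the column names, and the `init_db` and `save_contact` helpers are assumptions for illustration. They are not part of any particular parser's output format.

```python
import sqlite3

# Hypothetical schema: every extracted field is nullable because extraction may miss it.
SCHEMA = """
CREATE TABLE IF NOT EXISTS contacts (
    id          INTEGER PRIMARY KEY,
    source_file TEXT NOT NULL,  -- document the fields were extracted from
    name        TEXT,
    address     TEXT,
    email       TEXT,
    phone       TEXT,
    website     TEXT
)
"""


def init_db(conn: sqlite3.Connection) -> None:
    """Create the contacts table if it does not exist yet."""
    conn.execute(SCHEMA)
    conn.commit()


def save_contact(conn: sqlite3.Connection, source_file: str, name=None,
                 address=None, email=None, phone=None, website=None) -> int:
    """Insert one extracted record and return its new row id."""
    cur = conn.execute(
        "INSERT INTO contacts (source_file, name, address, email, phone, website) "
        "VALUES (?, ?, ?, ?, ?, ?)",  # parameterized to avoid SQL injection
        (source_file, name, address, email, phone, website),
    )
    conn.commit()
    return cur.lastrowid


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    row_id = save_contact(conn, "doc_001.txt", name="Jane Doe",
                          email="jane.doe@example.com")
    print(conn.execute("SELECT * FROM contacts WHERE id = ?", (row_id,)).fetchone())
```

Keeping `source_file` on each row makes it possible to go back to the original document when an extracted value looks wrong.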

Other services I've looked at, while valuable for other purposes, don't address this specific need.

Any thoughts, suggestions or leads?

© Stack Overflow or respective owner