How do you parse a paragraph of text into sentences? (perferrably in Ruby)
Posted
by henry74
on Stack Overflow
See other posts from Stack Overflow
or by henry74
Published on 2009-05-13T22:49:48Z
Indexed on
2010/05/06
16:08 UTC
Read the original article
Hit count: 522
How do you take paragraph or large amount of text and break it into sentences (perferably using Ruby) taking into account cases such as Mr. and Dr. and U.S.A? (Assuming you just put the sentences into an array of arrays)
UPDATE: One possible solution I thought of involves using a parts-of-speech tagger (POST) and a classifier to determine the end of a sentence:
Getting data from Mr. Jones felt the warm sun on his face as he stepped out onto the balcony of his summer home in Italy. He was happy to be alive.
CLASSIFIER Mr./PERSON Jones/PERSON felt/O the/O warm/O sun/O on/O his/O face/O as/O he/O stepped/O out/O onto/O the/O balcony/O of/O his/O summer/O home/O in/O Italy/LOCATION ./O He/O was/O happy/O to/O be/O alive/O ./O
POST Mr./NNP Jones/NNP felt/VBD the/DT warm/JJ sun/NN on/IN his/PRP$ face/NN as/IN he/PRP stepped/VBD out/RP onto/IN the/DT balcony/NN of/IN his/PRP$ summer/NN home/NN in/IN Italy./NNP He/PRP was/VBD happy/JJ to/TO be/VB alive./IN
Can we assume, since Italy is a location, the period is the valid end of the sentence? Since ending on "Mr." would have no other parts-of-speech, can we assume this is not a valid end-of-sentence period? Is this the best answer to the my question?
Thoughts?
© Stack Overflow or respective owner