Implementing SRX Segmentation Rules in JavaScript
Posted
by Sourabh
on Stack Overflow
Published on 2010-05-03T15:02:18Z
JavaScript
Hello,
I want to implement the SRX Segmentation Rules in JavaScript to extract sentences from text.
To do this correctly I have to follow the SRX specification,
e.g. http://www.lisa.org/fileadmin/standards/srx20.html#refTR29
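Now, there are two types of rules, each built from regular expressions:
- break rules: if the pattern is found, the sentence should break, e.g. at ". "
- no-break rules: if the pattern is found, the sentence should not break, e.g. after an abbreviation such as U.K. or Mr.
Each rule in turn has two parts:
- a pattern for the text before the break position
- a pattern for the text after the break position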
For example, if the rule is
<rule break="no">
<beforebreak>\s*[0-9]+\.</beforebreak>
<afterbreak>\s</afterbreak>
</rule>
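This says that if the text before a position matches "\s*[0-9]+\." and the text after it matches "\s" (that is, the combined pattern "\s*[0-9]+\.\s" is found), the segment should not break there.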
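To make the before/after idea concrete, here is a minimal sketch of testing one rule at one position. The helper name `ruleMatchesAt` and the plain-object rule shape are assumptions made for illustration; they are not part of SRX. SRX patterns follow ICU regular expression syntax, so a pattern that uses ICU-only features may need adjusting before JavaScript's `RegExp` accepts it.

```javascript
// Hypothetical helper: does a rule match at position `pos` in `text`?
// beforebreak must match the text that ends at `pos`,
// afterbreak must match the text that starts at `pos`.
function ruleMatchesAt(rule, text, pos) {
  const before = new RegExp("(?:" + rule.beforebreak + ")$");
  const after = new RegExp("^(?:" + rule.afterbreak + ")");
  return before.test(text.slice(0, pos)) && after.test(text.slice(pos));
}

// The no-break rule from the XML above, written as a plain object (assumed shape).
const numberedItem = { break: false, beforebreak: "\\s*[0-9]+\\.", afterbreak: "\\s" };

const sample = "See item 3. It is long.";
// Position 11 is right after "3." and right before the space.
console.log(ruleMatchesAt(numberedItem, sample, 11)); // true
```

Because the rule has break="no", a match at that position means the text must not be split there.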
How do I implement this in JavaScript? Maybe the split function alone is not enough?
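One possible approach, sketched under assumptions: walk every position in the text, try the rules in order, and let the first rule that matches decide whether to break there. The function names `compile` and `segment`, the rule objects, and the two extra rules (the abbreviation exception and the generic break rule) are hypothetical examples, not taken from any SRX file. The loop is deliberately naive, since it re-slices the text at every position, so treat it as an illustration rather than an efficient implementation.

```javascript
// Rules are tried in order; the first one that matches at a position decides.
const rules = [
  // No-break exceptions first (the second one is a made-up example).
  { break: false, beforebreak: "\\s*[0-9]+\\.", afterbreak: "\\s" },
  { break: false, beforebreak: "\\b(?:Mr|Mrs|Dr)\\.", afterbreak: "\\s" },
  // Generic break rule: sentence-ending punctuation followed by whitespace.
  { break: true, beforebreak: "[.?!]+", afterbreak: "\\s" },
];

// Anchor beforebreak to the end of the left part and afterbreak to the
// start of the right part.
function compile(rule) {
  return {
    isBreak: rule.break,
    before: new RegExp("(?:" + rule.beforebreak + ")$"),
    after: new RegExp("^(?:" + rule.afterbreak + ")"),
  };
}

function segment(text, ruleList) {
  const compiled = ruleList.map(compile);
  const segments = [];
  let start = 0;
  for (let pos = 1; pos < text.length; pos++) {
    const head = text.slice(0, pos);
    const tail = text.slice(pos);
    // First matching rule wins, whether it is a break or a no-break rule.
    const hit = compiled.find((r) => r.before.test(head) && r.after.test(tail));
    if (hit && hit.isBreak) {
      segments.push(text.slice(start, pos));
      start = pos;
    }
  }
  if (start < text.length) segments.push(text.slice(start));
  return segments;
}

console.log(segment("Mr. Smith arrived. He left! Why?", rules));
// [ 'Mr. Smith arrived.', ' He left!', ' Why?' ]
```

The segments keep the whitespace that follows each break, so callers can `trim()` them if needed. A single `split` call with one pattern has no notion of rule order, which is why the no-break exceptions are handled in an explicit loop here.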
© Stack Overflow or respective owner