Implementing SRX Segmentation Rules in JavaScript
- by Sourabh
Hello,
I want to implement the SRX segmentation rules in JavaScript to extract sentences from text.
To do this correctly I have to follow the SRX rules,
e.g. http://www.lisa.org/fileadmin/standards/srx20.html#refTR29
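There are two types of rules, each defined by regular expressions:
break rules: if the pattern is found, the sentence should break, e.g. after ". "
no-break rules: if the pattern is found, the sentence should not break, e.g. after an abbreviation such as U.K. or Mr.
Each rule in turn has two parts:
beforebreak: the pattern that must match the text before the break position
afterbreak: the pattern that must match the text after the break position
For example, if the rule is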
<rule break="no">
<beforebreak>\s*[0-9]+\.</beforebreak>
<afterbreak>\s</afterbreak>
</rule>
This says that if the pattern "\s*[0-9]+\.\s" is found, the segment should not break at the position between the "." and the whitespace that follows it.
How do I implement this in JavaScript? Maybe the split function is not enough?
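One possible approach, as a minimal sketch and not a complete SRX implementation: walk through every position in the text, test each rule's beforebreak pattern against the text ending at that position and its afterbreak pattern against the text starting there, and let the first matching rule decide whether to break. A plain split call has no obvious way to express that "first matching rule wins" ordering. The names rules, compileRules and segment are hypothetical, only the first rule comes from the post (the other two are assumed for illustration), and the sketch assumes the SRX patterns are valid JavaScript regular expressions, which is not guaranteed because SRX uses ICU regex syntax.

```javascript
// Hypothetical rule list, in priority order (first matching rule wins).
// Rule 1 is the one from the post; rules 2 and 3 are assumed examples.
const rules = [
  { breakHere: false, before: "\\s*[0-9]+\\.", after: "\\s" },
  { breakHere: false, before: "\\b(?:Mr|U\\.K)\\.", after: "\\s" },
  { breakHere: true, before: "[.?!]+", after: "\\s" },
];

// Anchor each pattern so that "before" must end exactly at the candidate
// position and "after" must start exactly at it.
function compileRules(ruleList) {
  return ruleList.map((r) => ({
    breakHere: r.breakHere,
    before: new RegExp("(?:" + r.before + ")$"),
    after: new RegExp("^(?:" + r.after + ")"),
  }));
}

// Split text into segments. This re-slices the text at every position,
// so it is quadratic in the text length; fine for a sketch, slow for
// large inputs.
function segment(text, ruleList) {
  const compiled = compileRules(ruleList);
  const segments = [];
  let start = 0;
  for (let pos = 1; pos < text.length; pos++) {
    const left = text.slice(0, pos);
    const right = text.slice(pos);
    // The first rule whose two patterns both match decides this position.
    const rule = compiled.find(
      (r) => r.before.test(left) && r.after.test(right)
    );
    if (rule && rule.breakHere) {
      segments.push(text.slice(start, pos));
      start = pos;
    }
  }
  if (start < text.length) {
    segments.push(text.slice(start));
  }
  return segments;
}

const sample = "See item 3. It is done. Mr. Smith agrees.";
console.log(segment(sample, rules).map((s) => s.trim()));
// prints [ 'See item 3. It is done.', 'Mr. Smith agrees.' ]
```

Here "3." and "Mr." do not end a segment because a no-break rule matches first at those positions, while "done." does because only the break rule matches there. The whitespace after a break stays at the start of the next segment, which is why the usage line trims each segment.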