Detecting syllables in a word

Posted by user50705 on Stack Overflow
Published on 2009-01-01T17:08:41Z; indexed on 2010/12/21 23:54 UTC

I need to find a fairly efficient way to detect syllables in a word. E.g.,

invisible -> in-vi-sib-le

There are some syllabification rules that could be used:

V CV VC CVC CCV CCCV CVCC

where V is a vowel and C is a consonant. For example:

pronunciation (5 syllables: pro-nun-ci-a-tion; CCV-CVC-CV-V-CVC)
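To make the V/C notation concrete, here is a minimal Python sketch that maps a word to its letter-level pattern. The name `cv_pattern` and the vowel set are assumptions for illustration, not part of any library, and the mapping is purely orthographic: it works on letters, so a spelling like "tion" comes out as CVVC even though it sounds like CVC.

```python
# Assumption: plain English vowel letters; 'y' is treated as a consonant here.
VOWELS = set("aeiou")


def cv_pattern(word: str) -> str:
    """Map each letter of `word` to 'V' (vowel) or 'C' (consonant).

    Non-letter characters are skipped. This is a letter-based sketch,
    not a phonetic analysis.
    """
    return "".join(
        "V" if ch in VOWELS else "C"
        for ch in word.lower()
        if ch.isalpha()
    )


if __name__ == "__main__":
    for w in ("invisible", "pronunciation"):
        print(w, cv_pattern(w))
```

A syllabifier built on the rules above would then have to decide where to cut this pattern string, which is the hard part of the problem.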

I've tried a few methods: regular expressions (which help only if you want to count syllables), hard-coded rule definitions (a brute-force approach that proved very inefficient), and finally a finite-state automaton (which did not produce anything useful).
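For reference, the regex counting approach might look roughly like the sketch below. It is an assumption about what such an attempt looks like, not the code actually tried; `count_vowel_groups` is a hypothetical name, and 'y' is counted as a vowel here.

```python
import re

# Assumption: a run of adjacent vowel letters (including 'y') is one syllable nucleus.
VOWEL_GROUP = re.compile(r"[aeiouy]+")


def count_vowel_groups(word: str) -> int:
    """Estimate the syllable count as the number of vowel-letter runs."""
    return len(VOWEL_GROUP.findall(word.lower()))


if __name__ == "__main__":
    for w in ("invisible", "pronunciation"):
        print(w, count_vowel_groups(w))
```

This gives 4 for "invisible" but also 4 for "pronunciation", because the adjacent vowels in "ci-a" fall into one run. It also yields only a count, not the syllable boundaries.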

The purpose of my application is to create a dictionary of all syllables in a given language. This dictionary will later be used for spell-checking applications (using Bayesian classifiers) and text-to-speech synthesis.
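Once words can be split, building the syllable dictionary itself is the easy part. The sketch below assumes the input is already syllabified with hyphens, as in the examples in this question; `syllable_counts` is a hypothetical helper name.

```python
from collections import Counter
from typing import Iterable


def syllable_counts(hyphenated_words: Iterable[str]) -> Counter:
    """Count how often each syllable occurs in pre-syllabified words.

    Assumption: syllables are separated by '-' (e.g. "in-vi-sib-le").
    Case is folded so that "Pro" and "pro" count as the same syllable.
    """
    counts = Counter()
    for word in hyphenated_words:
        counts.update(s for s in word.lower().split("-") if s)
    return counts


if __name__ == "__main__":
    counts = syllable_counts(["in-vi-sib-le", "Pro-nun-ci-a-tion", "in-put"])
    print(counts.most_common(3))
```

The frequencies could then serve as counts for a Bayesian model, though how they would be used there is not specified in this question.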

I would appreciate any tips on alternative ways to solve this problem besides the approaches I have already tried.

I work in Java, but any tip in C/C++, C#, Python, Perl... would work for me.

© Stack Overflow or respective owner
