Latent Dirichlet Allocation, pitfalls, tips and programs
Posted by Gregg Lind on Stack Overflow
Published on 2008-10-10T13:23:07Z
Indexed on 2010/04/15 22:03 UTC
I'm experimenting with Latent Dirichlet Allocation for topic disambiguation and assignment, and I'm looking for advice.
- Which program is the "best", where best is some combination of easiest to use, best prior estimation, and fastest?
- How do I incorporate my intuitions about topicality? Let's say I think I know that some items in the corpus are really in the same category, like all articles by the same author. Can I add that into the analysis?
- Any unexpected pitfalls or tips I should know before embarking?
I'd prefer it if there were R or Python front ends for whatever program is suggested, but I expect (and accept) that I'll be dealing with C.
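To make the second question concrete, here is a minimal pure-Python sketch of one crude way the "same author, same category" intuition could be encoded: a collapsed Gibbs sampler for LDA in which all documents sharing a group label are pooled into one topic distribution. This is only an illustration under stated assumptions, not the method of any particular LDA program. The function name `gibbs_lda`, the `groups` argument, and the default values for `alpha`, `beta`, and `n_iter` are all hypothetical choices made for the example.

```python
import random
from collections import defaultdict


def gibbs_lda(docs, groups, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA with pooled topic distributions.

    docs   : list of token lists, one per document
    groups : list of group labels, one per document (e.g. the author);
             documents with the same label share one topic distribution.
             This pooling is an assumption of the sketch, not standard LDA.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    group_topic = defaultdict(lambda: [0] * n_topics)         # counts per (group, topic)
    topic_word = [defaultdict(int) for _ in range(n_topics)]  # counts per (topic, word)
    topic_total = [0] * n_topics                              # tokens per topic
    assign = []                                               # topic of every token

    # Start from a random topic assignment for every token.
    for doc, g in zip(docs, groups):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            group_topic[g][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(n_iter):
        for d, (doc, g) in enumerate(zip(docs, groups)):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                group_topic[g][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalised conditional probability of each topic.
                weights = [
                    (group_topic[g][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]

                # Record the newly sampled assignment.
                assign[d][i] = k
                group_topic[g][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed topic distribution for each group.
    theta = {}
    for g, counts in group_topic.items():
        total = sum(counts) + n_topics * alpha
        theta[g] = [(c + alpha) / total for c in counts]
    return theta, [dict(tw) for tw in topic_word]


# Toy corpus: two hypothetical authors, two documents each.
docs = [
    ["apple", "banana", "apple"],
    ["banana", "fruit"],
    ["goal", "match", "team"],
    ["team", "goal"],
]
groups = ["alice", "alice", "bob", "bob"]
theta, topic_word = gibbs_lda(docs, groups, n_topics=2, n_iter=50)
for author, dist in theta.items():
    print(author, [round(p, 3) for p in dist])
```

Passing a unique label per document (for example `groups=list(range(len(docs)))`) reduces this to ordinary per-document LDA, so the grouping is the only place the prior knowledge enters. Whether hard pooling like this is preferable to a softer constraint is exactly the kind of advice I'm after.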
© Stack Overflow or respective owner