Latent Dirichlet Allocation, pitfalls, tips and programs

Posted by Gregg Lind on Stack Overflow
Published on 2008-10-10T13:23:07Z Indexed on 2010/04/15 22:03 UTC

I'm experimenting with Latent Dirichlet Allocation for topic disambiguation and assignment, and I'm looking for advice.

  1. Which program is the "best", where "best" is some combination of ease of use, quality of prior estimation, and speed?
  2. How do I incorporate my intuitions about topicality? Say I believe that some items in the corpus really belong in the same category, such as all articles by the same author. Can I add that to the analysis?
  3. Any unexpected pitfalls or tips I should know before embarking?

I'd prefer it if there were R or Python front ends for whichever program is recommended, but I expect (and accept) that I'll be dealing with C.
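
To make question 2 concrete: one naive way to encode "same author, same category" might be to pool each author's articles into a single pseudo-document before handing the corpus to any LDA program. The sketch below only illustrates that preprocessing idea. The `pool_by_author` helper and the `(author, text)` input format are hypothetical, not part of any LDA package, and whether pooling actually helps is exactly what I'm asking about.

```python
from collections import defaultdict


def pool_by_author(articles):
    """Merge all texts that share an author into one pseudo-document.

    `articles` is assumed to be an iterable of (author, text) pairs;
    this input format is an illustrative assumption.
    """
    pooled = defaultdict(list)
    for author, text in articles:
        pooled[author].append(text)
    # Join each author's texts in their original order.
    return {author: " ".join(texts) for author, texts in pooled.items()}


# Toy corpus, made up purely for illustration.
articles = [
    ("smith", "topic models for text"),
    ("jones", "gene expression data"),
    ("smith", "dirichlet priors and sampling"),
]
pseudo_docs = pool_by_author(articles)
```

Each value in `pseudo_docs` would then be treated as one document by the topic model, so everything by one author shares a single topic mixture. That is a much blunter constraint than a proper prior, which is why I'd like to hear about better options.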

© Stack Overflow or respective owner
