regressions with many nested categorical covariates
Posted
by eric
on Stack Overflow
See other posts from Stack Overflow
or by eric
Published on 2010-04-17T17:34:33Z
Indexed on
2010/04/17
17:43 UTC
Read the original article
Hit count: 251
r
|regression
I have a few hundred thousand measurements where the dependent variable is a probability, and would like to use logistic regression. However, the covariates I have are all categorical, and worse, are all nested. By this I mean that if a certain measurement has "city - Phoenix" then obviously it is certain to have "state - Arizona" and "country - U.S." I have four such factors - the most granular has some 20k levels, but if need be I could do without that one, I think. I also have a few non-nested categorical covariates (only four or so, with maybe three different levels each). What I am most interested in is prediction - given a new observation in some city, I would like to know the relevant probability/dependent variable. I am not interested as much in the related inferential machinery - standard deviations, etc - at least as of now. I am hoping I can afford to be sloppy. However, I would love to have that information unless it requires methods that are more computationally expensive. Does anyone have any advice on how to attack this? I have looked into mixed effects, but am not sure it is what I am looking for.
© Stack Overflow or respective owner