regressions with many nested categorical covariates

Posted by eric on Stack Overflow
Published on 2010-04-17T17:34:33Z

I have a few hundred thousand measurements where the dependent variable is a probability, and I would like to use logistic regression. However, the covariates are all categorical and, worse, all nested: if a measurement has "city - Phoenix", then it necessarily also has "state - Arizona" and "country - U.S." I have four such nested factors. The most granular has some 20k levels, but I think I could do without that one if need be. I also have a few non-nested categorical covariates (only four or so, with maybe three levels each).

What I am most interested in is prediction: given a new observation in some city, I would like to know the relevant probability (the dependent variable). I am less interested in the related inferential machinery (standard deviations, etc.), at least for now, so I am hoping I can afford to be sloppy. That said, I would love to have that information as long as it does not require methods that are more computationally expensive.

Does anyone have any advice on how to attack this? I have looked into mixed effects models, but am not sure they are what I am looking for.
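
To make the nested setup concrete, here is a minimal sketch of the partial-pooling idea that mixed effects models are built on: each level's estimate is pulled toward its parent's estimate, more strongly when the level has few observations. This is not a fitted logistic or mixed effects model. It averages on the raw probability scale, ignores the non-nested covariates, and uses made-up data. The names `fit_shrunken_means`, `predict`, and `prior_strength` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def fit_shrunken_means(rows, prior_strength=1.0):
    """Estimate a mean for every node of a nested hierarchy.

    rows: iterable of (path, y), where path runs from coarsest to finest
    level, e.g. ("U.S.", "Arizona", "Phoenix"), and y is the observed
    probability. prior_strength is an assumed tuning constant: the number
    of pseudo-observations contributed by the parent's estimate.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for path, y in rows:
        # Credit the observation to every prefix of the path:
        # (), (country,), (country, state), (country, state, city).
        for depth in range(len(path) + 1):
            key = tuple(path[:depth])
            sums[key] += y
            counts[key] += 1

    estimates = {}
    # Visit shorter prefixes first so a parent is ready before its children.
    for key in sorted(counts, key=len):
        n = counts[key]
        mean = sums[key] / n
        if not key:
            estimates[key] = mean  # root: plain grand mean
        else:
            parent = estimates[key[:-1]]
            # Weighted average of the node's own mean and the parent's estimate.
            estimates[key] = (n * mean + prior_strength * parent) / (n + prior_strength)
    return estimates


def predict(estimates, path):
    """Return the estimate for the finest level of path seen in training."""
    key = tuple(path)
    while key and key not in estimates:
        key = key[:-1]  # back off: city -> state -> country -> grand mean
    return estimates[key]


# Made-up example data, for illustration only.
rows = [
    (("U.S.", "Arizona", "Phoenix"), 0.9),
    (("U.S.", "Arizona", "Phoenix"), 0.7),
    (("U.S.", "Arizona", "Tucson"), 0.2),
    (("U.S.", "Texas", "Austin"), 0.4),
]
estimates = fit_shrunken_means(rows, prior_strength=1.0)
phoenix = predict(estimates, ("U.S.", "Arizona", "Phoenix"))
unseen_city = predict(estimates, ("U.S.", "Arizona", "Flagstaff"))
```

A city that never appeared in the data falls back to its state's estimate, which is the behaviour wanted for prediction on new observations. A single pass over the data plus one pass over the distinct levels should be cheap even with some 20k cities, though this sketch gives no standard errors.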
