Stata Nearest neighbor of percentile
- by Kyle Billings
This has probably already been answered, but I must just be searching for the wrong terms.
Suppose I am using the built-in Stata data set auto:
sysuse auto, clear
Say, for example, that I am working with one independent and one dependent variable, and I want to compress the data down to five summary points (what I am loosely calling the IQR elements): min, p(25), median, p(75), and max.
So I use the commands:
keep weight mpg
sum weight, detail
return list
local min = r(min)
local lqr = r(p25)
local med = r(p50)
local uqr = r(p75)
local max = r(max)
keep if weight==`min' | weight==`max' | weight==`med' | weight==`lqr' | weight==`uqr'
My goal is to compress the data set down to only those 5 observations. In this example, however, the median is not actually an element of the weight vector: there is an observation above it and an observation below it (given how the median is defined when the number of observations is even, this is no surprise). Is there a way to tell Stata to look for the nearest neighbor above the percentile? That is, if r(p50) is not an element of weight, can it search above that value for the next observation?
The end result I am after is two vectors, say weight and mpg, such that each of the 5 selected elements of weight has its matching response in mpg.
Any thoughts?
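One possible way to get the "nearest neighbor above" behavior described here, offered as a minimal sketch rather than a definitive answer: for each of the five summary values, take the smallest observed weight that is greater than or equal to it, and flag the matching observations. The names targets and keepme are hypothetical helpers introduced only for illustration. The sketch assumes weight is stored as an integer, as it is in auto; a float variable would need more care with the equality comparison.

```stata
sysuse auto, clear
keep weight mpg

* Store the five summary points in a local before r() is overwritten
quietly summarize weight, detail
local targets `r(min)' `r(p25)' `r(p50)' `r(p75)' `r(max)'

* keepme (hypothetical name) flags the observations to retain
generate byte keepme = 0
foreach t of local targets {
    * Smallest observed weight at or above the target value
    quietly summarize weight if weight >= `t', meanonly
    quietly replace keepme = 1 if weight == r(min)
}

keep if keepme
drop keepme
sort weight
list weight mpg
```

When a summary value is itself an observed weight, the loop simply selects that value. When it is not (as with the median here), it selects the next observed weight above it. Note that if several cars share one of the selected weights, all of them are kept, so the result can contain more than five rows.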