incremental way of counting quantiles for large set of data

Posted by Gacek on Stack Overflow See other posts from Stack Overflow or by Gacek
Published on 2010-05-14T20:14:38Z Indexed on 2010/05/15 0:04 UTC
Read the original article Hit count: 271

I need to count the quantiles for a large set of data.

Let's assume we can get the data only through some portions (i.e. one row of a large matrix). To count the Q3 quantile one need to get all the portions of the data and store it somewhere, then sort it and count the quantile:

List<double> allData = new List<double>();
foreach(var row in matrix) // this is only example. In fact the portions of data are not rows of some matrix
   {
    allData.AddRange(row);
   }

allData.Sort();
double p = 0.75*allData.Count;
int idQ3 = (int)Math.Ceiling(p) - 1;
double Q3 = allData[idQ3];

Now, I would like to find a way of counting this without storing the data in some separate variable.
The best solution would be to count some parameters od mid-results for first row and then adjust it step by step for next rows.

Note:

  • These datasets are really big (ca 5000 elements in each row)
  • The Q3 can be estimated, it doesn't have to be an exact value.
  • I call the portions of data "rows", but they can have different leghts! Usually it varies not so much (+/- few hundred samples) but it varies!

This question is similar to this one: http://stackoverflow.com/questions/1058813/on-line-iterator-algorithms-for-estimating-statistical-median-mode-skewness But I need to count quantiles. ALso there are few articles in this topic, i.e.:
http://web.cs.wpi.edu/~hofri/medsel.pdf
http://portal.acm.org/citation.cfm?id=347195&dl

But before I would try to implement these, I wanted to ask you if there are maybe any other, qucker ways of counting the 0.25/0.75 quantiles?

© Stack Overflow or respective owner

Related posts about statistics

Related posts about numerical-methods