  • Efficient way to get highly correlated pairs from large data set in Python or R

    - by Akavall
    I have a large data set (say 10,000 variables with about 1,000 elements each). We can think of it as a 2D list, something like: [[variable_1], [variable_2], ..., [variable_n]].

    I want to extract highly correlated variable pairs from that data, with "highly correlated" as a parameter that I can choose. I don't need all pairs to be extracted, and I don't necessarily want the most correlated pairs. As long as there is an efficient method that gets me highly correlated pairs, I am happy. It would also be nice if a variable did not show up in more than one pair, although this might not be crucial.

    Of course, there is a brute-force way to find such pairs, but it is too slow for me. I've googled around for a bit and found some theoretical work on this issue, but I wasn't able to find a package that does what I am looking for. I mostly work in Python, so a Python package would be most helpful, but a package in R that does this would be great too.

    Does anyone know of a package that does the above in Python or R? Or any other ideas? Thank you in advance.
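For reference, here is a minimal sketch of one way the task above could be approached with plain NumPy. It is not the package asked for, and it still compares every pair, so it is the brute-force approach in vectorised form: each variable is standardised so that a dot product of two rows equals their Pearson correlation, and the correlations are computed block by block so the full 10,000 x 10,000 matrix is never held in memory at once. The names `correlated_pairs`, `disjoint_pairs`, `threshold` and `block_size` are hypothetical and chosen only for illustration, and using the absolute correlation (so strong negative correlations also count) is an assumption.

```python
import numpy as np


def correlated_pairs(data, threshold=0.9, block_size=1000):
    """Yield (i, j, r) for variable pairs with |Pearson r| >= threshold.

    data: 2D array-like with one variable per row.
    """
    x = np.asarray(data, dtype=float)
    # Centre and scale each row to unit length, so that the dot product
    # of two rows is their Pearson correlation.
    x = x - x.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = np.inf  # constant rows end up with correlation 0
    x = x / norms

    n = x.shape[0]
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        # Correlations between rows [start, stop) and rows [start, n).
        corr = x[start:stop] @ x[start:].T
        rows, cols = np.nonzero(np.abs(corr) >= threshold)
        for r, c in zip(rows, cols):
            i, j = start + r, start + c
            if i < j:  # skip the diagonal and mirrored duplicates
                yield int(i), int(j), float(corr[r, c])


def disjoint_pairs(pairs):
    """Greedily keep the strongest pairs so that no variable appears twice."""
    used = set()
    kept = []
    for i, j, r in sorted(pairs, key=lambda p: -abs(p[2])):
        if i not in used and j not in used:
            used.update((i, j))
            kept.append((i, j, r))
    return kept
```

Calling `disjoint_pairs(correlated_pairs(data, threshold=0.95))` would then give a list of pairs in which each variable appears at most once. The greedy selection is a simple heuristic and does not guarantee the largest possible set of disjoint pairs.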
