Randomizing a list of millions of elements in Python efficiently

Posted by eWizardII on Stack Overflow
Published on 2011-01-08T02:35:07Z Indexed on 2011/01/08 2:54 UTC

Hello,

I have read this answer, which is suggested as potentially the best way to randomize a list of strings in Python. I'm wondering whether it is also the most efficient way, because I have a list of about 30 million elements built by the following code:

import json
from random import shuffle

a = []

for i in range(193):
    # Use a context manager so each of the 193 files is closed after reading.
    with open("C:/Twitter/user/user_" + str(i) + ".json") as json_data:
        data = json.load(json_data)
    # Collect the 'su' field of every record in this file.
    for record in data:
        a.append(record['su'])
# Deduplicate with the built-in set (the sets module is deprecated).
new = list(set(a))
print("Cleaned length is: " + str(len(new)))

## Take Cleaned List and Randomize it for Analysis
shuffle(new)

If there is a more efficient way, I'd greatly appreciate any advice.
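For illustration only, here is a minimal sketch of one possible variant, not a measured or definitive answer. It adds each value to a set while the files are being read, so the intermediate list holding every duplicate is never built. `random.shuffle` already shuffles in place in linear time, so the shuffle step itself is kept as is. The helper names `load_records`, `unique_values`, and `shuffled_unique`, the `seed` parameter, and the in-memory `sample` data are all hypothetical and exist only so the sketch runs without the original JSON files.

```python
import json
import random


def load_records(paths):
    """Yield the parsed contents of each JSON file, one file at a time."""
    for path in paths:
        with open(path) as handle:
            yield json.load(handle)


def unique_values(records_per_file, key="su"):
    """Collect the distinct values of `key` without an intermediate list."""
    seen = set()
    for records in records_per_file:
        for record in records:
            seen.add(record[key])
    return seen


def shuffled_unique(records_per_file, key="su", seed=None):
    """Return the distinct values of `key` as a shuffled list."""
    values = list(unique_values(records_per_file, key))
    # random.Random(seed) is used only to make the sketch reproducible;
    # its shuffle works in place, like random.shuffle.
    random.Random(seed).shuffle(values)
    return values


# Hypothetical in-memory stand-in for the contents of the JSON files.
sample = [
    [{"su": "alice"}, {"su": "bob"}],
    [{"su": "bob"}, {"su": "carol"}],
]
result = shuffled_unique(sample, seed=0)
print("Cleaned length is: " + str(len(result)))
```

With the real files, the same function could be fed from disk, for example `shuffled_unique(load_records(paths))`, where `paths` is the list of file names built as in the loop above. Whether this is actually faster or lighter on memory for 30 million elements would need to be measured.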

Thanks,

© Stack Overflow or respective owner
