Efficient and accurate way to compact and compare Python lists?
Posted by daveslab on Stack Overflow, 2010-06-08
Hi folks,
I'm trying to do a somewhat sophisticated diff between individual rows in two CSV files. I need to ensure that a row from one file does not appear in the other file, but I am given no guarantee of the order of the rows in either file. As a starting point, I've been trying to compare the hashes of the string representations of the rows (i.e. Python lists). For example:
import csv

# Collect a hash of each row in the old file.
hashes = []
for row in csv.reader(open('old.csv', 'rb')):
    hashes.append(hash(str(row)))

# Flag rows in the new file whose hash was never seen in the old file.
for row in csv.reader(open('new.csv', 'rb')):
    if hash(str(row)) not in hashes:
        print 'Not found'
But this is failing miserably. I am constrained by artificially imposed memory limits that I cannot change, so I went with hashes instead of storing and comparing the lists directly. Some of the files I am comparing can be hundreds of megabytes in size. Any ideas for a way to accurately compact Python lists so that they can be compared for simple equality with other lists? I.e., a hashing scheme that actually works? Bonus points: why didn't the above method work?
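For anyone experimenting with the same idea, here is a minimal sketch of one variation (an illustration, not the asker's code): it stores the hashes in a set, so each membership test is O(1) instead of a linear scan over a list, and it hashes a tuple of the fields directly rather than detouring through str(). Note that equal hashes do not guarantee equal rows, so a real diff would still need to verify apparent matches against the source data.

import csv

# Build a set of row hashes from the old file; set membership tests are O(1).
old_hashes = set()
for row in csv.reader(open('old.csv', 'rb')):
    old_hashes.add(hash(tuple(row)))  # tuples are hashable, lists are not

# Report rows in the new file whose hash never appeared in the old file.
for row in csv.reader(open('new.csv', 'rb')):
    if hash(tuple(row)) not in old_hashes:
        print 'Not found:', row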