Unable to get set intersection to work
- by chavanak
Sorry for the double post, I will update this question if I can't get things to work :)
I am trying to compare two files. Their contents are listed below:
File 1            File 2
"d.complex.1"     "d.complex.1"
1                 4
5                 5
48                47
65                21
d.complex.10      d.complex.10
46                6
21                46
109               121
192               192
I am trying to compare the contents of the two files, but not in a trivial way. I will explain what I want with an example. In the file contents above, d.complex.1 in file_1 shares the number "5" with d.complex.1 in file_2, whereas the same d.complex.1 in file_1 has nothing in common with d.complex.10 in file_2. What I want is to print only those d.complex. entries that have nothing in common with a d.complex. entry in the other file. Think of each d.complex. line as a heading. All I am trying to do is compare the numbers below each d.complex. heading, and if nothing matches, I want that particular pair of d.complex. headings (one from each file) to be printed. If even one number is present under both headings, I want the pair to be rejected.
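To make the rule concrete, here is a minimal sketch of the comparison described above. It is written in Python 3 syntax and is not the code from this post. The helper names `parse_blocks` and `disjoint_pairs` are hypothetical, the sample values from the two files are typed in as lists instead of being read from disk, and it assumes every number line follows a heading that starts with "d.complex".

```python
def parse_blocks(lines):
    """Group lines into {heading: set of number strings}."""
    blocks = {}
    current = None
    for raw in lines:
        # Drop surrounding whitespace and any quotes around a heading.
        line = raw.strip().strip('"')
        if not line:
            continue
        if line.startswith("d.complex"):
            current = line
            blocks[current] = set()
        elif current is not None:
            blocks[current].add(line)
    return blocks


def disjoint_pairs(blocks_1, blocks_2):
    """Return (heading_1, heading_2) pairs whose number sets share nothing."""
    return [
        (name_1, name_2)
        for name_1, nums_1 in blocks_1.items()
        for name_2, nums_2 in blocks_2.items()
        if nums_1.isdisjoint(nums_2)
    ]


# Sample data copied from the two files listed above.
file_1 = ["d.complex.1", "1", "5", "48", "65",
          "d.complex.10", "46", "21", "109", "192"]
file_2 = ["d.complex.1", "4", "5", "47", "21",
          "d.complex.10", "6", "46", "121", "192"]

pairs = disjoint_pairs(parse_blocks(file_1), parse_blocks(file_2))
for name_1, name_2 in pairs:
    print(name_1)
    print(name_2)
```

With the sample data, the only pair with no number in common is d.complex.1 from file 1 and d.complex.10 from file 2. `set.isdisjoint` is used here instead of a set difference because the question is whether the two sets overlap at all, not which elements are left over.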
My Code:
The method I chose to achieve this was to use sets and then take a difference. The code I wrote was:
import string

# Read both files and strip whitespace from every line
first_complex=open( "file1.txt", "r" )
first_complex_lines=first_complex.readlines()
first_complex_lines=map( string.strip, first_complex_lines )
first_complex.close()
second_complex=open( "file2.txt", "r" )
second_complex_lines=second_complex.readlines()
second_complex_lines=map( string.strip, second_complex_lines )
second_complex.close()
list_1=[]
list_2=[]
# Group the lines of file 1: each group starts with its d.complex heading
res_1=[]
for line in first_complex_lines:
    if line.startswith( "d.complex" ):
        res_1.append( [] )
    res_1[-1].append( line )
# Same grouping for file 2
res_2=[]
for line in second_complex_lines:
    if line.startswith( "d.complex" ):
        res_2.append( [] )
    res_2[-1].append( line )
h=len( res_1 )
k=len( res_2 )
# Compare every group of file 1 with every group of file 2
for i in res_1:
    for j in res_2:
        print i[0]
        print j[0]
        target_set=set ( i )
        target_set_1=set( j )
        # Print the numbers of i that are missing from j (headings skipped)
        for s in target_set:
            if s not in target_set_1:
                if s[0] != "d":
                    print s
The above code is giving an output like this (just an example):
d.complex.1.dssp
d.complex.1.dssp
1
48
65
d.complex.1.dssp
d.complex.10.dssp
46
21
109
What I would like to have is:
d.complex.1
d.complex.1 (name from file2)
d.complex.1
d.complex.10 (name from file2)
I am sorry for confusing you guys, but this is all that is required.
I am very new to Python, so my approach above might be flawed. Also, I have never used sets before :(. Can someone give me a hand here?