Improve efficiency of array comparison in Ruby
- by user2985025
Hi, I am working with Ruby/Cucumber and have a requirement to develop a comparison module to compare two files.
Below are the requirements:
The project is a migration project: data is moved from one application to another.
We need to compare the data in the existing application against the data in the new one.
Solution:
I have developed a comparison engine in Ruby for the above requirements. It works as follows:
a) Extract the data from both databases, de-duplicated and sorted
b) Write the data to text files with "||" as the delimiter
c) Use the key columns (by column number), which uniquely identify a record in the DB, to compare the two files
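Steps b) and c) can be sketched with a small helper that splits one "||"-delimited line and separates the key columns from the remaining columns. This is a minimal, hypothetical sketch (`split_record` is not from the original engine), and it assumes the key columns are given as 0-based column indexes; if the engine numbers columns from 1, subtract 1 first.

```ruby
# Hypothetical helper: split one "||"-delimited line into [key, rest].
# ASSUMPTION: key_cols are 0-based column indexes.
def split_record(line, key_cols)
  # The limit of -1 keeps trailing empty fields instead of dropping them.
  fields = line.chomp.split("||", -1)
  key = fields.values_at(*key_cols)
  # Everything that is not a key column is a value to compare.
  rest = fields.each_with_index.reject { |_, i| key_cols.include?(i) }.map(&:first)
  [key, rest]
end

key, rest = split_record("1||2||3||4||5||6\n", [0, 1, 2, 3, 4])
p key   # => ["1", "2", "3", "4", "5"]
p rest  # => ["6"]
```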
For example, if File1 has the record 1,2,3,4,5,6 and File2 has 1,2,3,4,5,7, and columns 1 to 5 are the key columns, I use those key columns to match the two records and then compare 6 with 7, which results in a failure.
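One common way to do this kind of key-based comparison is to index one file's records by their key columns in a Hash and then look up each record of the other file by its key. With a Hash, each lookup takes roughly constant time whether or not the record matches, so the total cost should grow with the number of records rather than with the mismatch ratio. This is a minimal sketch, not the author's engine: `compare_by_key` and its result format are hypothetical, column indexes are 0-based, and keys are assumed unique (the data is de-duplicated in step a).

```ruby
# Hypothetical sketch: compare two sets of rows (arrays of column values)
# by key columns, using a Hash index built from the new rows.
def compare_by_key(old_rows, new_rows, key_cols)
  index = {}
  new_rows.each { |row| index[row.values_at(*key_cols)] = row }

  failures = []
  old_rows.each do |row|
    key = row.values_at(*key_cols)
    other = index.delete(key)  # removes matched keys as we go
    if other.nil?
      failures << [:missing_in_new, key]
    elsif other != row
      # Indexes of the columns whose values differ (rows may differ in length).
      width = [row.size, other.size].max
      diff_cols = (0...width).reject { |i| row[i] == other[i] }
      failures << [:mismatch, key, diff_cols]
    end
  end
  # Keys still left in the index exist only in the new data.
  index.each_key { |key| failures << [:missing_in_old, key] }
  failures
end

old_rows = [%w[1 2 3 4 5 6]]
new_rows = [%w[1 2 3 4 5 7]]
p compare_by_key(old_rows, new_rows, [0, 1, 2, 3, 4])
# => [[:mismatch, ["1", "2", "3", "4", "5"], [5]]]
```

The output reports the example above as a mismatch on the sixth column (index 5). Building the index needs memory for one whole file, which is usually acceptable at around 100,000 records.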
Issue:
The major issue we are facing is that when more than 70% of the records mismatch, for 100,000 records or more, the comparison takes a long time.
If less than 40% of the records mismatch, the comparison time is acceptable.
diff and the diff-lcs gem (Diff::LCS) will not work in this case, because we need the key columns to compare the data between the two applications accurately.
Is there any other method to reduce the comparison time when more than 70% of the records mismatch, for 100,000 records or more?
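Because step a) already sorts both extracts, another key-aware option (a different technique from a Hash index, usually called a merge join) is to walk the two sorted inputs in a single pass, always advancing the side with the smaller key. Each side is read once, so the work stays linear no matter how many records mismatch. This is a sketch under stated assumptions: `merge_compare` and `peek_or_nil` are hypothetical names, keys are unique on each side, and both inputs must be sorted in the same order that Ruby's `Array#<=>` gives for the key arrays. Key values read from a text file are strings, so "10" sorts before "9"; if the DB sorted numerically or with a different collation, the keys would have to be compared the same way.

```ruby
# Hypothetical sketch: merge join of two inputs already sorted by key.
# ASSUMPTION: both sides are sorted in Array#<=> order of their key arrays,
# and key values are unique on each side.
def merge_compare(old_rows, new_rows, key_cols)
  a = old_rows.each  # external enumerators, driven with peek/next
  b = new_rows.each
  failures = []
  loop do
    ra = peek_or_nil(a)
    rb = peek_or_nil(b)
    break if ra.nil? && rb.nil?

    ka = ra && ra.values_at(*key_cols)
    kb = rb && rb.values_at(*key_cols)
    if rb.nil? || (ra && (ka <=> kb) < 0)
      # Key only present in the old data.
      failures << [:missing_in_new, ka]
      a.next
    elsif ra.nil? || (ka <=> kb) > 0
      # Key only present in the new data.
      failures << [:missing_in_old, kb]
      b.next
    else
      # Same key on both sides: compare the whole rows.
      failures << [:mismatch, ka] if ra != rb
      a.next
      b.next
    end
  end
  failures
end

# Returns the next element without consuming it, or nil when exhausted.
def peek_or_nil(enum)
  enum.peek
rescue StopIteration
  nil
end

old_rows = [%w[1 2 3 4 5 6], %w[1 2 3 4 6 1]]
new_rows = [%w[1 2 3 4 5 7], %w[1 2 3 4 7 1]]
p merge_compare(old_rows, new_rows, [0, 1, 2, 3, 4])
# => [[:mismatch, ["1", "2", "3", "4", "5"]],
#     [:missing_in_new, ["1", "2", "3", "4", "6"]],
#     [:missing_in_old, ["1", "2", "3", "4", "7"]]]
```

Plain arrays keep the sketch short. Since the loop only calls `peek` and `next`, it could be adapted to read each file line by line instead of loading both files into memory.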
Thanks