Delete Duplicate records from large csv file C# .Net

Posted by Sandhurst on Stack Overflow
Published on 2011-03-11T11:47:34Z Indexed on 2011/03/11 16:10 UTC

I have created a solution that reads a large CSV file, currently 20-30 MB in size. I tried to delete the duplicate rows based on certain column values that the user chooses at run time, using the usual technique of finding duplicate rows, but it is so slow that the program seems not to be working at all.

What other technique can be applied to remove duplicate records from a CSV file?

Here's the code; I am definitely doing something wrong.

DataTable dtCSV = ReadCsv(file, columns);
// columns is a List<string> holding the column names the user chose
DataTable dt=RemoveDuplicateRecords(dtCSV, columns);

private DataTable RemoveDuplicateRecords(DataTable dtCSV, List<string> columns)
        {
            DataView dv = dtCSV.DefaultView;
            // Empty table with the same schema as the input; it collects the unique rows.
            DataTable dt = dtCSV.Clone();

            foreach (DataRow row in dtCSV.Rows)
            {
                // Build a filter such as "[A]='x' and [B]='y'" from the chosen columns.
                List<string> conditions = new List<string>();
                foreach (string column in columns)
                {
                    conditions.Add("[" + column + "]='" + row[column].ToString().Replace("'", "''") + "'");
                }
                string rowFilter = string.Join(" and ", conditions.ToArray());

                // RowExists is a helper that is not shown in this post.
                if (!RowExists(dt, rowFilter))
                {
                    // Filtering the view and copying it with ToTable() walks all of
                    // dtCSV, and this runs once for every input row.
                    dv.RowFilter = rowFilter;
                    dt.Rows.Add(dv.ToTable().Rows[0].ItemArray);
                }
            }
            return dt;
        }
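
For comparison, here is a minimal sketch of one alternative that is often suggested for this kind of problem: build a composite key from the chosen columns and remember the keys already seen in a HashSet, so each row is visited only once. This is not the original poster's method. The class name CsvDeduplicator, the unit-separator character used to join key values, and the case-sensitive comparison are all assumptions made for illustration, and the sketch assumes every name in keyColumns exists in the table.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class CsvDeduplicator
{
    // Hypothetical helper (not from the original post): returns a new table that
    // keeps the first row seen for each distinct combination of key columns.
    public static DataTable RemoveDuplicateRecords(DataTable source, IList<string> keyColumns)
    {
        DataTable result = source.Clone(); // same schema, no rows

        // Assumption: keys are compared case-sensitively.
        HashSet<string> seenKeys = new HashSet<string>(StringComparer.Ordinal);

        // Resolve column positions once instead of looking names up per row.
        int[] ordinals = new int[keyColumns.Count];
        for (int i = 0; i < keyColumns.Count; i++)
        {
            ordinals[i] = source.Columns[keyColumns[i]].Ordinal;
        }

        foreach (DataRow row in source.Rows)
        {
            // Join the key values with a separator that is assumed not to
            // appear in the data (ASCII unit separator).
            string[] parts = new string[ordinals.Length];
            for (int i = 0; i < ordinals.Length; i++)
            {
                parts[i] = Convert.ToString(row[ordinals[i]]);
            }
            string key = string.Join("\u001F", parts);

            // HashSet.Add returns false when the key is already present.
            if (seenKeys.Add(key))
            {
                result.ImportRow(row);
            }
        }
        return result;
    }
}
```

It would be called the same way as the method above, for example CsvDeduplicator.RemoveDuplicateRecords(dtCSV, columns). If matching should ignore case, StringComparer.OrdinalIgnoreCase can be passed to the HashSet instead.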

Filed under: c#, .NET, csv