How to keep only duplicates efficiently?

Posted by Marc Eaddy on Stack Overflow
Published on 2010-03-31T03:05:31Z Indexed on 2010/03/31 3:13 UTC

Filed under: c++ | stl | unique | algorithm | efficiency

Given an STL vector, I'd like an algorithm that outputs only the duplicates in sorted order, e.g.,

INPUT : { 4, 4, 1, 2, 3, 2, 3 }
OUTPUT: { 2, 3, 4 }

The algorithm is trivial, but the goal is to make it as efficient as std::unique(). My naive implementation modifies the container in place:

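#include <algorithm>  // sort
#include <cstddef>    // size_t
#include <vector>

using std::size_t;
using std::sort;
using std::vector;

void keep_duplicates(vector<int>* pv)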
{
    // Sort (in-place) so we can find duplicates in linear time
    sort(pv->begin(), pv->end());

    vector<int>::iterator it_start = pv->begin();
    while (it_start != pv->end())
    {
        size_t nKeep = 0;

        // Find the next different element
        vector<int>::iterator it_stop = it_start + 1;
        while (it_stop != pv->end() && *it_start == *it_stop)
        {
            nKeep = 1; // This gets set redundantly
            ++it_stop;
        }

        // If the element is a duplicate, keep only the first one (nKeep=1).
        // Otherwise, the element is not duplicated so erase it (nKeep=0).
        it_start = pv->erase(it_start + nKeep, it_stop);
    }
}

If you can make this more efficient, elegant, or general, please let me know. For example, you might use a custom sorting algorithm, or copy elements in the second loop to eliminate the erase() call.
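One way to act on the idea of copying elements instead of calling erase() inside the loop is sketched below. It is only an illustration, not an accepted answer: the names keep_duplicates_sorted and keep_duplicates_linear are hypothetical, and the sketch assumes the element type supports operator== and operator<. The helper works like std::unique in that it compacts the kept elements to the front of an already sorted range and returns the new logical end, so the vector is shrunk with a single erase() at the end.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original post). Given a SORTED range
// [first, last), moves one copy of every value that occurs at least twice
// to the front of the range and returns the new logical end.
template <typename ForwardIt>
ForwardIt keep_duplicates_sorted(ForwardIt first, ForwardIt last)
{
    // Precondition check; compiled out when NDEBUG is defined.
    assert(std::is_sorted(first, last));

    ForwardIt out = first;  // next slot to write a kept element into
    while (first != last)
    {
        // Find the end of the run of elements equal to *first.
        ForwardIt run_end = std::next(first);
        while (run_end != last && *run_end == *first)
            ++run_end;

        // A run longer than one element is a duplicate: keep its first copy.
        if (std::next(first) != run_end)
        {
            if (out != first)
                *out = std::move(*first);
            ++out;
        }
        first = run_end;  // continue with the next distinct value
    }
    return out;
}

// Hypothetical wrapper: sort, compact, then erase the tail exactly once.
template <typename T>
void keep_duplicates_linear(std::vector<T>& v)
{
    std::sort(v.begin(), v.end());
    v.erase(keep_duplicates_sorted(v.begin(), v.end()), v.end());
}
```

Calling keep_duplicates_linear on a vector holding { 4, 4, 1, 2, 3, 2, 3 } leaves { 2, 3, 4 } in it. After the sort, the scan visits each element once and the vector is erased from only once, so the total cost is dominated by the O(n log n) sort. The original loop can call erase() once per distinct value, and each call shifts the remaining tail, which is quadratic in the worst case.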

© Stack Overflow or respective owner
