How to keep only duplicates efficiently?
        Posted  
        
            by Marc Eaddy
        on Stack Overflow
        
        See other posts from Stack Overflow
        
            or by Marc Eaddy
        
        
        
        Published on 2010-03-31T03:05:31Z
        Indexed on 
            2010/03/31
            3:13 UTC
        
        
        Read the original article
        Hit count: 497
        
Given an STL vector, I'd like an algorithm that outputs only the duplicates in sorted order, e.g.,
INPUT : { 4, 4, 1, 2, 3, 2, 3 }
OUTPUT: { 2, 3, 4 }
The algorithm is trivial, but the goal is to make it as efficient as std::unique(). My naive implementation modifies the container in-place:
My naive implementation:
void keep_duplicates(vector<int>* pv)
{
    // Sort (in-place) so we can find duplicates in linear time
    sort(pv->begin(), pv->end());
    vector<int>::iterator it_start = pv->begin();
    while (it_start != pv->end())
    {
        size_t nKeep = 0;
        // Find the next different element
        vector<int>::iterator it_stop = it_start + 1;
        while (it_stop != pv->end() && *it_start == *it_stop)
        {
            nKeep = 1; // This gets set redundantly
            ++it_stop;
        }
        // If the element is a duplicate, keep only the first one (nKeep=1).
        // Otherwise, the element is not duplicated so erase it (nKeep=0).
        it_start = pv->erase(it_start + nKeep, it_stop);
    }
}
If you can make this more efficient, elegant, or general, please let me know. For example, a custom sorting algorithm, or copy elements in the 2nd loop to eliminate the erase() call.
© Stack Overflow or respective owner