How to keep only duplicates efficiently?
- by Marc Eaddy
Given an STL vector, I'd like an algorithm that outputs only the duplicates in sorted order, e.g.,
INPUT : { 4, 4, 1, 2, 3, 2, 3 }
OUTPUT: { 2, 3, 4 }
The algorithm is trivial, but the goal is to make it as efficient as std::unique(). My naive implementation modifies the container in-place:
My naive implementation:
void keep_duplicates(vector<int>* pv)
{
// Sort (in-place) so we can find duplicates in linear time
sort(pv->begin(), pv->end());
vector<int>::iterator it_start = pv->begin();
while (it_start != pv->end())
{
size_t nKeep = 0;
// Find the next different element
vector<int>::iterator it_stop = it_start + 1;
while (it_stop != pv->end() && *it_start == *it_stop)
{
nKeep = 1; // This gets set redundantly
++it_stop;
}
// If the element is a duplicate, keep only the first one (nKeep=1).
// Otherwise, the element is not duplicated so erase it (nKeep=0).
it_start = pv->erase(it_start + nKeep, it_stop);
}
}
If you can make this more efficient, elegant, or general, please let me know. For example, a custom sorting algorithm, or copy elements in the 2nd loop to eliminate the erase() call.