Removing duplicate SQL records to permit a unique key
- by j pimmel
I have a table ('sales') in a MySQL DB which should rightfully have had a unique constraint enforced to prevent duplicates. Removing the dupes first and then setting the constraint is proving a bit tricky.
Table structure (simplified):
id (unique, autoinc)
product_id
The goal is to enforce uniqueness for product_id. The de-duping policy I want to apply is to remove all duplicate records except the most recently created one, i.e. the one with the highest id.
Or, to put it another way, I would like to delete duplicate records, excluding the ids matched by the following query:
select s.id from sales s inner join (select product_id, max(id) as maxId from sales group by product_id having count(product_id) > 1) groupedByProdId on s.product_id = groupedByProdId.product_id and s.id = groupedByProdId.maxId
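To make the policy concrete, here is a small made-up data set. The ids and product_id values are invented for illustration, and the column types are assumptions, since the real table is only shown in simplified form above.

```sql
-- Hypothetical minimal version of the table (types are assumed)
CREATE TABLE sales (
    id         INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    product_id INT NOT NULL
);

-- Invented sample rows; ids are given explicitly so the example is unambiguous
INSERT INTO sales (id, product_id) VALUES
    (1, 10),
    (2, 10),
    (3, 20),
    (4, 10),
    (5, 30),
    (6, 30);

-- Desired outcome under the "keep the highest id" policy:
--   product_id 10 -> keep id 4, delete ids 1 and 2
--   product_id 20 -> keep id 3 (it was never duplicated)
--   product_id 30 -> keep id 6, delete id 5
```

With this data the query above is meant to return ids 4 and 6 (the survivors among the duplicated products), so the rows to delete would be 1, 2 and 5.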
I've struggled with this on two fronts: writing the query to select the correct records to delete, and working around the MySQL restriction that a subquery in a DELETE statement cannot select from the same table that rows are being deleted from.
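A straightforward attempt along these lines illustrates the second problem. It is only a sketch of the kind of statement that hits the restriction, not a working solution, and the constraint name in the last statement is a hypothetical one chosen for the example.

```sql
-- Naive attempt: delete every row that is not the newest for its product_id.
-- MySQL rejects this because the subquery reads from the table being deleted from:
--   ERROR 1093 (HY000): You can't specify target table 'sales'
--   for update in FROM clause
DELETE FROM sales
WHERE id NOT IN (
    SELECT MAX(id)
    FROM sales
    GROUP BY product_id
);

-- The end goal, once the duplicates are gone
-- (uq_sales_product_id is a hypothetical constraint name):
ALTER TABLE sales
    ADD UNIQUE KEY uq_sales_product_id (product_id);
```

Run against a table that still contains duplicates, the ALTER TABLE statement fails with a duplicate-entry error, which is why the clean-up has to come first.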