Remove duplicate records/objects uniquely identified by multiple attributes
- by keruilin
I have a model called HeroStatus with the following attributes:
id
user_id
recordable_type
hero_type (can be NULL!)
recordable_id
created_at
There are over 100 hero_statuses, and a user can have many hero_statuses, but can't have the same hero_status more than once.
A user's hero_status is uniquely identified by the combination of recordable_type + hero_type + recordable_id. What I'm trying to say essentially is that there can't be a duplicate hero_status for a specific user.
Unfortunately, I didn't have a validation in place to assure this, so I got some duplicate hero_statuses for users after I made some code changes. For example:
user_id = 18
recordable_type = 'Evil'
hero_type = 'Halitosis'
recordable_id = 1
created_at = '2010-05-03 18:30:30'
user_id = 18
recordable_type = 'Evil'
hero_type = 'Halitosis'
recordable_id = 1
created_at = '2009-03-03 15:30:00'
user_id = 18
recordable_type = 'Good'
hero_type = 'Hugs'
recordable_id = 1
created_at = '2009-02-03 12:30:00'
user_id = 18
recordable_type = 'Good'
hero_type = NULL
recordable_id = 2
created_at = '2009-012-03 08:30:00'
(Last two are not a dups obviously. First two are.) So what I want to do is get rid of the duplicate hero_status. Which one? The one with the most-recent date.
I have three questions:
How do I remove the duplicates using a SQL-only approach?
How do I remove the duplicates using a pure Ruby solution? Something similar to this: http://stackoverflow.com/questions/2790004/removing-duplicate-objects.
How do I put a validation in place to prevent duplicate entries in the future?