Querying a Cassandra column family for rows that have not been updated in X days
- by knorv
I'm moving an existing MySQL based application over to Cassandra. So far finding the equivalent Cassandra data model has been quite easy, but I've stumbled on the following problem for which I'd appreciate some input:
Consider a MySQL table holding millions of entities:
CREATE TABLE entities (
id INT AUTO_INCREMENT NOT NULL,
entity_information VARCHAR(...),
entity_last_updated DATETIME,
PRIMARY KEY (id),
KEY (entity_last_updated)
);
The table is regularly queried for entities that need to be updated:
SELECT id FROM entities
WHERE entity_last_updated IS NULL
OR entity_last_updated < DATE_ADD(NOW(), INTERVAL -7*24 HOUR)
ORDER BY entity_last_updated ASC;
The entities returned by this queries are then updated using the following query:
UPDATE entities
SET entity_information = ?,
entity_last_updated = NOW()
WHERE id = ?;
What would be the corresponding Cassandra data model that would allow me to store the given information and effectively query the entities table for entities that need to be updated (that is: entities that have not been updated in the last seven days)?