Mysql: Working With 192 Trillion Records... (Yes, 192 Trillion)

Posted by Sarah on Server Fault See other posts from Server Fault or by Sarah
Published on 2010-08-08T14:15:34Z Indexed on 2010/12/27 20:55 UTC
Read the original article Hit count: 121

Filed under:
|
|

Here's the question...

Considering 192 trillion records, what should my considerations be?

My main concern is speed.

Here's the table...

    CREATE TABLE `ref` (
  `id` INTEGER(13) AUTO_INCREMENT DEFAULT NOT NULL,
  `rel_id` INTEGER(13) NOT NULL,
  `p1` INTEGER(13) NOT NULL,
  `p2` INTEGER(13) DEFAULT NULL,
  `p3` INTEGER(13) DEFAULT NULL,
  `s` INTEGER(13) NOT NULL,
  `p4` INTEGER(13) DEFAULT NULL,
  `p5` INTEGER(13) DEFAULT NULL,
  `p6` INTEGER(13) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY (`s`),
  KEY (`rel_id`),
  KEY (`p3`),
  KEY (`p4`)
    );

Here's the queries...

SELECT id, s FROM ref WHERE red_id="$rel_id" AND p3="$p3" AND p4="$p4"

SELECT rel_id, p1, p2, p3, p4, p5, p6 FROM ref WHERE id="$id"

INSERT INTO rel (rel_id, p1, p2, p3, s, p4, p5, p6)
VALUES ("$rel_id", "$p1", "$p2", "$p3", "$s", "$p4", "$p5", "$p6")

Here's some notes...

  • The SELECT's will be done much more frequently than the INSERT. However, occasionally I want to add a few hundred records at a time.
  • Load-wise, there will be nothing for hours then maybe a few thousand queries all at once.
  • Don't think I can normalize any more (need the p values in a combination)
  • The database as a whole is very relational.
  • This will be the largest table by far (next largest is about 900k)

UPDATE (08/11/2010)

Interestingly, I've been given a second option...

Instead of 192 trillion I could store 2.6*10^16 (15 zeros, meaning 26 Quadrillion)...

But in this second option I would only need to store one bigint(18) as the index in a table. That's it - just the one column. So I would just be checking for the existence of a value. Occasionally adding records, never deleting them.

So that makes me think there must be a better solution then mysql for simply storing numbers...

Given this second option, should I take it or stick with the first...

[edit] Just got news of some testing that's been done - 100 million rows with this setup returns the query in 0.0004 seconds [/edit]

© Server Fault or respective owner

Related posts about mysql

Related posts about database