Bulk Insert of hundreds of millions of records

Posted by Dave Jarvis on Stack Overflow See other posts from Stack Overflow or by Dave Jarvis
Published on 2010-05-16T02:27:52Z Indexed on 2010/05/16 2:30 UTC
Read the original article Hit count: 336

Filed under:
|

What is the fastest way to insert 237 million records into a table that has rules (for distributing the data across 84 child tables)?

First I tried inserts. No go. Then I tried inserts with BEGIN/COMMIT. Not nearly fast enough. Next, I tried COPY FROM, but then noticed the documentation states that the rules are ignored. (And it was having difficulties with the column order and date format -- it said that '1984-07-1' was not a valid integer; true, but a bit unexpected.)

Some example data:

station_id,taken,amount,category_id,flag
1,'1984-07-1',0,4,
1,'1984-07-2',0,4,
1,'1984-07-3',0,4,
1,'1984-07-4',0,4,T

Here is the table structure (with one rule included):

CREATE TABLE climate.measurement
(
  id bigserial NOT NULL,
  station_id integer NOT NULL,
  taken date NOT NULL,
  amount numeric(8,2) NOT NULL,
  category_id smallint NOT NULL,
  flag character varying(1) NOT NULL DEFAULT ' '::character varying
)
WITH (
  OIDS=FALSE
);
ALTER TABLE climate.measurement OWNER TO postgres;

CREATE OR REPLACE RULE i_measurement_01_001 AS
    ON INSERT TO climate.measurement
   WHERE date_part('month'::text, new.taken)::integer = 1 AND new.category_id = 1 DO INSTEAD  INSERT INTO climate.measurement_01_001 (id, station_id, taken, amount, category_id, flag)
  VALUES (new.id, new.station_id, new.taken, new.amount, new.category_id, new.flag);

I can generate the data into any format.

Am looking for something that won't take four days.

I originally had the data in MySQL (still do), but am hoping to get a performance increase by switching to PostgreSQL and am eager to use its PL/R extensions for stats.

I was also thinking about using: http://pgbulkload.projects.postgresql.org/

Any help, tips, or guidance would be greatly appreciated.

Thank you!

© Stack Overflow or respective owner

Related posts about postgresql

Related posts about bulkinsert