How to speed up a slow UPDATE query
- by Mike Christensen
I have the following UPDATE query:
UPDATE Indexer.Pages SET LastError=NULL where LastError is not null;
Right now, this query takes about 93 minutes to complete. I'd like to find ways to make this a bit faster.
The Indexer.Pages table has around 506,000 rows, and about 490,000 of them contain a value for LastError, so I doubt I can take advantage of any indexes here.
The table (when uncompressed) has about 46 gigs of data in it, however the majority of that data is in a text field called html. I believe simply loading and unloading that many pages is causing the slowdown. One idea would be to make a new table with just the Id and the html field, and keep Indexer.Pages as small as possible. However, testing this theory would be a decent amount of work since I actually don't have the hard disk space to create a copy of the table. I'd have to copy it over to another machine, drop the table, then copy the data back which would probably take all evening.
Ideas? I'm using Postgres 9.0.0.
UPDATE:
Here's the schema:
CREATE TABLE indexer.pages
(
id uuid NOT NULL,
url character varying(1024) NOT NULL,
firstcrawled timestamp with time zone NOT NULL,
lastcrawled timestamp with time zone NOT NULL,
recipeid uuid,
html text NOT NULL,
lasterror character varying(1024),
missingings smallint,
CONSTRAINT pages_pkey PRIMARY KEY (id ),
CONSTRAINT indexer_pages_uniqueurl UNIQUE (url )
);
I also have two indexes:
CREATE INDEX idx_indexer_pages_missingings
ON indexer.pages
USING btree
(missingings )
WHERE missingings > 0;
and
CREATE INDEX idx_indexer_pages_null
ON indexer.pages
USING btree
(recipeid )
WHERE NULL::boolean;
There are no triggers on this table, and there is one other table that has a FK constraint on Pages.PageId.