skip to content

SQL: A basic UPSERT in PostgreSQL

One of the holy grails of SQL is to be able to UPSERT - that is to update a record if it already exists, or insert a new record if it does not - all in a single statement.

This hasn't been possible in PostgreSQL in earlier versions, but can now be done in PostgreSQL 9.1 and higher.

The old way

Using plain SQL commands there are two approaches we can take:

Using SELECT as a test

if(SELECT record exists) { UPDATE record; } else { INSERT record; }

Using a trial UPDATE

if(UPDATE affects one or more records) { // we're done } else { INSERT record; }

While these methods are logical and easy to follow, the reason for moving to a single statement is to improve speed and reduce the chance of race conditions.

To avoid race conditions on a busy database, a LOCK TABLE command is required, which is even more reason to speed up the INSERT/UPDATE transaction.

Creating an UPSERT

The syntax here is bit mind-bending, but if we take it step by step it makes more sense.

Suppose we're keeping a tally of spiders visiting a website each day. If it's the first visit then we need to initialise the counter:

INSERT INTO spider_count (spider, tally) VALUES ('Googlebot', 1);

But on subsequent visits the tally just needs to be incremented:

UPDATE spider_count SET tally=tally+1 WHERE date='today' AND spider='Googlebot';

Looking back to the first section, this can be done simply, but how about making it an UPSERT?

We start by modifying the INSERT syntax to replace the VALUES section with a SELECT. This is perfectly valid SQL and doesn't affect the query in any way. The UPDATE remains unchanged.

We then put each statement into a variable as follows:

$insert = "INSERT INTO spider_count (spider, tally) SELECT 'Googlebot', 1"; $upsert = "UPDATE spider_count SET tally=tally+1 WHERE date='today' AND spider='Googlebot'";

The final upsert query then becomes:

WITH upsert AS ($upsert RETURNING *) $insert WHERE NOT EXISTS (SELECT * FROM upsert);

What this is saying essentially is "INSERT IFF !UPDATE".

The full query, if you expand the variables, will look something like the following, but it's much easier to wrap your head around if you use the variables approach:

WITH upsert AS (UPDATE spider_count SET tally=tally+1 WHERE date='today' AND spider='Googlebot' RETURNING *) INSERT INTO spider_count (spider, tally) SELECT 'Googlebot', 1 WHERE NOT EXISTS (SELECT * FROM upsert)

Putting it into practice

The following command line extract shows that our UPSERT is working:

postgres=# CREATE TABLE spider_count ( date date NOT NULL DEFAULT NOW(), spider varchar NOT NULL, tally int NOT NULL default 0 ); CREATE TABLE postgres=# \d spider_count Table "public.spider_count" Column | Type | Modifiers --------+-------------------+------------------------------------- date | date | not null default now() spider | character varying | not null tally | integer | not null default 0 postgres=# WITH upsert AS (UPDATE spider_count SET tally=tally+1 WHERE date='today' AND spider='Googlebot' RETURNING *) INSERT INTO spider_count (spider, tally) SELECT 'Googlebot', 1 WHERE NOT EXISTS (SELECT * FROM upsert); INSERT 0 1 postgres=# WITH upsert AS (UPDATE spider_count SET tally=tally+1 WHERE date='today' AND spider='Googlebot' RETURNING *) INSERT INTO spider_count (spider, tally) SELECT 'Googlebot', 1 WHERE NOT EXISTS (SELECT * FROM upsert); INSERT 0 0 postgres=# WITH upsert AS (UPDATE spider_count SET tally=tally+1 WHERE date='today' AND spider='Googlebot' RETURNING *) INSERT INTO spider_count (spider, tally) SELECT 'Googlebot', 1 WHERE NOT EXISTS (SELECT * FROM upsert); INSERT 0 0 postgres=# select * from spider_count; date | spider | tally ------------+-----------+------- 2024-11-21 | Googlebot | 3 (1 row)

In PostgreSQL before version 9.1 there are no Common Table Expressions (CTEs) so the basic approaches outlined in the first section, with a LOCK TABLE, are probably your best option, and only marginally slower.

Other databases such as MySQL have other means of achieving the same result, including the non-standard "INSERT IGNORE".

LOCKING to avoid race condition

To be safe, the affected table needs to be locked to prevent problems when multiple upserts overlap in time. Just because the UPSERT is a single statement doesn't mean it's instantaneous:

BEGIN; LOCK TABLE spider_count IN SHARE ROW EXCLUSIVE MODE; WITH upsert AS ($upsert RETURNING *) $insert WHERE NOT EXISTS (SELECT * FROM upsert); COMMIT;

Without the LOCK TABLE command you run the risk of duplicate records being created.

INSERT IF NOT EXISTS

If you're not worried about updating existing records, and only in inserting new ones where they don't already exist in the database, then you don't need to use an UPSERT:

BEGIN; LOCK TABLE mailing_list IN SHARE ROW EXCLUSIVE MODE; INSERT INTO mailing_list (email) SELECT 'email(at)example.org' WHERE NOT EXISTS ( SELECT * FROM mailing_list WHERE email='email(at)example.org' ); COMMIT;

References

< SQL

User Comments

Post your comment or question

15 June, 2016

PostgreSQL 9.5 has ON CONFLICT UPDATE. No more table locking required.

Interesting (from the change logs):

INSERT INTO user_logins (username, logins)
VALUES ('Naomi',1), ('James',1)
ON CONFLICT (username)
DO UPDATE SET logins = user_logins.logins + EXCLUDED.logins;

14 February, 2016

Nice Article !

Really this will help to people of PostgreSQL Community.
I have also prepared small demonstration on, write upsert/merge statement in PostgreSQL.

www.dbrnd.com/2015/10/postgresql-insert-update-or-upsert-merge-using-writable-cte/

13 May, 2014

Hi and Thanks for this howto. I'd be interested also to get some return value for the update and insert to confirm that the operation went well. adding "RETURNING my_field_name" at the end of the query does indeed only return something on insert. Fort the update I'm not too sure how to have it return anything since there is alreadt a RETURNING clause in the update part of the query.

11 February, 2014

While you acknowledge the race conditions here, I think you should stress more clearly to users that *the race will result in duplicate rows being inserted* if there's no primary key or unique constraint on the columns being used as the upsert key.

IMO, LOCKing should be the default in all examples, with a mention to suggest that if you have a unique constraint you can omit the lock and instead retry when you get duplicate key errors.

A link to depesz's "why is upsert so complicated" article for the nitty gritty wouldn't hurt either.

16 December, 2013

Thanks for this tutorial, helped me build an upsert for my own purposes.
I was attempting to enhance it so it could return values from both the insert and update, whichever was executed. So far only been able to get return values from the insert, but not update. Do you know if it is possible?

top