SQL: A basic UPSERT in PostgreSQL
One of the holy grails of SQL is to be able to UPSERT - that is to update a record if it already exists, or insert a new record if it does not - all in a single statement.
This hasn't been possible in PostgreSQL in earlier versions, but can now be done in PostgreSQL 9.1 and higher.
The old way
Using plain SQL commands there are two approaches we can take:
Using SELECT as a test
if(SELECT record exists) {
UPDATE record;
} else {
INSERT record;
}
Using a trial UPDATE
if(UPDATE affects one or more records) {
// we're done
} else {
INSERT record;
}
While these methods are logical and easy to follow, the reason for moving to a single statement is to improve speed and reduce the chance of race conditions.
To avoid race conditions on a busy database, a LOCK TABLE command is required, which is even more reason to speed up the INSERT/UPDATE transaction.
Creating an UPSERT
The syntax here is bit mind-bending, but if we take it step by step it makes more sense.
Suppose we're keeping a tally of spiders visiting a website each day. If it's the first visit then we need to initialise the counter:
INSERT INTO spider_count (spider, tally) VALUES ('Googlebot', 1);
But on subsequent visits the tally just needs to be incremented:
UPDATE spider_count SET tally=tally+1 WHERE date='today' AND spider='Googlebot';
Looking back to the first section, this can be done simply, but how about making it an UPSERT?
We start by modifying the INSERT syntax to replace the VALUES section with a SELECT. This is perfectly valid SQL and doesn't affect the query in any way. The UPDATE remains unchanged.
We then put each statement into a variable as follows:
$insert = "INSERT INTO spider_count (spider, tally) SELECT 'Googlebot', 1";
$upsert = "UPDATE spider_count SET tally=tally+1 WHERE date='today' AND spider='Googlebot'";
The final upsert query then becomes:
WITH upsert AS ($upsert RETURNING *) $insert WHERE NOT EXISTS (SELECT * FROM upsert);
What this is saying essentially is "INSERT IFF !UPDATE".
The full query, if you expand the variables, will look something like the following, but it's much easier to wrap your head around if you use the variables approach:
WITH upsert AS (UPDATE spider_count SET tally=tally+1 WHERE date='today' AND spider='Googlebot' RETURNING *)
INSERT INTO spider_count (spider, tally) SELECT 'Googlebot', 1 WHERE NOT EXISTS (SELECT * FROM upsert)
Putting it into practice
The following command line extract shows that our UPSERT is working:
postgres=# CREATE TABLE spider_count (
date date NOT NULL DEFAULT NOW(),
spider varchar NOT NULL,
tally int NOT NULL default 0
);
CREATE TABLE
postgres=# \d spider_count
Table "public.spider_count"
Column | Type | Modifiers
--------+-------------------+-------------------------------------
date | date | not null default now()
spider | character varying | not null
tally | integer | not null default 0
postgres=# WITH upsert AS (UPDATE spider_count SET tally=tally+1 WHERE date='today' AND spider='Googlebot' RETURNING *) INSERT INTO spider_count (spider, tally) SELECT 'Googlebot', 1 WHERE NOT EXISTS (SELECT * FROM upsert);
INSERT 0 1
postgres=# WITH upsert AS (UPDATE spider_count SET tally=tally+1 WHERE date='today' AND spider='Googlebot' RETURNING *) INSERT INTO spider_count (spider, tally) SELECT 'Googlebot', 1 WHERE NOT EXISTS (SELECT * FROM upsert);
INSERT 0 0
postgres=# WITH upsert AS (UPDATE spider_count SET tally=tally+1 WHERE date='today' AND spider='Googlebot' RETURNING *) INSERT INTO spider_count (spider, tally) SELECT 'Googlebot', 1 WHERE NOT EXISTS (SELECT * FROM upsert);
INSERT 0 0
postgres=# select * from spider_count;
date | spider | tally
------------+-----------+-------
2024-11-21 | Googlebot | 3
(1 row)
In PostgreSQL before version 9.1 there are no Common Table Expressions (CTEs) so the basic approaches outlined in the first section, with a LOCK TABLE, are probably your best option, and only marginally slower.
Other databases such as MySQL have other means of achieving the same result, including the non-standard "INSERT IGNORE".
LOCKING to avoid race condition
To be safe, the affected table needs to be locked to prevent problems when multiple upserts overlap in time. Just because the UPSERT is a single statement doesn't mean it's instantaneous:
BEGIN;
LOCK TABLE spider_count IN SHARE ROW EXCLUSIVE MODE;
WITH upsert AS ($upsert RETURNING *) $insert WHERE NOT EXISTS (SELECT * FROM upsert);
COMMIT;
Without the LOCK TABLE command you run the risk of duplicate records being created.
INSERT IF NOT EXISTS
If you're not worried about updating existing records, and only in inserting new ones where they don't already exist in the database, then you don't need to use an UPSERT:
BEGIN;
LOCK TABLE mailing_list IN SHARE ROW EXCLUSIVE MODE;
INSERT INTO mailing_list (email)
SELECT 'email(at)example.org'
WHERE NOT EXISTS (
SELECT * FROM mailing_list WHERE email='email(at)example.org'
);
COMMIT;
References
- Upserting via Writeable CTE
- Stack Overflow: Insert, on duplicate update (postgresql)
- PostgreSQL: PostgreSQL 9.1: WITH Queries (Common Table Expressions)
Brian Minton 15 June, 2016
PostgreSQL 9.5 has ON CONFLICT UPDATE. No more table locking required.
Interesting (from the change logs):
INSERT INTO user_logins (username, logins)
VALUES ('Naomi',1), ('James',1)
ON CONFLICT (username)
DO UPDATE SET logins = user_logins.logins + EXCLUDED.logins;
anvesh 14 February, 2016
Nice Article !
Really this will help to people of PostgreSQL Community.
I have also prepared small demonstration on, write upsert/merge statement in PostgreSQL.
www.dbrnd.com/2015/10/postgresql-insert-update-or-upsert-merge-using-writable-cte/
julien 13 May, 2014
Hi and Thanks for this howto. I'd be interested also to get some return value for the update and insert to confirm that the operation went well. adding "RETURNING my_field_name" at the end of the query does indeed only return something on insert. Fort the update I'm not too sure how to have it return anything since there is alreadt a RETURNING clause in the update part of the query.
Craig Ringer 11 February, 2014
While you acknowledge the race conditions here, I think you should stress more clearly to users that *the race will result in duplicate rows being inserted* if there's no primary key or unique constraint on the columns being used as the upsert key.
IMO, LOCKing should be the default in all examples, with a mention to suggest that if you have a unique constraint you can omit the lock and instead retry when you get duplicate key errors.
A link to depesz's "why is upsert so complicated" article for the nitty gritty wouldn't hurt either.
Mark 16 December, 2013
Thanks for this tutorial, helped me build an upsert for my own purposes.
I was attempting to enhance it so it could return values from both the insert and update, whichever was executed. So far only been able to get return values from the insert, but not update. Do you know if it is possible?