SQL: Counting Page Views in PostgreSQL
This is an example of a seemingly simple task that becomes more complicated as if fails to adapt to increases in database size and traffic.
Other solutions such as using asynchoronous JavaScript or logfile analysis could be used as well or instead of those presented here. The examples below are specific to SQL.
The simple approach
The initial goal was to track the number of times each article on a website had been seen and to display those values to the site owner. This is in aditional to any statistics or analytics reports.
The obvious approach is to just update a database value every time the page is displayed:
<?PHP
pg_query("UPDATE articles SET counter=counter+1 WHERE id = '{$id}'");
?>
The first issue is that an UPDATE can be slow if there are a lot of concurrent requests. To avoid a delay in loading the page we can push this to the bottom of the page, and first flush the output buffer:
<?PHP
flush();
pg_query("UPDATE articles SET counter=counter+1 WHERE id = '{$id}'");
?>
This prevents the browser seeing any delays, but is still pretty slow behind the scenes, most obviously during peak traffic.
A faster solution
Instead of updating the main articles table on every view, we created a new table just to record views:
CREATE TABLE article_views (
articleid integer NOT NULL REFERENCES articles(dboid) ON DELETE CASCADE,
"timestamp" timestamp DEFAULT CURRENT_TIMESTAMP NOT NULL
);
The timestamp field is actually not needed here, but shouldn't have major speed implications and is useful in debugging.
Now with every article view we can use INSERT into the new table:
<?PHP
flush();
pg_query("INSERT INTO article_views (articleid) VALUES ('{$id}')");
?>
An INSERT into an otherwise inactive TABLE is much faster than the previous UPDATE, and we could use SELECT DISTINCT from this table to calculate total numbers.
This by itself is not sustainable, however, as our new table would continue to grow without bounds unless we take action to aggregate the data.
Aggregating the data
The trick is to run a query that aggregates the views and uses the result to update the original counters:
BEGIN;
UPDATE articles a SET counter = a.counter + subquery.count FROM (
SELECT DISTINCT articleid, COUNT(*) AS count FROM article_views GROUP BY articleid
) AS subquery WHERE a.id = subquery.articleid;
DELETE FROM article_views;
COMMIT;
The subquery component will return something like the following:
articleid | count
-----------+-------
451 | 1
79 | 12
451 | 11
83 | 2
79 | 31
988 | 23
158 | 30
155 | 52
The surrounding SQL increments the counter for each article based on these values, and then deletes the processed view records.
Using Common Table Expressions
Using a CTE (Common Table Expression) we can combine the two queries and eliminate the potential race condition of view records being deleted before having been counted.
This is now actually a single SQL statement so can be run without a transaction wrapper:
WITH deleted_rows AS (
DELETE FROM article_views RETURNING *
), subquery AS (
SELECT DISTINCT articleid, COUNT(*) AS count
FROM deleted_rows
GROUP BY articleid
)
UPDATE articles a
SET counter = a.counter + subquery.count
FROM subquery
WHERE a.id = subquery.articleid;
The rows to be deleted (deleted_rows) are tallied (subquery) and added to the existing counter values just like before.
This aggregator query can be: run at regular intervals, triggered manually, or triggered automatically only when the numbers need to be displayed.
References
- Introduction to Common Table Expression(CTE) in PostgreSQL
- PostgreSQL: PostgreSQL 9.1: WITH Queries (Common Table Expressions)