SQL: Selecting rows before and after
The introduction of WITH queries (Common Table Expressions or CTEs) in PostgreSQL gives us a chance to revisit and revise some past practices.
Using CTEs in combination with Window Functions allows us to better manipulate data at the SQL level rather than relying on PHP or other scripting languages.
Preparding our dataset
For the following discussion we're using this temporary data table in PostgreSQL:
CREATE TABLE tmp_events (
id serial,
date date not null,
name varchar not null,
PRIMARY KEY(id)
);
CREATE TABLE
With data imported using:
COPY tmp_events (date, name) FROM stdin;
2018-01-01 New Year's Day
2018-04-02 Easter Monday
2018-05-10 Ascension Day
2018-03-30 Good Friday
2018-05-21 Whit Monday
2018-05-27 Mother's Day
2018-06-17 Father's Day
2018-07-14 Bastille Day
2018-05-01 Labour Day
2018-08-15 Assumption Day
2018-11-01 All Saints Day
2018-11-11 Armistice Day
2018-12-26 St Stephens Day
2018-05-08 V-E Day
2018-12-25 Christmas Day
\.
COPY 15
Our goal is to extract a list of events before and after the highlighted "All Saints Day" event. Notice that we've shuffled the order of the events we're importing to make it more realistic.
Now obviously we can select all events ordered by date:
# SELECT * FROM tmp_events ORDER BY date;
id | date | name
----+------------+-----------------
16 | 2018-01-01 | New Year's Day
19 | 2018-03-30 | Good Friday
17 | 2018-04-02 | Easter Monday
24 | 2018-05-01 | Labour Day
29 | 2018-05-08 | V-E Day
18 | 2018-05-10 | Ascension Day
20 | 2018-05-21 | Whit Monday
21 | 2018-05-27 | Mother's Day
22 | 2018-06-17 | Father's Day
23 | 2018-07-14 | Bastille Day
25 | 2018-08-15 | Assumption Day
26 | 2018-11-01 | All Saints Day
27 | 2018-11-11 | Armistice Day
30 | 2018-12-25 | Christmas Day
28 | 2018-12-26 | St Stephens Day
(15 rows)
With this we can import the data into a PHP array, locate the 'current' event, and take a slice, but that means loading all the data into an array which gets more and more inefficient as the dataset grows.
What we'd actually like is a query to give us just the events either side of the 'current' event.
The old approach
Without CTAs we need two separate queries - one to look backwards in time and one to look forwards:
<?PHP
$current_id = 26;
$current_date = '2018-11-01';
// select previous two events
SELECT *
FROM tmp_events
WHERE (
date <= '$current_date'
AND id != '$current_id'
)
ORDER BY date DESC, name DESC
LIMIT 2
// select next two events
SELECT *
FROM tmp_events
WHERE (
date >= '$current_date'
AND id != '$current_id'
)
ORDER BY date, name
LIMIT 2
?>
So we're making two queries, which each return two events, and then combining and displaying them on the page. Workable, but messy.
The CTE approach
The first step is to sort our events and assign a rank/number to each:
# SELECT id, date, name, row_number() OVER (ORDER BY date)
FROM tmp_events;
id | date | name | row_number
----+------------+-----------------+------------
16 | 2018-01-01 | New Year's Day | 1
19 | 2018-03-30 | Good Friday | 2
17 | 2018-04-02 | Easter Monday | 3
24 | 2018-05-01 | Labour Day | 4
29 | 2018-05-08 | V-E Day | 5
18 | 2018-05-10 | Ascension Day | 6
20 | 2018-05-21 | Whit Monday | 7
21 | 2018-05-27 | Mother's Day | 8
22 | 2018-06-17 | Father's Day | 9
23 | 2018-07-14 | Bastille Day | 10
25 | 2018-08-15 | Assumption Day | 11
26 | 2018-11-01 | All Saints Day | 12
27 | 2018-11-11 | Armistice Day | 13
30 | 2018-12-25 | Christmas Day | 14
28 | 2018-12-26 | St Stephens Day | 15
(15 rows)
The new row_number column is now in date order, making it easier to identify the rows we want to extract. But we only have the id value, which is not entirely helpful. We need another (sub-)query to convert our id value into a row number:
SELECT row_number
FROM (output of previous query)
WHERE id = '$current_id';
row_number
------------
12
(1 row)
The final step is to select the rows within ±2 rows of the current event - in this case rows 10 to 14:
SELECT *
FROM (first query), (second query)
WHERE ABS((first query).row_number - (second query).row_number) <= 2
ORDER BY date;
Now if you're not familiar with common table expressions, you might be confused at this point as to how this all comes together. Actually it's not that difficult. What we end up with is:
WITH cte AS (
SELECT id, date, name, row_number() OVER (ORDER BY date)
FROM tmp_events
), current AS (
SELECT row_number
FROM cte
WHERE id = '$current_id'
)
SELECT cte.*
FROM cte, current
WHERE ABS(cte.row_number - current.row_number) <= 2
ORDER BY cte.row_number;
The WITH statements create tempory in-memory tables with the results of the contained query for later reference. In this case the virtual tables cte and current.
Which gives us exactly the result we want:
id | date | name | row_number
----+------------+-----------------+------------
23 | 2018-07-14 | Bastille Day | 10
25 | 2018-08-15 | Assumption Day | 11
26 | 2018-11-01 | All Saints Day | 12
27 | 2018-11-11 | Armistice Day | 13
30 | 2018-12-25 | Christmas Day | 14
(5 rows)
Not only have we reduced our code to a single query, we also now only need the id of the current events, and no longer the date as all sorting is done in the SQL.
Related Articles - Window functions
- SQL Selecting n rows from each category
- SQL Selecting rows before and after