Ordering By
These days I had an interesting ORDER BY case at work, which is worth analyzing because it really helps us understand how hard the life of a database is when it comes to query planning. Let’s use the orders table we used in some previous posts, with 10,000,000 rows and a still unindexed order_date column.
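If you want to follow along, here is a minimal sketch of what such a table could look like; every column except order_date is an assumption, and the 10,000,000-row population script is discussed in the Index 1 section below.
-- Sketch only: the real table comes from previous posts; column names
-- other than order_date are assumptions.
CREATE TABLE orders (
    id          bigserial PRIMARY KEY,
    customer_id int,
    amount      numeric(10, 2),
    order_date  date
);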
The Basics
Let’s start with a simple example to explain a very basic but annoying detail about sorting. Looking at the queries SELECT * FROM orders ORDER BY order_date DESC and SELECT * FROM orders ORDER BY order_date DESC LIMIT 1, it may seem that the second query is way cheaper than the first one, but here is the thing: the database will have to scan the entire table for both queries anyway, and that stands for a good chunk of the cost. “Wait… how?! The second query needs just a single row!!”, one could argue, but how can the database know which row has the latest date without actually traversing all rows? It can’t just stop checking at row 5,000,000: what if row 5,000,001 is the one with the latest date? Nor can it stop at row 9,999,999, because row 10,000,000 could be the latest one. Remember: rows in a Postgres table are stored roughly in insertion order, and the database has no clue nor control over the order in which the dates were inserted.
The sorting phase of these queries will be different, though: for the LIMIT 1 case, Postgres needs to keep just a single row in memory (top-N heapsort), while the first query has to sort all rows, which takes a lot of memory (and spills to disk when they don’t fit).
The plans for both queries are below. Note that the Seq Scan on orders appears in both, but the first query (without the LIMIT) is merging on disk, while the second one uses only 25kB of memory.
EXPLAIN ANALYZE SELECT * FROM orders ORDER BY order_date DESC;
Gather Merge (cost=940031.36..1912321.54 rows=8333334 width=45) (actual time=886.772..1907.074 rows=10000000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=939031.34..949448.00 rows=4166667 width=45) (actual time=859.775..1047.260 rows=3333333 loops=3)
Sort Key: order_date DESC
Sort Method: external merge Disk: 189032kB
Worker 0: Sort Method: external merge Disk: 181792kB
Worker 1: Sort Method: external merge Disk: 183176kB
-> Parallel Seq Scan on orders (cost=0.00..224542.67 rows=4166667 width=45) (actual time=0.019..203.879 rows=3333333 loops=3)
EXPLAIN ANALYZE SELECT * FROM orders ORDER BY order_date DESC LIMIT 1;
Limit (cost=246376.03..246376.14 rows=1 width=45) (actual time=340.960..343.997 rows=1 loops=1)
-> Gather Merge (cost=246376.03..1218666.21 rows=8333334 width=45) (actual time=340.218..343.253 rows=1 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=245376.00..255792.67 rows=4166667 width=45) (actual time=331.016..331.017 rows=1 loops=3)
Sort Key: order_date DESC
Sort Method: top-N heapsort Memory: 25kB
Worker 0: Sort Method: top-N heapsort Memory: 25kB
Worker 1: Sort Method: top-N heapsort Memory: 25kB
-> Parallel Seq Scan on orders (cost=0.00..224542.67 rows=4166667 width=45) (actual time=0.016..207.089 rows=3333333 loops=3)
Index 1
I asked ChatGPT to change my script to include a date column and fill it with dates spread over 3 years. It generated something like DATE '2020-01-01' + (trunc(random() * 1095)::INT), and I added the index CREATE INDEX idx_order_date_desc ON orders(order_date DESC).
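Put together, the population step could look roughly like the sketch below; the INSERT shape and the non-date columns are assumptions based on the table sketch above, only the date expression and the index definition come from the post.
-- Sketch: fill order_date with random dates spread over ~3 years
INSERT INTO orders (customer_id, amount, order_date)
SELECT (random() * 1000)::int,
       round((random() * 500)::numeric, 2),
       DATE '2020-01-01' + (trunc(random() * 1095)::INT)
FROM generate_series(1, 10000000);

CREATE INDEX idx_order_date_desc ON orders(order_date DESC);
ANALYZE orders;
This is what we got for the query without the LIMIT: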
Index Scan using idx_order_date_desc on orders (cost=0.43..948823.32 rows=9942744 width=45) (actual time=0.048..15908.590 rows=10000000 loops=1)
Planning Time: 0.074 ms
Execution Time: 16144.808 ms
For the query with LIMIT 1, the plan is pretty obvious: the database will use the index because it is already ordered, so it finds the latest date basically in the very first entry of the index. For the query above, though, the chosen plan actually surprised me a little bit: we know the entries in the index are already sorted, so there is no sorting phase at execution time, but the number of random reads will be very high because we have 10,000,000 rows in the table. To see the other option, I executed SET enable_indexscan = off; and here is the plan:
Gather Merge (cost=940029.36..1912319.54 rows=8333334 width=45) (actual time=853.661..1943.672 rows=10000000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=939029.34..949446.00 rows=4166667 width=45) (actual time=844.929..1051.020 rows=3333333 loops=3)
Sort Key: order_date DESC
Sort Method: external merge Disk: 183976kB
Worker 0: Sort Method: external merge Disk: 184360kB
Worker 1: Sort Method: external merge Disk: 185640kB
-> Parallel Seq Scan on orders (cost=0.00..224540.67 rows=4166667 width=45) (actual time=54.647..214.788 rows=3333333 loops=3)
Planning Time: 0.078 ms
Execution Time: 2124.597 ms
With Index Scan disabled, Postgres did the Seq Scan + Sort again, but compare the Execution Time of both queries: the one without the index was about 8 times faster. It seems Postgres didn’t make the right decision here, mainly because in the script ChatGPT generated for me the dates are completely random in relation to insertion order, which makes the Index Scan really expensive. Let’s change that to prove the point, but first, here is what pg_stats says about the correlation between order_date and the physical row order:
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'order_date';
-- order_date 0.0020382402
Index 2
Let’s change the script to DATE '2020-01-01' + ((row_number() OVER ()) / (10000000.0 / 1095))::int, so that the correlation between order_date and the physical row order is basically 100%.
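Applied to the population sketch above, that could look like the following; the TRUNCATE/INSERT wrapper is an assumption, only the new date expression comes from the post.
-- Sketch: dates now grow with insertion order, so correlation is ~1
TRUNCATE orders RESTART IDENTITY;
INSERT INTO orders (customer_id, amount, order_date)
SELECT (random() * 1000)::int,
       round((random() * 500)::numeric, 2),
       DATE '2020-01-01' + ((row_number() OVER ()) / (10000000.0 / 1095))::int
FROM generate_series(1, 10000000);
ANALYZE orders;
After rebuilding the data and refreshing the statistics, running the query again gives: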
Index Scan using idx_order_date_desc on orders (cost=0.43..400164.39 rows=9952330 width=45) (actual time=1.220..1284.343 rows=10000000 loops=1)
Planning Time: 0.483 ms
Execution Time: 1432.031 ms
And disabling the Index Scan:
Gather Merge (cost=936284.81..1903939.97 rows=8293608 width=45) (actual time=620.550..1588.312 rows=10000000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=935284.79..945651.80 rows=4146804 width=45) (actual time=613.315..792.892 rows=3333333 loops=3)
Sort Key: order_date DESC
Sort Method: external merge Disk: 186552kB
Worker 0: Sort Method: external merge Disk: 185704kB
Worker 1: Sort Method: external merge Disk: 181736kB
-> Parallel Seq Scan on orders (cost=0.00..224348.04 rows=4146804 width=45) (actual time=58.366..215.374 rows=3333333 loops=3)
Planning Time: 0.114 ms
Execution Time: 1765.833 ms
OK, now we can see that Postgres made the right decision. In both examples it had the correct information (the correlation of the date column), so I really don’t know how to explain why it went for the Index Scan in the first case, but everything has a good side: it’s a way to show you that databases are not invincible. Let’s move on.
Order By and Filter
Let’s add a store_id to the table. We have 50 stores, but 90% of the rows are split between 3 stores (store ids 1, 2 and 3), 10 new stores have less than 100 sales each (ids 4 to 13), and the rest is split evenly among the remaining 37 stores.
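Here is a hedged sketch of how this distribution could be generated; the CASE thresholds and the id-based trick for the tiny stores are assumptions that only approximate the numbers above, while the index name comes from the plans shown later.
ALTER TABLE orders ADD COLUMN store_id int;

-- ~90% of orders go to stores 1-3, the remaining ~10% to stores 14-50
UPDATE orders
SET store_id = CASE
    WHEN random() < 0.90 THEN 1 + floor(random() * 3)::int
    ELSE 14 + floor(random() * 37)::int
END;

-- stores 4 to 13 end up with fewer than 100 orders each
UPDATE orders
SET store_id = 4 + (id % 10)::int
WHERE id IN (SELECT id FROM orders ORDER BY id LIMIT 990);

CREATE INDEX idx_orders_store_id ON orders(store_id);
ANALYZE orders;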
For now, we have 2 indexes Postgres can use to solve the query: one on order_date and one on store_id. The Seq Scan, of course, is always an option for any query. The target query we will play with in this section basically filters by one or more stores and does ORDER BY order_date DESC with a LIMIT 50. Before actually seeing what Postgres decided for each query, let’s stop and think about how it could use each index to solve it:
store_id: with this index, the database will easily find all the rows for the matching stores, but then it will have to sort all of them to get only the top 50. The number of rows a store has will impact both the heap fetches and the sorting step.

order_date: with this index, the database will easily find the latest rows (in relation to order date), but it will have to filter them to keep only the specified stores. This is interesting because this way of executing the query involves no sorting at all, but it depends on how fast the database finds rows for the matching stores. It’s easy to see that for a very ‘successful store’ (lots of orders) it will find the first 50 rows very fast, while for a deactivated store, with no recent orders, the database will traverse a lot of rows before finding the first matches.
Let’s see if our reasoning makes sense by first executing the query for a store with only about 100 orders:
EXPLAIN ANALYZE SELECT * FROM orders WHERE store_id = 10 ORDER BY order_date DESC LIMIT 50;
-- Limit (cost=8.46..8.47 rows=1 width=49) (actual time=0.057..0.062 rows=50 loops=1)
-- -> Sort (cost=8.46..8.47 rows=1 width=49) (actual time=0.056..0.058 rows=50 loops=1)
-- Sort Key: order_date DESC
-- Sort Method: quicksort Memory: 34kB
-- -> Index Scan using idx_orders_store_id on orders (cost=0.43..8.45 rows=1 width=49) (actual time=0.031..0.040 rows=99 loops=1)
-- Index Cond: (store_id = 10)
-- Planning Time: 0.217 ms
-- Execution Time: 0.079 ms
OK, it was as expected: this store has just a few rows, so finding all its rows with the idx_orders_store_id index and sorting them seems easy.
Now, what happens when we filter by a big store (30% of the orders):
EXPLAIN ANALYZE SELECT * FROM orders WHERE store_id = 1 ORDER BY order_date DESC LIMIT 50;
-- Limit (cost=0.43..20.91 rows=50 width=49) (actual time=0.033..0.127 rows=50 loops=1)
-- -> Index Scan using idx_order_date_desc on orders (cost=0.43..1224704.39 rows=2991333 width=49) (actual time=0.031..0.122 rows=50 loops=1)
-- Filter: (store_id = 1)
-- Rows Removed by Filter: 86
-- Planning Time: 0.152 ms
-- Execution Time: 0.152 ms
Nice! As expected, the database decided to go through idx_order_date_desc. Note the absence of a Sort phase, because the index being used is already ordered. Also note Rows Removed by Filter: 86, which is the number of rows the database discarded because they were not from store_id = 1. A small number, given this store has a lot of orders.
Just to play with this, I’ll temporarily DROP the index on store_id and see what happens.
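Using the index name that appeared in the plan above, that is simply:
-- temporarily remove the store_id index
DROP INDEX idx_orders_store_id;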
For the small store, the database decided to go with a Seq Scan, because the only remaining index (on order_date) would force it to traverse way too many rows while searching for store_id = 10, which is not worth the cost:
EXPLAIN ANALYZE SELECT * FROM orders WHERE store_id = 10 ORDER BY order_date DESC LIMIT 50;
-- Limit (cost=288029.44..288029.45 rows=1 width=49) (actual time=264.799..268.921 rows=50 loops=1)
-- -> Sort (cost=288029.44..288029.45 rows=1 width=49) (actual time=262.628..266.747 rows=50 loops=1)
-- Sort Key: order_date DESC
-- Sort Method: quicksort Memory: 34kB
-- -> Gather (cost=1000.00..288029.43 rows=1 width=49) (actual time=262.507..266.718 rows=99 loops=1)
-- Workers Planned: 2
-- Workers Launched: 2
-- -> Parallel Seq Scan on orders (cost=0.00..287029.33 rows=1 width=49) (actual time=193.603..249.604 rows=33 loops=3)
-- Filter: (store_id = 10)
-- Rows Removed by Filter: 3333300
-- Planning Time: 0.147 ms
-- Execution Time: 269.190 ms
While for a “medium store” (26,805 orders), the database decided to go through idx_order_date_desc and had to discard 21,942 rows until getting to the 50 it needed:
EXPLAIN ANALYZE SELECT * FROM orders WHERE store_id = 20 ORDER BY order_date DESC LIMIT 50;
-- Limit (cost=0.43..2355.63 rows=50 width=49) (actual time=8.576..56.375 rows=50 loops=1)
-- -> Index Scan using idx_order_date_desc on orders (cost=0.43..1224704.39 rows=26000 width=49) (actual time=8.574..56.364 rows=50 loops=1)
-- Filter: (store_id = 20)
-- Rows Removed by Filter: 21942
-- Planning Time: 0.072 ms
-- Execution Time: 56.396 ms
Composite Index
Now, let’s create a much better index and play a little bit:
CREATE INDEX idx_orders_store_id_order_date_desc ON orders(store_id, order_date DESC);
This index is perfect for the queries we executed so far, because for each store the database has all of its rows already ordered by date, so querying by the big store now looks like this (note there is no sorting phase):
EXPLAIN ANALYZE SELECT * FROM orders WHERE store_id = 1 ORDER BY order_date DESC LIMIT 50;
-- Limit (cost=0.43..16.72 rows=50 width=49) (actual time=0.021..0.063 rows=50 loops=1)
-- -> Index Scan using idx_orders_store_id_order_date_desc on orders (cost=0.43..974129.68 rows=2991333 width=49) (actual time=0.020..0.058 rows=50 loops=1)
-- Index Cond: (store_id = 1)
-- Planning Time: 0.221 ms
-- Execution Time: 0.079 ms
Let’s stop and think about what would happen if we filtered by 2 (or more) stores. In this case, the database will easily find the rows for each store, but the fact that they are ordered within each store doesn’t matter much, because the top 50 has to be picked across both stores, so a sort over all matched rows will still be needed. Let’s see what happens in practice:
EXPLAIN ANALYZE SELECT * FROM orders WHERE store_id IN (20, 21) ORDER BY order_date DESC LIMIT 50;
-- Limit (cost=0.43..1021.02 rows=50 width=49) (actual time=3.335..9.500 rows=50 loops=1)
-- -> Index Scan using idx_order_date_desc on orders (cost=0.43..1224704.39 rows=60000 width=49) (actual time=3.333..9.492 rows=50 loops=1)
-- Filter: (store_id = ANY ('{20,21}'::integer[]))
-- Rows Removed by Filter: 12655
-- Planning Time: 0.162 ms
-- Execution Time: 9.532 ms
As expected, the database decided not to use our new composite index and went back to idx_order_date_desc, avoiding the sort at the price of filtering. This is the tricky thing about ordering.