PostgreSQL offsets and pagination performance
PostgreSQL is notoriously bad at performing well with high offsets, mainly because it needs to scan and discard all the rows up to the offset before returning any results. This can be a problem if you want to paginate over a full table without holding all the rows in memory.
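To see why, it helps to count how much work each page costs. The sketch below is not PostgreSQL itself -- it simulates both strategies over an in-memory array and simply tallies the rows each one must visit. Offset pagination re-visits every skipped row on each page, so total work grows quadratically; keyset pagination (paging on an indexed value, as described below) visits each row once.

```ruby
ROWS = (1..10_000).to_a  # stand-in for a table ordered by an indexed id
PAGE = 1_000

# OFFSET-style: to serve a page, the server walks past `offset` rows
# and then returns PAGE more.
offset_scanned = 0
(0...ROWS.size).step(PAGE) do |offset|
  offset_scanned += offset + PAGE  # rows visited to serve this one page
end

# Keyset-style: each page seeks to its starting position, so we count
# only the rows actually returned (as an index seek would visit).
keyset_scanned = 0
last_id = 0
loop do
  page = ROWS.select { |id| id > last_id }.first(PAGE)
  break if page.empty?
  keyset_scanned += page.size
  last_id = page.last
end

puts "offset strategy visits #{offset_scanned} rows"  # 55,000: quadratic growth
puts "keyset strategy visits #{keyset_scanned} rows"  # 10,000: one pass
```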
On the surface of it, PostgreSQL cursors may seem like a good solution to this problem--rather than executing the whole query at once, a cursor encapsulates the query, allowing you to read a few rows at a time. However, in practice a cursor uses a transaction behind the scenes, which has to stay open for the whole scan, and you'll quickly run into resource problems.
A good solution is to use an indexed auto-incrementing ID on the table and then paginate based on that.
At Clearbit, we use UUIDs for identifiers rather than integers, but we have an internal iid column which is an indexed numeric sequence. If you're not using the default integer IDs then you'll need to do something similar and add a column.
The example below uses Sequel, but it can be easily ported to your ORM of choice. We're essentially paging over the ordered records using the iid column in a WHERE clause instead of an OFFSET.
def batch_page(page_size = 2_000)
  # Start just below the lowest iid in the table (or 0 for an empty table).
  first_record = order(:iid.asc).first
  last_iid = (first_record && first_record.iid || 0) - 1

  loop do
    result = order(:iid.asc)
             .where { iid > last_iid }
             .limit(page_size)
             .all

    break if result.empty?
    yield result

    # Resume the next page just after the highest iid we've seen.
    last_iid = result.last.iid
  end

  self
end
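To make the paging behaviour concrete, here's a hypothetical in-memory stand-in for the small subset of Sequel's dataset API that batch_page touches--none of this is real Sequel, it exists only so the loop can be exercised without a database:

```ruby
# A row with an indexed-sequence iid, mirroring the people table.
Person = Struct.new(:iid, :name)

class Symbol
  # Minimal shim for Sequel's :iid.asc symbol extension.
  def asc
    self
  end
end

class FakeDataset
  def initialize(rows)
    @rows = rows.sort_by(&:iid)
  end

  def order(*)  # rows are always kept sorted by iid in this mock
    self
  end

  def limit(n)
    FakeDataset.new(@rows.first(n))
  end

  def all
    @rows
  end

  def first
    @rows.first
  end

  # Evaluate Sequel-style virtual-row blocks (`where { iid > x }`) against
  # each row, so the bare `iid` call resolves to the row's attribute.
  def where(&blk)
    FakeDataset.new(@rows.select { |row| row.instance_exec(&blk) })
  end

  # The paging method from the post, unchanged.
  def batch_page(page_size = 2_000)
    first_record = order(:iid.asc).first
    last_iid = (first_record && first_record.iid || 0) - 1
    loop do
      result = order(:iid.asc).where { iid > last_iid }.limit(page_size).all
      break if result.empty?
      yield result
      last_iid = result.last.iid
    end
    self
  end
end

people = (1..5).map { |i| Person.new(i, "person-#{i}") }
pages = []
FakeDataset.new(people).batch_page(2) { |page| pages << page.map(&:iid) }
pages  # => [[1, 2], [3, 4], [5]]
```

With a page size of 2 and five rows, the loop yields three pages and stops when the WHERE clause finds nothing past the last iid.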
The generated SQL query ends up looking like this--it's very fast to run since the iid column is indexed.
SELECT * FROM "people" WHERE ("iid" > 2006) ORDER BY "iid" ASC LIMIT 2000