You report that EXPLAIN ANALYZE suggested the cost of the whole query with lateral joins would be about a quarter of the original. That mixes up two things:
EXPLAIN (estimating the cost) suggests 40,089.36 vs. 189,883.92 unicorn points.
But EXPLAIN ANALYZE (measuring actual execution times) disagrees and shows 2,502.031 ms vs. 1,835.193 ms, so the lateral version is actually around 1/3 slower. There can be many reasons why the estimate is off target, most prominently cost settings and statistics. See:
Keep PostgreSQL from sometimes choosing a bad query plan
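If you want to chase down the misestimate, one common, low-risk knob is the per-column statistics target. Purely as a sketch (the column choice and the value 1000 are arbitrary examples, not a recommendation):

-- raise the per-column statistics target for a possibly skewed column
ALTER TABLE appointments ALTER COLUMN locationid SET STATISTICS 1000;
ANALYZE appointments;  -- refresh planner statistics so the new target takes effect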
That said, the query can probably be much faster (by orders of magnitude). We only got limited information, but I suppose you want something like this:
SELECT a.*
     , rs.created_at AS latestsnapshottime  -- maybe use COALESCE?
     , rp.created_at AS latestresponsetime
FROM   appointments a
LEFT   JOIN LATERAL (
   SELECT rs.created_at
   FROM   reportsnapshots rs
   WHERE  rs.appointmentid = a.id
   ORDER  BY rs.created_at DESC  -- NULLS LAST ?
   LIMIT  1
   ) rs ON true
LEFT   JOIN LATERAL (
   SELECT rp.created_at
   FROM   responses rp
   WHERE  rp.appointmentid = a.id
   ORDER  BY rp.created_at DESC  -- NULLS LAST ?
   LIMIT  1
   ) rp ON true
WHERE  a.organizationid = 16
AND    a.locationid = '51'
AND    a.cancelled = FALSE
AND    a.filteredin = TRUE
AND    a.start BETWEEN '2021-05-04T00:00:00-06:00' AND '2021-05-04T23:59:59-06:00'
AND    a.locationid IN (61,60,140,53,138,130,133,131,55,51,100)
-- GROUP BY a.id, rs.created_at, rp.created_at  -- not needed, I guess
ORDER  BY a.start, a.id
LIMIT  100;
DISTINCT ON is a great tool, but for different situations. See:
Select first row in each GROUP BY group?
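For illustration only, the DISTINCT ON form answers the other kind of question, one whole row per group across the table, which typically pays off when you need most or all groups rather than a handful of filtered appointments:

-- illustrative only: latest reportsnapshots row per appointment, for the whole table
SELECT DISTINCT ON (appointmentid) *
FROM   reportsnapshots
ORDER  BY appointmentid, created_at DESC NULLS LAST;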
Retrieving a single row from each LATERAL subquery as I suggest can use an index and is super fast. Ideally, you have these indexes:
reportsnapshots (appointmentid, created_at DESC NULLS LAST)
responses (appointmentid, created_at DESC NULLS LAST)
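A minimal DDL sketch for those two indexes; the index names are my own placeholders:

CREATE INDEX reportsnapshots_appt_created_idx
       ON reportsnapshots (appointmentid, created_at DESC NULLS LAST);
CREATE INDEX responses_appt_created_idx
       ON responses (appointmentid, created_at DESC NULLS LAST);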
Further reading:
Optimize GROUP BY query to retrieve latest row per user
If created_at is defined NOT NULL, a simpler index on (appointmentid, created_at) is just as good (and preferable). See:
Sort by column ASC, but NULL values first?
Why does ORDER BY NULLS LAST affect the query plan on a primary key?
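For example, assuming created_at is indeed NOT NULL in both tables, these simpler variants would replace the DESC NULLS LAST indexes sketched above (again, placeholder names):

CREATE INDEX reportsnapshots_appt_created_idx
       ON reportsnapshots (appointmentid, created_at);
CREATE INDEX responses_appt_created_idx
       ON responses (appointmentid, created_at);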
Plus an index on the outer table. The one in use now (appointments_organizationId_status_start_idx) doesn't seem too bad. But I suspect more potential, depending on undisclosed information.
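Depending on the actual selectivity of those filters, something along these lines might serve the outer query better. This is speculative and the name is a placeholder; verify with EXPLAIN (ANALYZE, BUFFERS):

-- partial multicolumn index matching the equality filters plus the range/sort on "start"
CREATE INDEX appointments_org_loc_start_idx
       ON appointments (organizationid, locationid, start)
       WHERE cancelled = FALSE
       AND   filteredin = TRUE;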
You probably don't have to aggregate in the outer query, since both subqueries return a single row each (that's true for your original query, too).
Alternatively, use plain correlated subqueries with max() for your simple case. Probably even faster:
SELECT a.*
     , (SELECT max(rs.created_at)
        FROM   reportsnapshots rs
        WHERE  rs.appointmentid = a.id) AS latestsnapshottime
     , (SELECT max(rp.created_at)
        FROM   responses rp
        WHERE  rp.appointmentid = a.id) AS latestresponsetime
FROM   appointments a
WHERE  ...
LIMIT  100;
See:
What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
Aside 1: BETWEEN is typically no good for timestamps. See:
https://wiki.postgresql.org/wiki/Don't_Do_This#Don.27t_use_BETWEEN_.28especially_with_timestamps.29
How to add a day/night indicator to a timestamp column?
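Applied to your predicate, that would be a half-open range, assuming you want the whole local day of May 4:

AND a.start >= '2021-05-04T00:00:00-06:00'
AND a.start <  '2021-05-05T00:00:00-06:00'  -- next day, exclusive upper bound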
Aside 2: In your first query plan I see Sort Method: external merge Disk: 3,856kB and Sort Method: external merge Disk: 17,752kB, which indicates a lack of work_mem. The same problem does not surface in the 2nd plan, nor will it for my queries. But look into your server configuration. Related:
Optimize simple query using ORDER BY date and text
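To experiment, you can check and raise the setting for your session only. The value below is an arbitrary example; size it against the reported sort spills, your RAM, and the number of concurrent connections:

SHOW work_mem;            -- current setting
SET work_mem = '64MB';    -- session-only experiment, not a permanent change
-- re-run EXPLAIN (ANALYZE, BUFFERS) and check whether the "external merge" sorts now fit in memory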