PostgreSQL 10: Bitmap Heap Scan with exact heap blocks



  • I have the following query:

    select ro.*
    from courier c1
        join courier c2 on c2.real_physical_courier_1c_id = c1.real_physical_courier_1c_id
        join restaurant_order ro on ro.courier_id = c2.id
        left join jsonb_array_elements(items) jae on true
        left join jsonb_array_elements(jae->'options') ji on true
        inner join catalogue c on c.id in ((jae->'id')::int, (ji->'id')::int)
        join restaurant r on r.id = ro.restaurant_id
    where c1.id = '7b35cdab-b423-472a-bde1-d6699f6cefd3' and ro.status in (70, 73)
    group by ro.order_id, r.id ;
    

    Here is a part of a query plan that takes ~95% of the time:

->  Parallel Bitmap Heap Scan on restaurant_order ro  (cost=23.87..2357.58 rows=1244 width=1257) (actual time=11.931..38.163 rows=98 loops=2)
          Recheck Cond: (status = ANY ('{70,73}'::integer[]))
          Heap Blocks: exact=28755
          ->  Bitmap Index Scan on ro__status  (cost=0.00..23.34 rows=2115 width=0) (actual time=9.168..9.168 rows=51540 loops=1)
                Index Cond: (status = ANY ('{70,73}'::integer[]))
    

    I have a couple of questions.

    1. First, the Bitmap Index Scan section. Postgres walks through 51540 ro__status index entries matching Index Cond: (status = ANY ('{70,73}'::integer[])) and creates a bitmap with 28755 elements. Its keys are the physical locations of the corresponding table rows (indicated by exact in the Heap Blocks section). Is this correct?
    2. Second, this bitmap is passed to the Bitmap Heap Scan phase. The Recheck Cond is not actually applied, since no heap blocks are lossy. Bitmap Heap Scan sorts the bitmap by the tuples' physical location to enable sequential access. Then it reads the table data sequentially in two passes (loops=2) and gets no more than 196 table rows. Is that correct?
    3. The bitmap size reflected in the Heap Blocks: exact=28755 line varies greatly over time, by two orders of magnitude. For example, yesterday it was about 500. Why is that?
    4. Now, why does the bitmap created during the Bitmap Index Scan phase have so many keys? The ro__status index can show that there are only about 200 records with status 70 or 73. I can't think of any reason preventing Postgres from keeping only the keys that actually satisfy the index condition. The overhead seems huge: instead of ~200 keys, there are 28755!
    5. Why does the Bitmap Heap Scan take so long? As far as I can see there are two sequential reads (loops=2), so it should take far less time, no? Or is sorting the bitmap by the tuples' physical location the main culprit?
    6. Should I be worried about the poor estimates? If so, increasing default_statistics_target should help, right? It's currently at the default of 100.

    Just in case, here is the entire plan:

    "Group  (cost=51297.15..52767.65 rows=19998 width=1261) (actual time=42.555..42.555 rows=0 loops=1)"
    "  Group Key: ro.order_id, r.id"
    "  ->  Gather Merge  (cost=51297.15..52708.83 rows=11764 width=1261) (actual time=42.554..45.459 rows=0 loops=1)"
    "        Workers Planned: 1"
    "        Workers Launched: 1"
    "        ->  Group  (cost=50297.14..50385.37 rows=11764 width=1261) (actual time=38.850..38.850 rows=0 loops=2)"
    "              Group Key: ro.order_id, r.id"
    "              ->  Sort  (cost=50297.14..50326.55 rows=11764 width=1261) (actual time=38.850..38.850 rows=0 loops=2)"
    "                    Sort Key: ro.order_id, r.id"
    "                    Sort Method: quicksort  Memory: 25kB"
    "                    Worker 0:  Sort Method: quicksort  Memory: 25kB"
    "                    ->  Nested Loop  (cost=31.84..45709.27 rows=11764 width=1261) (actual time=38.819..38.819 rows=0 loops=2)"
    "                          ->  Nested Loop Left Join  (cost=27.21..5194.50 rows=5882 width=1325) (actual time=38.819..38.819 rows=0 loops=2)"
    "                                ->  Nested Loop Left Join  (cost=27.20..5076.49 rows=59 width=1293) (actual time=38.818..38.818 rows=0 loops=2)"
    "                                      ->  Nested Loop  (cost=27.20..5074.49 rows=1 width=1261) (actual time=38.818..38.818 rows=0 loops=2)"
    "                                            ->  Hash Join  (cost=26.93..5073.59 rows=1 width=1257) (actual time=38.817..38.818 rows=0 loops=2)"
    "                                                  Hash Cond: (c2.real_physical_courier_1c_id = c1.real_physical_courier_1c_id)"
    "                                                  ->  Nested Loop  (cost=24.28..5068.22 rows=1038 width=1267) (actual time=11.960..38.732 rows=98 loops=2)"
    "                                                        ->  Parallel Bitmap Heap Scan on restaurant_order ro  (cost=23.87..2357.58 rows=1244 width=1257) (actual time=11.931..38.163 rows=98 loops=2)"
    "                                                              Recheck Cond: (status = ANY ('{70,73}'::integer[]))"
    "                                                              Heap Blocks: exact=28755"
    "                                                              ->  Bitmap Index Scan on ro__status  (cost=0.00..23.34 rows=2115 width=0) (actual time=9.168..9.168 rows=51540 loops=1)"
    "                                                                    Index Cond: (status = ANY ('{70,73}'::integer[]))"
    "                                                        ->  Index Scan using courier_pkey on courier c2  (cost=0.41..2.18 rows=1 width=26) (actual time=0.005..0.005 rows=1 loops=195)"
    "                                                              Index Cond: (id = ro.courier_id)"
    "                                                  ->  Hash  (cost=2.63..2.63 rows=1 width=10) (actual time=0.039..0.039 rows=1 loops=2)"
    "                                                        Buckets: 1024  Batches: 1  Memory Usage: 9kB"
    "                                                        ->  Index Scan using courier_pkey on courier c1  (cost=0.41..2.63 rows=1 width=10) (actual time=0.034..0.034 rows=1 loops=2)"
    "                                                              Index Cond: (id = '7b35cdab-b423-472a-bde1-d6699f6cefd3'::uuid)"
    "                                            ->  Index Only Scan using restaurant_pkey on restaurant r  (cost=0.27..0.89 rows=1 width=4) (never executed)"
    "                                                  Index Cond: (id = ro.restaurant_id)"
    "                                                  Heap Fetches: 0"
    "                                      ->  Function Scan on jsonb_array_elements jae  (cost=0.00..1.00 rows=100 width=32) (never executed)"
    "                                ->  Function Scan on jsonb_array_elements ji  (cost=0.01..1.00 rows=100 width=32) (never executed)"
    "                          ->  Bitmap Heap Scan on catalogue c  (cost=4.63..6.87 rows=2 width=4) (never executed)"
    "                                Recheck Cond: ((id = ((jae.value -> 'id'::text))::integer) OR (id = ((ji.value -> 'id'::text))::integer))"
    "                                ->  BitmapOr  (cost=4.63..4.63 rows=2 width=0) (never executed)"
    "                                      ->  Bitmap Index Scan on catalogue_pkey  (cost=0.00..0.97 rows=1 width=0) (never executed)"
    "                                            Index Cond: (id = ((jae.value -> 'id'::text))::integer)"
    "                                      ->  Bitmap Index Scan on catalogue_pkey  (cost=0.00..0.97 rows=1 width=0) (never executed)"
    "                                            Index Cond: (id = ((ji.value -> 'id'::text))::integer)"
    "Planning Time: 1.113 ms"
    "Execution Time: 45.588 ms"
    


  • It builds a bitmap of 51,540 items. This is then split (roughly) in half, one half for each of the two parallel processes. The report of exact=28755 is apparently for only one of those processes. (If you disable parallel queries with set max_parallel_workers_per_gather TO 0, the resulting plan will be easier to understand. This is usually the first thing I do when looking into query performance, unless parallelization is the very thing I'm trying to study. Whatever improvements I make will then usually translate back to parallel execution.)
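    For reference, a minimal way to do this in a psql session (the SET is session-local, so it only affects your analysis, not other connections):

    ```sql
    -- Session-only: force a non-parallel plan so the EXPLAIN output is
    -- a single process and the counters are not split across workers.
    SET max_parallel_workers_per_gather TO 0;

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT ... ;  -- the query under investigation

    -- Restore the default afterwards (or simply close the session).
    RESET max_parallel_workers_per_gather;
    ```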

    A bitmap is inherently in physical order. Sorting it is not a separate step from creating it. PostgreSQL reads the blocks in that order, one by one. It is up to the operating system/file system to coalesce those individual reads into sequential reads, if it decides to. In my experience, you have to be reading nearly every block before this kicks in to good effect. If you only read (stochastically) every fifth block, you might as well be doing random reads. I can't tell from your data what fraction of the table 28755 blocks represents.
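    You can check that fraction yourself by comparing against the table's total page count, for example (relpages is a planner estimate refreshed by VACUUM/ANALYZE, so treat the percentage as approximate):

    ```sql
    -- What fraction of the heap did the bitmap scan visit?
    SELECT relpages,
           round(28755.0 / relpages * 100, 1) AS pct_of_table_visited
    FROM pg_class
    WHERE relname = 'restaurant_order';
    ```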

    Now, why a bitmap created during Bitmap index scan phase has so many keys? There is ro__status index that can indicate that there are only about 200 records with status 70 and 73

    Indexes in PostgreSQL do not inherently contain any visibility info. "ro__status" can't know which of those ctids point to still-visible rows, so all of them have to be stuffed into the bitmap. Then most of them are rejected at the heap scan stage as being invisible. (This is not explicitly reported the way recheck and filter rejections are. You have to infer it from the difference between the bitmap size and the final row count. In the case of BitmapAnd and BitmapOr, you can't even easily do that, as the reported bitmap size is not accurate.)

    So that is the crux of the problem: you are visiting over 50,000 tuples just to find 195 live ones. Cleaning those dead tuples out of the index is one of the main jobs of vacuuming, so it is likely your table is not getting vacuumed enough. You can test this pretty simply: vacuum the table and see if that fixes the problem. If it does not, then you probably have long-held snapshots which are preventing vacuum from being effective, so go hunt those down.
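    Concretely, the test might look like this (standard statistics views; the pg_stat_activity query surfaces sessions whose old snapshots can hold back cleanup):

    ```sql
    -- How many dead tuples are there, and when was the table last vacuumed?
    SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
    FROM pg_stat_user_tables
    WHERE relname = 'restaurant_order';

    -- Manual vacuum; rerun the slow query afterwards and compare the plan.
    VACUUM (VERBOSE) restaurant_order;

    -- If vacuuming doesn't help, hunt for long-held snapshots.
    SELECT pid, state, xact_start, backend_xmin
    FROM pg_stat_activity
    WHERE backend_xmin IS NOT NULL
    ORDER BY xact_start;
    ```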

    Btree indexes do have a "microvacuum" feature, where a regular index scan kills an index tuple that it has discovered points to a dead-to-all heap tuple. But bitmap index scans do not implement this, as they do not revisit the index after the initial consultation, so there is no good opportunity to kill the index tuple. Bitmap scans will benefit from this microvacuum, but will not perform the microvacuum themselves. This can lead to a perverse situation: the more bitmap scans are preferred over regular index scans, the more bloated the relevant part of the index becomes, until the bitmap scans start performing worse. Increased vacuuming could fix this, but if you don't want to increase it any further, you could just discourage bitmap scans generally. Increasing effective_cache_size can be one way to do that (but only if you actually have the RAM to back up that increase).
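    A sketch of both knobs, with illustrative values you would need to adjust to your own table size and hardware (the scale factor and threshold here are assumptions, not recommendations):

    ```sql
    -- Make autovacuum more aggressive for this one table only,
    -- via per-table storage parameters.
    ALTER TABLE restaurant_order
        SET (autovacuum_vacuum_scale_factor = 0.02,
             autovacuum_vacuum_threshold = 1000);

    -- Raise the planner's assumption about available cache, which makes
    -- plain index scans look cheaper relative to bitmap scans.
    -- Only do this if the RAM really exists; requires a config reload.
    ALTER SYSTEM SET effective_cache_size = '16GB';
    SELECT pg_reload_conf();
    ```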



