Index not picked up on nullable boolean column



  • I have 2 TimescaleDB Databases (PROD and DEV), both have a hypertable with the same schema and almost the same data (44M rows for the first one, 40M for the 2nd one).

    I have to query some data on a boolean column

    SELECT * FROM pumpcards
    WHERE to_label = TRUE
    ORDER BY starttime DESC;
    

    Because I only have couple cards (1xxx) with a non-null value I created an index:

    CREATE INDEX to_label_true ON pumpcards (to_label)
    WHERE to_label IS NOT NULL;
    

    Now my SELECT on the dev db takes 500ms but on prod it takes 25+sec and CPU goes to 100%

    In my explain analyse of the query on DEV I can see that the index is being picked up:

    Index Cond: (to_label = true)

    But PROD is not using the index and it is filtering on to_label:

    Filter: to_label Rows Removed by Filter: 2164238

    I deleted all the index, re-created them, I also ran VACUUM ANALYZE pumpcards. No luck.

    My table schema:

    CREATE TABLE IF NOT EXISTS public.pumpcards (
        id bigint NOT NULL DEFAULT nextval('pumpcards_id_seq'::regclass),
        wellid integer,
        starttime timestamp without time zone NOT NULL,
        endtime timestamp without time zone NOT NULL,
        physical_card boolean NOT NULL,
        nb_points integer NOT NULL,
        "position" text COLLATE pg_catalog."default" NOT NULL,
        load text COLLATE pg_catalog."default" NOT NULL,
        pressure text COLLATE pg_catalog."default",
        opti_gf double precision,
        ml_gf double precision,
        error_score double precision,
        novelty_score double precision,
        labeled boolean NOT NULL,
        manual_gf double precision,
        to_label boolean,
        erroneous_card boolean,
        fluid_pound boolean,
        gas_interference boolean,
        pump_tagging boolean,
        worn_top_valve boolean,
        worn_bottom_valve boolean,
        stuck_top_valve boolean,
        stuck_bottom_valve boolean,
        worn_barrel boolean,
        unanchored_tubing boolean,
        stuck_pump boolean,
        parted_rod boolean,
        solid_friction_along_rod boolean,
        solid_friction_in_pump boolean,
        tubing_leak boolean,
        tight_stuffing_box boolean,
        flumping boolean,
        depth_of_issue double precision,
        to_investigate boolean,
        position_downhole text COLLATE pg_catalog."default",
        load_downhole text COLLATE pg_catalog."default",
        work double precision NOT NULL,
        work_downhole double precision,
        gas_lock boolean,
        amplitude_dwl real,
        amplitude_sfc real,
        max_position real,
        min_position real,
        max_load real,
        min_load real,
        CONSTRAINT pumpcards_pkey PRIMARY KEY (id, starttime),
        CONSTRAINT pumpcards_wellid_starttime_key UNIQUE (wellid, starttime)
    )
    

    CREATE INDEX IF NOT EXISTS pumpcards_id_idx
    ON public.pumpcards USING btree
    (id DESC NULLS FIRST);

    CREATE INDEX IF NOT EXISTS pumpcards_labeled_erroneous_card_idx
    ON public.pumpcards USING btree
    (labeled DESC NULLS FIRST, erroneous_card ASC NULLS LAST)
    WHERE manual_gf IS NOT NULL;

    CREATE INDEX IF NOT EXISTS pumpcards_starttime_idx
    ON public.pumpcards USING btree
    (starttime DESC NULLS FIRST);

    CREATE INDEX to_label_true ON pumpcards (to_label)
    WHERE to_label IS NOT NULL;

    CREATE INDEX IF NOT EXISTS pumpcards_wellid_starttime_idx
    ON public.pumpcards USING btree
    (wellid ASC NULLS LAST, starttime DESC NULLS FIRST);

    CREATE TRIGGER ts_insert_blocker
    BEFORE INSERT
    ON public.pumpcards
    FOR EACH ROW
    EXECUTE FUNCTION _timescaledb_internal.insert_blocker();

    Example EXPLAIN ANALYZE in PROD:

    Parallel Seq Scan on _hyper_53_2637_chunk  (cost=0.00..98436.09 rows=1 width=400) (actual time=10555.252..10555.253 rows=0 loops=1)
                        Filter: to_label
                        Rows Removed by Filter: 2138437
    

    Any ideas?



  • Judging from what you report, the index should be used. There must be some kind of misunderstanding.

    It's unclear which index we are talking about, exactly. You mention this one at the top:

    CREATE INDEX to_label_true ON pumpcards (to_label)
    WHERE to_label IS NOT NULL;
    

    And show these two different indexes in the table description:

    CREATE INDEX IF NOT EXISTS pumpcards_to_label_starttime_idx
        ON public.pumpcards USING btree
        (to_label ASC NULLS LAST, starttime DESC NULLS FIRST)
        WHERE to_label IS NOT NULL;
    

    CREATE INDEX IF NOT EXISTS to_label_true
    ON public.pumpcards USING btree
    (to_label ASC NULLS LAST)
    WHERE to_label IS TRUE;

    All distinct, and neither is optimal. This would the perfect fit for your query:

    CREATE INDEX to_label_true_starttime_desc ON public.pumpcards (starttime DESC)
    WHERE to_label = true;
    



Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2