improve query performance in postgres


  • QA Engineer

    i have a table addressfabric_v2_2021_12

    CREATE TABLE IF NOT EXISTS public.addressfabric_v2_2021_12(
    country character varying(10) COLLATE pg_catalog."default",
    areaname1 character varying(10) COLLATE pg_catalog."default",
    areaname2 character varying(100) COLLATE pg_catalog."default",
    areaname3 character varying(100) COLLATE pg_catalog."default",
    areaname4 character varying(100) COLLATE pg_catalog."default",
    postcode character varying(20) COLLATE pg_catalog."default",
    pbkey character varying(50) COLLATE pg_catalog."default",
    prop_type character varying(10) COLLATE pg_catalog."default",
    postcode_ext character varying(10) COLLATE pg_catalog."default",
    add_number character varying(20) COLLATE pg_catalog."default",
    streetname character varying(60) COLLATE pg_catalog."default",
    unit_type character varying(20) COLLATE pg_catalog."default",
    unit_num character varying(50) COLLATE pg_catalog."default",
    type character varying(3) COLLATE pg_catalog."default",
    lat character varying(20) COLLATE pg_catalog."default",
    lon character varying(20) COLLATE pg_catalog."default",
    parent character varying(50) COLLATE pg_catalog."default",
    geoid character varying(50) COLLATE pg_catalog."default",
    geometry geometry,
    CONSTRAINT enforce_addfb_srid_geometry_2021_12 CHECK (st_srid(geometry) = 4326))
    

    I am having performance issue in the below query

     select count(*) from addressfabric_v2_2021_12 as addr, (
    SELECT ST_SubDivide('POLYGON((-74.4234467 40.7541435,-74.3904877 40.7489001,-74.380188 40.7587776,-74.3891144 40.7748904,-74.4179535 40.7790479,-74.4316864 40.7665746,-74.4241333 40.7566983,-74.4234467 40.7541435))'::geometry,20)) as
    f(geom) where st_intersects(ST_SetSRID(f.geom,4326),addr.geometry);
    

    After this i created the following index

      CREATE INDEX IF NOT EXISTS addressfabric_v2_2021_12_geometry
    ON public.addressfabric_v2_2021_12 USING gist
    (geometry)  TABLESPACE pg_default;
    

    The performance was increased but still not satisfactory

    Aggregate  (cost=591866320.40..591866320.41 rows=1 width=8) (actual 
            time=11.812..11.813 rows=1 loops=1)->Nested Loop  (cost=0.55..591346132.52 rows=208075152 width=0) (actual 
                   time=0.219..11.476 rows=7072 loops=1)
        ->ProjectSet  (cost=0.00..30.02 rows=1000 width=32) (actual 
                         time=0.010..0.012 rows=1 loops=1)
              ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual 
                           time=0.001..0.001 rows=1 loops=1)
        ->  Index Scan using addressfabric_v2_2021_12_geomtry on 
            addressfabric_v2_2021_12 addr  (cost=0.55..591138.01 rows=20808 
            width=32) (actual time=0.207..10.893 rows=7072 loops=1)
              Index Cond: (geometry && st_setsrid((st_subdivide('01030000000100000008000000580630C0199B52C0D87E32C687604440603F1FC0FD9852C04EBC5EF5DB5F44403A950C00559852C0FC77D89F1F614440C5991540E79852C0FFFDCE9B2F634440EB6F09C0BF9A52C03D4272D7B76344403188FEBFA09B52C0DE78D21D1F6244404029FFFF249B52C02AB8697DDB604440580630C0199B52C0D87E32C687604440'::geometry, 20, '-1'::double precision)), 4326))
              Filter: st_intersects(st_setsrid((st_subdivide('01030000000100000008000000580630C0199B52C0D87E32C687604440603F1FC0FD9852C04EBC5EF5DB5F44403A950C00559852C0FC77D89F1F614440C5991540E79852C0FFFDCE9B2F634440EB6F09C0BF9A52C03D4272D7B76344403188FEBFA09B52C0DE78D21D1F6244404029FFFF249B52C02AB8697DDB604440580630C0199B52C0D87E32C687604440'::geometry, 20, '-1'::double precision)), 4326), geometry)
              Rows Removed by Filter: 1987
    Planning Time: 0.685 ms
    Execution Time: 11.900 ms 
    

    Any help will be appreciated.

    EDIT

    on rewriting the query using join , queryplan is including a function scan also

      select count(*) from addressfabric_v2_2022_1 as addr join ST_SubDivide('POLYGON((-74.4234467 40.7541435,-74.3904877 40.7489001,-74.380188 40.7587776,-74.3891144 40.7748904,-74.4179535 40.7790479,-74.4316864 40.7665746,-74.4241333 40.7566983,-74.4234467 40.7541435))'::geometry,20)as f(geom) on st_intersects(ST_SetSRID(f.geom,4326),addr.geometry);
    

    QUERY PLAN

    Aggregate  (cost=543972801.24..543972801.25 rows=1 width=8) (actual time=11.208..11.209 rows=1 loops=1)->Nested 
     Loop(cost=25.55..543493555.50 rows=191698296 width=0) 
    (actual time=0.114..10.856 rows=7073 loops=1)
    ->Function Scan on st_subdivide f  (cost=25.00..35.00 rows=1000 
     width=32) (actual time=0.010..0.011 rows=1 loops=1)
    ->  Index Scan using addressfabric_v2_2022_1_geometry on 
    addressfabric_v2_2022_1 addr(cost=0.55..543301.82 rows=19170 
    width=32)   (actual time=0.101..10.220 rows=7073 loops=1)
    Index Cond: (geometry && st_setsrid(f.geom, 4326))
    Filter: st_intersects(st_setsrid(f.geom, 4326), geometry)
    Rows Removed by Filter: 1987
    Planning Time: 0.092 ms
    Execution Time: 11.240 ms
    

    Is there any way to create index instead of function scan in new query plan?

    update query plan after parallel setting and large dataset

    Aggregate  (cost=10539438036.82..10539438036.83 rows=1 width=8) (actual time=740.148..740.149 rows=1 loops=1)
      ->  Nested Loop  (cost=10000000025.55..10538917273.50 rows=208305328 width=0) (actual time=0.173..717.950 rows=429963 loops=1)
            ->  Function Scan on st_subdivide f  (cost=25.00..35.00 rows=1000 width=32) (actual time=0.008..0.010 rows=1 loops=1)
            ->  Index Scan using addressfabric_v2_2022_1_geometry on addressfabric_v2_2022_1 addr  (cost=0.55..538708.93 rows=20831 width=32) (actual time=0.162..679.372 rows=429963 loops=1)
                  Index Cond: (geometry && st_setsrid(f.geom, 4326))
                  Filter: st_intersects(st_setsrid(f.geom, 4326), geometry)
                  Rows Removed by Filter: 32726
    Planning Time: 0.108 ms
    Execution Time: 740.184 ms
    


  • The only way that may get you below half a second, seems to be to throw more CPU at it. Either you get a faster system, or you use more of what you've got, ie. parallelism. With PostgreSQL 13 you should be safe.

    Before your query, execute:

    SET max_parallel_workers = 8;
    SET max_parallel_workers_per_gather = 4;
    

    See: http://blog.cleverelephant.ca/2019/05/parallel-postgis-4.html




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2