No speedup using parallel restore with pg_restore



  • I need a large database like 10TB-15TB in Postgresql for benchmarking.

    I've created a smaller ~100GB sample databases with pgbench like this:

    pgbench -i -s 7000 --no-vacuum exampledb
    

    However, I observe no speedup when doing parallel restore from custom format (created with pg_dump -F c ...). Including restoring from .sql dump as well:

    time psql -d pgbench7000 < pgbench7000.sql
    

    SET
    SET
    SET
    SET
    SET

    set_config

    (1 row)

    SET
    SET
    SET
    SET
    SET
    SET

    CREATE TABLE
    ALTER TABLE

    CREATE TABLE
    ALTER TABLE

    CREATE TABLE
    ALTER TABLE

    CREATE TABLE
    ALTER TABLE

    COPY 700000000
    COPY 7000
    COPY 0
    COPY 70000

    ALTER TABLE
    ALTER TABLE
    ALTER TABLE

    real 32m1.539s
    user 3m28.575s
    sys 1m40.125s

    ####################

    time pg_restore -d pgbench7000 pgbench7000.custom
    real 20m7.504s
    user 2m35.556s
    sys 0m35.750s

    ####################

    time pg_restore -j 16 -d pgbench7000 pgbench7000.custom

    real 20m56.565s
    user 2m57.547s
    sys 0m40.096s

    I've given Postgresql server generous resources:

    max_connections = 1000              
    shared_buffers = 94GB  # 25% of system's memory                
    work_mem = 512MB                      
    maintenance_work_mem = 12GB
    effective_io_concurrency = 500
    max_worker_processes = 50     
    max_parallel_maintenance_workers = 10   
    max_parallel_workers_per_gather = 10   
    max_parallel_workers = 50
    

    The box is really powerful, it has 374GB of RAM, 72 vCPUs and fast NAS storage mounted over NFS (that's where PG data directory resides).

    pgbench creates a rather simple database, there are like four tables and only one table is big and occupies most of space. Can this be the reason for lack of speedup?

    Can anybody tell where's the bottleneck? Or is it just db structure?

    Environment: Centos 7.9 (that's what I have to use), Postgres 11 installed from project repo.



  • Parallel restore does different operations in parallel, but doesn't parallelize individual operations. Since pgbench is dominated by one table, there isn't much that can be done in parallel for it at the client level.

    With some shell scripting, you can run pg_restore multiple times in parallel, pointing each one at a different database name. (But you could do the same scripting using pgbench -i -s 7000 $DBNAME & instead, skipping the dump and reload steps.)




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2