Production PostgreSQL Space Issue, Huge pg_largeobject, Newbie Thrown Off the Deep End!



• Server: PG 9.4, Linux, poorly set up and left mostly at defaults, no care and feeding

    Databases: 6 user databases, all on pg_default

CT database: ~435GB, nearly all of it pg_largeobject. That table: table size 427GB, total size 433GB, index size 5.7GB (sizes via the query below).

    All other DBs < 10GB combined
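(For reference, the sizes above came from the standard size functions, something like:)

```sql
-- Standard size functions; run while connected to the CT database.
SELECT pg_size_pretty(pg_table_size('pg_largeobject'))          AS table_size,
       pg_size_pretty(pg_total_relation_size('pg_largeobject')) AS total_size,
       pg_size_pretty(pg_indexes_size('pg_largeobject'))        AS index_size;
```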


Problem: The tablespace filled up, apps crashed, chickens ran around with no heads. I, a SQL Server consultant with ~60K hours of experience, was told "see if you can sort this, we don't have anyone". I have zero experience with PostgreSQL.


Initial Response: By the time I had done minimal research on how to connect, obtained a tool to do so, etc., etc., and taken an initial look for junk disk files I could delete to free up space and get them back online, the apps had magically started working again.

Flopping and twitching and Binoogling around, I found that a VACUUM was run somewhere between broken and working. The last_autovacuum column in pg_stat_all_tables is NULL, though.


Current State: pg_largeobject now has n_live_tup 143M, n_tup_del 55K, n_tup_upd 56K, and n_dead_tup 724K (the last rising fairly quickly).
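(Counters pulled from the stats view, i.e.:)

```sql
-- Source of the counters above; run in the CT database.
SELECT n_live_tup, n_tup_del, n_tup_upd, n_dead_tup, last_autovacuum
FROM pg_stat_all_tables
WHERE relname = 'pg_largeobject';
```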

I had them provision a new disk, and they gave me 800GB (device: /dev/sdb, mounted on /data). I hope I described that sufficiently.

As of now, I have complete access to the database server, and there is no other activity (or so I have been told).


    Potential Plan A:

a) Create a new tablespace using the 800GB of new disk space

b) Execute a VACUUM FULL on CT.pg_largeobject, with the newly written, spiffy-clean, reduced-size copy of the table landing on the new tablespace. According to the docs I have enough space for this. I do not have enough space for pg_repack, which I found referenced; it requires ~2X the space.

Sadly, I cannot see any way in the docs to specify the table-copy location on VACUUM FULL.
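(Step (a) itself would be something like the following; the tablespace name and directory path are my guesses, and the directory must already exist, be empty, and be owned by the postgres OS user:)

```sql
-- Step (a) sketch: 'big_ts' and '/data/pg_ts' are assumptions.
-- Run as a superuser; the directory must be empty and postgres-owned.
CREATE TABLESPACE big_ts LOCATION '/data/pg_ts';
```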

    Potential Plan B:

ALTER TABLE pg_largeobject SET TABLESPACE, pointing it at the new 800GB one.

    Sadly this doesn't seem to have the ability to clean up the crap in the table during the move.
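(A sketch of what I mean, using the tablespace name from Plan A. Note that pg_largeobject is a system catalog, so 9.4 will likely refuse this unless the server is started with allow_system_table_mods = on:)

```sql
-- Sketch only. pg_largeobject is a system catalog, so this will likely be
-- rejected unless the server was started with allow_system_table_mods = on.
-- Takes an ACCESS EXCLUSIVE lock and writes the whole table through WAL.
ALTER TABLE pg_largeobject SET TABLESPACE big_ts;
```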

Potential Plan C:

Just do a VACUUM one level below FULL (plain VACUUM, or FREEZE?) and hope that frees enough space to call it a day.
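(My understanding so far, sketched: a plain VACUUM takes no exclusive lock and marks dead rows reusable, but it only returns space to the OS when wholly empty pages sit at the very end of the table file:)

```sql
-- Plain VACUUM: no exclusive lock; dead space becomes reusable, but the
-- file only shrinks if empty pages happen to sit at its very end.
VACUUM (VERBOSE) pg_largeobject;
```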


FEAR: with either option, I am afraid of blowing out the write-ahead log. The current setting per the config:

    wal_level = 'logical'


    FINALLY:

It would seem that unless I can do a "compact-during-move-to-new-tablespace" evolution, then with the constraints I have been given, this table can never be fully VACUUMed.

    Honestly, I don't really care. I want to get this resolved as simply and quickly as possible, and be able to tell the client that PG is running and should not crash again due to space for the foreseeable future. Preferably without totally FUBARing things in the process!

    Thoughts?? Which plan would get them away from running out of space again without causing a problem like filling up the log? Or do I need to do something I haven't come across yet?

    Help very much appreciated!


    UPDATE 20220216 1225ish

    a) I was looking for "junk/temp/unnecessary" files in all Linux directories on the server, not those that are required by any service/app such as PG

b) I did not add a tablespace; that was only a possible plan.

    c) I did not run any form of VACUUM yet, much less FULL.

d) I had already tried to compile pgstattuple based on my research; it failed to compile, and I did not pursue it further at the time.

e) Neither I nor anyone at the company has experience with PG, so upgrading would be far too risky.

f) No one has the ability to change the app at the client.

g) If the autovacuum column in pg_stat_all_tables is all NULL, does that mean AV has never run?

    Part 2:

1. This is a VMware VM running CentOS. I like the idea of expanding the volume; I will inquire and hope it is viable!
2. How would I move the entire PG instance to a new volume?
3. The app does need all the data it contains.
4. VACUUM (and the ALTER TABLE tablespace move) was hoped to clean and move at the same time. There are lots of known and increasing dead records, and I wondered how many other rows may be in a "removable" state.
5. See below re pgstattuple. I may need to debug and get that working.
6. vacuumlo -n CT is run from the pgsql command line, right?
7. If I add a new tablespace, can I make a table start using it for NEW data, leaving the existing data as is? In SQL Server, if you add a new file to a filegroup, all objects on that filegroup immediately have access to the new file.

    UPDATE 20220216 1530ish

I was told by the admin team that extending the volume in place is not an option: something about the volume not being at the end of the storage for the server. The age of the server/OS, some lack of Linux experience, and a hosting arrangement whereby the client does not have full access to the VMware environment could all be factors.


    UPDATE 20220217 15:15

    A) I was finally able to compile pgstattuple. I will run this against the CT.pg_largeobject table after 1730 today when all users are off the system.

B) I tried CREATE EXTENSION vacuumlo, but it fails with the message "/usr/pgsql-9.4/share/extension/vacuumlo.control: no such file or directory". While I can see other .sql and .control files in that directory, the vacuumlo files ARE missing. I then went to /usr/pgsql-9.4/share/contrib, where I thought the code for these would exist, but there is only one file there, sepgsql.sql.

I stumbled around and did find vacuumdb, vacuumlo, pgbench, etc. in the /bin directory. But I did not seem to be able to execute vacuumlo or get any information on it.

Question: how can I compile/use vacuumlo?



• Ask whoever is in charge of the storage (assuming they know what they are doing) whether they can't just expand your existing volume by 800GB, rather than attaching the new space as a separate partition/volume. Some RAID systems and volume managers allow such dynamic resizing. If that can be done, problem solved.

    If they can't expand it, I don't like the multiple tablespace option. That adds yet more complexity to a system which is already more complex than you can handle. Rather than moving the dominant object to a new tablespace leaving a tiny rump behind, just move the whole PostgreSQL instance to the new larger disk, freeing up the old disk entirely. It will probably be faster. It will require the entire instance to be down for the move, but it is questionable how usable it would be during an online migration anyway, what with its dominant object being locked and the IO system being swamped.
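A rough sketch of such a move, assuming the stock PGDG layout on CentOS; the paths and service name here are assumptions, so verify the real data directory first with SHOW data_directory:

```bash
# Rough sketch; paths and service name assumed (PGDG 9.4 on CentOS).
service postgresql-9.4 stop
rsync -a /var/lib/pgsql/9.4/ /data/pgsql/9.4/     # copy the whole instance
mv /var/lib/pgsql/9.4 /var/lib/pgsql/9.4.old      # keep the original until verified
ln -s /data/pgsql/9.4 /var/lib/pgsql/9.4          # old path now points at the new disk
# On CentOS, SELinux contexts may need fixing, e.g. restorecon -R /data/pgsql
service postgresql-9.4 start
```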

    Ask the senior end users of this system if it makes sense that this database is so large. What is it storing that takes up so much space?

Run vacuumlo -n CT. This will tell you how many of the large objects are unreferenced from (obvious) places elsewhere in the database.
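Note that vacuumlo is an OS-level client program, not something run inside psql. Using the binary you found under /usr/pgsql-9.4/bin, a dry run would look roughly like this (connection options are assumptions, adjust to your setup):

```bash
# -n = dry run (report only, delete nothing); -v = verbose progress.
/usr/pgsql-9.4/bin/vacuumlo -n -v -h localhost -U postgres CT
```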

You focus a lot on VACUUM FULL. That only makes sense if pg_largeobject is bloated. Maybe it is efficiently storing all the things it has been told to store, and that is just a lot of things. Before worrying about de-bloating it, find out whether it is actually bloated. One way to do that is with pgstattuple. A faster way is with pg_freespace (from the pg_freespacemap extension), but that depends on metadata that might be out of date, so you should make sure the table was recently VACUUMed first (but you should do that anyway).
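For example, assuming the contrib modules now install cleanly (per your update); note that pgstattuple reads the whole 427GB table, so expect it to take a while:

```sql
-- Exact bloat numbers (slow: scans the entire table):
CREATE EXTENSION pgstattuple;
SELECT * FROM pgstattuple('pg_largeobject');

-- Fast approximation from the free space map (only as fresh as the last VACUUM):
CREATE EXTENSION pg_freespacemap;
SELECT count(*) AS pages, pg_size_pretty(sum(avail)::bigint) AS free_space
FROM pg_freespace('pg_largeobject');
```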

As for blowing out the WAL, that is a concern if you move the table to a new tablespace. But the setting wal_level = 'logical' doesn't mean much by itself; it just fails to rule out the concern. You need to figure out whether archive_command is actually in use, or whether replication slots actually exist, and evaluate their ability to keep up.
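Quick ways to check, using standard settings and the 9.4 system view:

```sql
-- If archive_mode is off and no replication slots exist,
-- old WAL segments will be recycled normally.
SHOW archive_mode;
SHOW archive_command;
SELECT slot_name, slot_type, active, restart_lsn FROM pg_replication_slots;
```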



