Is there a way to see which tables contribute most to WAL size?
I'm maintaining a postgresql database and am considering the use of clustering so that update queries have to touch fewer pages.
With clustering based on indices I'm seeing a significant speedup for the actual queries, but is there a way to profile WAL files to see which tables contribute most to the log size when they are modified? I'm experiencing a number of issues related to transaction sizes and operations on write ahead logs by other teams maintaining replicae of the database.
If rows which cluster together get updated together, that could result in fewer full-page-images, so it would be expected to decrease WAL size. But moving checkpoints further apart would also do that, and wouldn't generate the massive spike in WAL that the CLUSTER operation itself would do.
You can use pg_waldump to inspect some WAL files. It doesn't have a mode to generate per-object stats, so you would have to use something like Perl to parse and summarize. The records only indicate objects by their relfilenodes, so you would also need to do something to convert them back to names.
But you should probably tackle the problem more directly. What are the problems you are having? Raw size of WAL? Transmission over network? Random reads needed to fetch the pages which need to get updated during replay?