Warning: is_dir(): open_basedir restriction in effect. File(/tmp) is not within the allowed path(s): (/data/web/virtuals/98151/virtual) in /data/web/virtuals/98151/virtual/www/subdom/postgresql/wp-content/plugins/wp-simple-firewall/src/lib/vendor/fernleafsystems/wordpress-services/src/Core/Fs.php on line 465

Warning: Cannot modify header information - headers already sent by (output started at /data/web/virtuals/98151/virtual/www/subdom/postgresql/wp-content/plugins/wp-simple-firewall/src/lib/vendor/fernleafsystems/wordpress-services/src/Core/Fs.php:465) in /data/web/virtuals/98151/virtual/www/subdom/postgresql/wp-content/plugins/disable-xml-rpc-pingback/disable-xml-rpc-pingback.php on line 51
Problems with autovacuum during high load on server - PostgreSQL Candies

I last days we had big spikes in number of inserts on some of our databases causing rapid increase in “database age” (i.e. growth of transaction ID) and some of our databases have reached alerting level of transaction ID value set to 1 billion . See in the left part of this picture

After huge problems we had with transaction ID wraparound at the beginning of 2019 we monitor age of database using Telegraf + InfluxDB + Grafana

Configuration for telegraf:

[[inputs.postgresql_extensible.query]]
     sqlquery="SELECT datname as dbname, age(datfrozenxid) as dbage FROM pg_database ORDER BY 2 DESC;"
     withdbname=false
     tagvalue=""
     measurement="postgresql_dbage"

On databases in question autovacuum was suddenly stacked on processing rapidly growing daily partitions and was unable to re-claim transaction IDs back.

Solution was set using this article – Tuning Postgres Autovacuum for Scale – we already used bigger memory setting for autovacuum_work_mem but kept autovacuum_vacuum_cost_limit on default value 200. So now we tested increased values for this parameter and value 2000 currently looks like god setting. As you can see in the right part of the graph autovacuum is now able to keep max transaction ID around 200 000 which is default setting for trigger of autovacuum.

We do not see any significant changes in CPU/ memory usage, disk IO, context switches etc.