Press ESC to close

PostgreSQL Fast Health Check

  • Which postgresql rpm package installed
    • rpm -qa|grep postgres
  • Kernel Settings Check
    • cat /etc/sysctl.conf
  • Check Memory and HugePage Settings
    • free -g
    • grep PageTables /proc/meminfo
    • grep Huge /proc/meminfo
  • Check disk usage
    • df -h
  • Check linux server performance and stranger processes other than postgresql
    • top
  • check postgresql log for errors and warning
    • tail -1000f postgresql.log
  • Check user and roles
    • Application user must not have superuser priviliges
    • postgres=# \du
  • Check public schema and the other schemas privliges.
SELECT n.nspname AS "Name", pg_catalog.pg_get_userbyid(n.nspowner) AS "Owner", pg_catalog.array_to_string(n.nspacl, E'\n') AS "Access privileges",
 pg_catalog.obj_description(n.oid, 'pg_namespace') AS "Description"
 FROM pg_catalog.pg_namespace n WHERE n.nspname !~ '^pg_' AND n.nspname <> 'information_schema'
  • Check extensions
    • \dx
    • pg_stat_statements ⇒ for monitoring sql performance
    • pg_audit ⇒ fine grained auditing
    • pgpasswordcheck ⇒ for hardening password extensions
    • pgstattuple ⇒ analyzing table bloats
    • pgcrypto => Did you decide Mission or Security critical tables and do logging for these tables,
  • also encrypt sensitive data with pg_crypto ( with application vendor)
  • Check whether only allowed ip’s are in pg_hba.conf
    • check pg_hba.conf if it has true password authentications rules
    • cat pg_hba.conf
  • Check backups and did you do any restore recover test from backups ?
  • Check Replica Stream replica , cascade replica
    • select * from pg_stat_replication;
  • Check HA (High Availibility) BFM or other.
  • What is hit ratio of the database ?
SELECT (blks_hit*100/(blks_hit+blks_read))::numeric as hit_ratio FROM pg_stat_database WHERE datname='<datname>';

-- The hit ratio should be near to 100%, if it is less than 90% then their might me some issue with shared buffers or queries. Increase shared buffers or tune the queries doing more IO.
  • IDLE  or IDLE in Transaction Connections
    • Is there is job  Killing idle  or IDLE in Transaction connections after <xxx> minutes in crontab
    • crontab -l
  • IDLE Connections
    • Is there is job  Killing idle connections after <xxx> minutes in crontab
    • https://github.com/farukcevik/for_PostgreSQL_DBA/blob/main/kill_idle_sessions.sh
-- kill_idle_sessions.sql

SELECT pg_terminate_backend(pid)     
FROM pg_stat_activity    
 WHERE usename in ( 'user1'   ,'user2')   
 AND pid <> pg_backend_pid()   
 AND state = 'idle'      
 AND state_change < current_timestamp - INTERVAL '10' MINUTE;
  • Monitoring Tools, notification and alert mechanism
    • Is there a monitoring tools in production and alert mechanism such as if disk capacity is %90 full or cpu usage is more then %90 or active sessions in pg_stat_activitiy is bigger than 10 send an alert email to system and database admins.  And notification if backup has succesfully taken etc.
  • Frequency of checkpoints ?
    • select * from pg_stat_bgwriter;
    • If the checkpoints_req is more than checkpoints_timed then we need to think of increasing the max_wal_size. Also set the checkpoint_completion_target=0.9 to spread the checkpoint. (checkpoints_req > checkpoints_timed is bad).
  • By below query we can find the frequency of checkpoint, if it is less than “checkpoint_timeout” then reduce the frequency of checkpoint by increasing checkpoint_timeout, max_wal_size.
WITH sub as (SELECT EXTRACT(EPOCH FROM (now() - stats_reset)) AS seconds_since_start,
(checkpoints_timed+checkpoints_req) AS total_checkpoints
FROM pg_stat_bgwriter
)
SELECT total_checkpoints,seconds_since_start/total_checkpoints/60 AS minutes_between_checkpoints
FROM sub;

set log_checkpoints so that we can find how much data it is writing for checkpoint. else we can use this query to find the number of blocks written during the checkpoint.

  • Average number of buffers written per checkpoint ?
select buffers_checkpoint/(checkpoints_timed+checkpoints_req) from pg_stat_bgwriter;

This helps in tuning the max_wal_size/wal_keep_segments

  • Get top five tables with highest sequential scans ?
SELECT schemaname, relname, seq_scan, seq_tup_read, seq_tup_read / seq_scan as avg_seq_tup_read FROM pg_stat_all_tables WHERE seq_scan > 0 
And pg_total_relation_size(schemaname.relname) > 104857600
ORDER BY 5 DESC LIMIT 5;
  • Get list of unused indexes ?
    • select * from pg_stat_all_indexes where idx_scan=0;
  • Get average response time of an instance ?
    • select (sum(total_time)/sum(calls))::numeric(6,3) from pg_stat_statements;
  • PostgreSQL Best Practice postgresql.conf for 32GB Memories

listen_addresses = '*'
max_connections = 150
shared_buffer: 8GB # (must be maximum one in four of total RAM )
shared_preload_libraries = 'pg_stat_statements'
maintenance_work_mem=1GB
bgwriter_delay = 10ms
bgwriter_lru_maxpages = 1000
bgwriter_lru_multiplier = 4.0
wal_level = replica
wal_log_hints = on # (if checksum not enabled)
max_wal_size = 12GB
min_wal_size = 6GB
checkpoint_completion_target = 0.7
checkpoint_timeout =10
archive_mode = on
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
wal_keep_segments = 1000
log_filename = 'postgresql-%a.log'
log_min_duration_statement = 1000
log_line_prefix = '%m [%p] app=%a user=%u db=%d host=%h pid=%p '
log_lock_waits = on
log_temp_files = 0
track_io_timing = on
autovacuum = on #(It is not recommended to turn off)
log_autovacuum_min_duration = 0
autovacuum_vacuum_scale_factor = 0.01
autovacuum_analyze_scale_factor = 0.01
effective_cache_size= 24GB

Bir yanıt yazın

E-posta adresiniz yayınlanmayacak. Gerekli alanlar * ile işaretlenmişlerdir