High Availability with Patroni & etcd
Complete guide for setting up a PostgreSQL high-availability cluster using Patroni for automatic failover and etcd for distributed configuration storage.
Overview
This guide covers:
- Patroni: PostgreSQL cluster management tool with automatic failover
- etcd: Distributed key-value store for cluster state
- HAProxy: Load balancing across PostgreSQL nodes
- Watchdog: Prevention of split-brain scenarios
Architecture
                    ┌─────────────┐
                    │   HAProxy   │
                    │ 192.168.1.50│
                    │  Port: 5000 │
                    └──────┬──────┘
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
  ┌─────▼─────┐      ┌─────▼─────┐      ┌─────▼─────┐
  │  Node 1   │      │  Node 2   │      │  Node 3   │
  │  Patroni  │◄────►│  Patroni  │◄────►│  Patroni  │
  │ PostgreSQL│      │ PostgreSQL│      │ PostgreSQL│
  │   etcd    │      │   etcd    │      │   etcd    │
  └─────┬─────┘      └─────┬─────┘      └─────┬─────┘
        │                  │                  │
        └──────────────────┴──────────────────┘
                 etcd consensus cluster
Prerequisites
All three nodes should have:
- Ubuntu/Debian-based Linux
- Network connectivity between nodes
- Root or sudo access
- Same PostgreSQL version
Part 1: Install etcd on All Nodes
etcd provides the distributed consensus needed for cluster coordination.
Install etcd
On all three nodes:
# Download etcd
ETCD_VER=v3.5.9
GITHUB_URL=https://github.com/etcd-io/etcd/releases/download
DOWNLOAD_URL=${GITHUB_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz
# Download and extract
curl -L ${DOWNLOAD_URL} -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar xzvf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz -C /tmp
# Install binaries
sudo cp /tmp/etcd-${ETCD_VER}-linux-amd64/etcd* /usr/local/bin/
sudo chmod +x /usr/local/bin/etcd*
# Verify installation
etcd --version
Configure etcd
Create etcd configuration directory:
sudo mkdir -p /etc/etcd
sudo mkdir -p /var/lib/etcd
Create etcd systemd service:
sudo nano /etc/systemd/system/etcd.service
On Node 1 (192.168.1.51):
[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target
[Service]
Type=notify
User=etcd
Environment=ETCD_DATA_DIR=/var/lib/etcd
Environment=ETCD_NAME=node1
Environment=ETCD_INITIAL_ADVERTISE_PEER_URLS=http://192.168.1.51:2380
Environment=ETCD_LISTEN_PEER_URLS=http://192.168.1.51:2380
Environment=ETCD_LISTEN_CLIENT_URLS=http://192.168.1.51:2379,http://127.0.0.1:2379
Environment=ETCD_ADVERTISE_CLIENT_URLS=http://192.168.1.51:2379
Environment=ETCD_INITIAL_CLUSTER_TOKEN=patroni-cluster
Environment=ETCD_INITIAL_CLUSTER=node1=http://192.168.1.51:2380,node2=http://192.168.1.52:2380,node3=http://192.168.1.53:2380
Environment=ETCD_INITIAL_CLUSTER_STATE=new
ExecStart=/usr/local/bin/etcd
Restart=always
RestartSec=10s
LimitNOFILE=40000
[Install]
WantedBy=multi-user.target
On Node 2 (192.168.1.52):
[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target
[Service]
Type=notify
User=etcd
Environment=ETCD_DATA_DIR=/var/lib/etcd
Environment=ETCD_NAME=node2
Environment=ETCD_INITIAL_ADVERTISE_PEER_URLS=http://192.168.1.52:2380
Environment=ETCD_LISTEN_PEER_URLS=http://192.168.1.52:2380
Environment=ETCD_LISTEN_CLIENT_URLS=http://192.168.1.52:2379,http://127.0.0.1:2379
Environment=ETCD_ADVERTISE_CLIENT_URLS=http://192.168.1.52:2379
Environment=ETCD_INITIAL_CLUSTER_TOKEN=patroni-cluster
Environment=ETCD_INITIAL_CLUSTER=node1=http://192.168.1.51:2380,node2=http://192.168.1.52:2380,node3=http://192.168.1.53:2380
Environment=ETCD_INITIAL_CLUSTER_STATE=new
ExecStart=/usr/local/bin/etcd
Restart=always
RestartSec=10s
LimitNOFILE=40000
[Install]
WantedBy=multi-user.target
On Node 3 (192.168.1.53):
[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target
[Service]
Type=notify
User=etcd
Environment=ETCD_DATA_DIR=/var/lib/etcd
Environment=ETCD_NAME=node3
Environment=ETCD_INITIAL_ADVERTISE_PEER_URLS=http://192.168.1.53:2380
Environment=ETCD_LISTEN_PEER_URLS=http://192.168.1.53:2380
Environment=ETCD_LISTEN_CLIENT_URLS=http://192.168.1.53:2379,http://127.0.0.1:2379
Environment=ETCD_ADVERTISE_CLIENT_URLS=http://192.168.1.53:2379
Environment=ETCD_INITIAL_CLUSTER_TOKEN=patroni-cluster
Environment=ETCD_INITIAL_CLUSTER=node1=http://192.168.1.51:2380,node2=http://192.168.1.52:2380,node3=http://192.168.1.53:2380
Environment=ETCD_INITIAL_CLUSTER_STATE=new
ExecStart=/usr/local/bin/etcd
Restart=always
RestartSec=10s
LimitNOFILE=40000
[Install]
WantedBy=multi-user.target
Create etcd User
On all nodes:
sudo useradd -r -s /bin/false etcd
sudo chown -R etcd:etcd /var/lib/etcd
Start etcd
On all nodes:
sudo systemctl daemon-reload
sudo systemctl enable etcd
sudo systemctl start etcd
sudo systemctl status etcd
Verify etcd Cluster
On any node:
# List cluster members
etcdctl member list
# Check the health of every member
etcdctl endpoint health --cluster
Expected output:
http://192.168.1.51:2379 is healthy: successfully committed proposal: took = 1.234ms
http://192.168.1.52:2379 is healthy: successfully committed proposal: took = 1.567ms
http://192.168.1.53:2379 is healthy: successfully committed proposal: took = 1.890ms
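As an optional sanity check, confirm that writes reach the cluster and can be read back; the key name below is arbitrary:
# Write a test key, read it back, then clean up
etcdctl put /sanity/ping "pong"
etcdctl get /sanity/ping
etcdctl del /sanity/ping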
Part 2: Install PostgreSQL on All Nodes
Install PostgreSQL on all three nodes:
# Add PostgreSQL repository
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
# Import repository key
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
# Update and install
sudo apt-get update
sudo apt-get install -y postgresql-15 postgresql-contrib-15
# Stop PostgreSQL (Patroni will manage it)
sudo systemctl stop postgresql
sudo systemctl disable postgresql
Part 3: Install Patroni on All Nodes
Patroni manages the PostgreSQL cluster and handles automatic failover.
Install Dependencies
On all nodes:
sudo apt-get install -y python3-pip python3-dev libpq-dev
# Install Patroni with etcd v3 support (etcd 3.5 serves the v3 API; the legacy v2 API is disabled by default)
sudo pip3 install 'patroni[etcd3]' psycopg2-binary
Configure Patroni
Create Patroni configuration directory:
sudo mkdir -p /etc/patroni
On Node 1 (192.168.1.51):
sudo nano /etc/patroni/patroni.yml
scope: postgres-cluster
namespace: /db/
name: node1

restapi:
  listen: 192.168.1.51:8008
  connect_address: 192.168.1.51:8008

etcd3:
  hosts: 192.168.1.51:2379,192.168.1.52:2379,192.168.1.53:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        max_connections: 100
        max_wal_senders: 10
        wal_keep_size: 256MB
        max_prepared_transactions: 0
        max_locks_per_transaction: 64
        track_commit_timestamp: "off"
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication replicator 127.0.0.1/32 scram-sha-256
    - host replication replicator 192.168.1.0/24 scram-sha-256
    - host all all 0.0.0.0/0 scram-sha-256
  users:
    admin:
      password: admin_password
      options:
        - createrole
        - createdb
    replicator:
      password: replicator_password
      options:
        - replication

postgresql:
  listen: 192.168.1.51:5432
  connect_address: 192.168.1.51:5432
  data_dir: /var/lib/postgresql/15/main
  bin_dir: /usr/lib/postgresql/15/bin
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: replicator_password
    superuser:
      username: postgres
      password: postgres_password
  parameters:
    unix_socket_directories: '/var/run/postgresql'

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
On Node 2 (192.168.1.52):
Create /etc/patroni/patroni.yml with the same content, but change:
- name: node2
- restapi.listen: 192.168.1.52:8008
- restapi.connect_address: 192.168.1.52:8008
- postgresql.listen: 192.168.1.52:5432
- postgresql.connect_address: 192.168.1.52:5432
On Node 3 (192.168.1.53):
Create /etc/patroni/patroni.yml with the same content, but change:
- name: node3
- restapi.listen: 192.168.1.53:8008
- restapi.connect_address: 192.168.1.53:8008
- postgresql.listen: 192.168.1.53:5432
- postgresql.connect_address: 192.168.1.53:5432
Make sure to change the passwords in production! The passwords shown here are examples only.
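Before wiring Patroni into systemd, it is worth validating each node's file. A minimal check, assuming Patroni 2.x or newer (which ships a --validate-config option):
# Validate the Patroni configuration
patroni --validate-config /etc/patroni/patroni.yml
# At minimum, confirm the file parses as YAML
python3 -c "import yaml; yaml.safe_load(open('/etc/patroni/patroni.yml'))"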
Create Patroni Systemd Service
On all nodes:
sudo nano /etc/systemd/system/patroni.service
[Unit]
Description=Patroni PostgreSQL High-Availability Manager
After=syslog.target network.target etcd.service
[Service]
Type=simple
User=postgres
Group=postgres
ExecStart=/usr/local/bin/patroni /etc/patroni/patroni.yml
ExecReload=/bin/kill -s HUP $MAINPID
KillMode=process
TimeoutSec=30
Restart=always
[Install]
WantedBy=multi-user.target
Set Permissions
On all nodes:
sudo chown postgres:postgres /etc/patroni/patroni.yml
sudo chmod 600 /etc/patroni/patroni.yml
Start Patroni
Important: Start Patroni one node at a time, waiting for each to initialize:
# On Node 1 first
sudo systemctl daemon-reload
sudo systemctl enable patroni
sudo systemctl start patroni
sudo systemctl status patroni
# Wait 30 seconds, then on Node 2
sudo systemctl daemon-reload
sudo systemctl enable patroni
sudo systemctl start patroni
# Wait 30 seconds, then on Node 3
sudo systemctl daemon-reload
sudo systemctl enable patroni
sudo systemctl start patroni
Verify Cluster Status
On any node:
patronictl -c /etc/patroni/patroni.yml list
Expected output:
+ Cluster: postgres-cluster -------+---------+----+-----------+
| Member | Host          | Role    | State   | TL | Lag in MB |
+--------+---------------+---------+---------+----+-----------+
| node1  | 192.168.1.51  | Leader  | running |  1 |           |
| node2  | 192.168.1.52  | Replica | running |  1 |         0 |
| node3  | 192.168.1.53  | Replica | running |  1 |         0 |
+--------+---------------+---------+---------+----+-----------+
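You can also ask each node's Patroni REST API for its own view of its role and state; this is the same API HAProxy polls for health checks in Part 4. jq is only used for readability, and depending on the Patroni version the role is reported as master or primary:
# Query node1's Patroni REST API
curl -s http://192.168.1.51:8008/patroni | jq '{role: .role, state: .state}'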
Part 4: HAProxy Setup
HAProxy distributes client connections across the PostgreSQL cluster.
Install HAProxy
On a separate server (or on one of the nodes):
sudo apt-get install -y haproxy
Configure HAProxy
sudo nano /etc/haproxy/haproxy.cfg
global
    maxconn 100
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /

listen postgres_write
    bind *:5000
    option httpchk GET /
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 192.168.1.51:5432 maxconn 100 check port 8008
    server node2 192.168.1.52:5432 maxconn 100 check port 8008
    server node3 192.168.1.53:5432 maxconn 100 check port 8008

listen postgres_read
    bind *:5001
    option httpchk GET /replica
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 192.168.1.51:5432 maxconn 100 check port 8008
    server node2 192.168.1.52:5432 maxconn 100 check port 8008
    server node3 192.168.1.53:5432 maxconn 100 check port 8008
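Before (re)starting the service, check the configuration for syntax errors:
# Validate haproxy.cfg without starting HAProxy
haproxy -c -f /etc/haproxy/haproxy.cfg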
Start HAProxy
sudo systemctl enable haproxy
sudo systemctl restart haproxy
sudo systemctl status haproxy
Test HAProxy
View stats dashboard:
http://192.168.1.50:7000
Connect to primary (write):
psql -h 192.168.1.50 -p 5000 -U admin postgres
Connect to replicas (read):
psql -h 192.168.1.50 -p 5001 -U admin postgres
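To confirm the routing, ask PostgreSQL which server answered and whether it is in recovery; port 5000 should report false (primary) and port 5001 true (replica):
# Which backend did we land on, and is it a replica?
psql -h 192.168.1.50 -p 5000 -U admin -d postgres -c "SELECT inet_server_addr(), pg_is_in_recovery();"
psql -h 192.168.1.50 -p 5001 -U admin -d postgres -c "SELECT inet_server_addr(), pg_is_in_recovery();"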
Part 5: Testing Failover
Manual Failover
Promote a replica to leader:
# On any node
patronictl -c /etc/patroni/patroni.yml switchover
You'll be prompted to confirm the switchover.
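For scripted switchovers, patronictl can take the participants on the command line instead of prompting. The flag name for the current leader varies across Patroni versions (--master, --leader, or --primary), so check patronictl switchover --help on your installation first:
# Non-interactive switchover from node1 to node2 (flag name depends on Patroni version)
patronictl -c /etc/patroni/patroni.yml switchover --leader node1 --candidate node2 --force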
Simulate Node Failure
Stop Patroni on the current leader:
# On the leader node
sudo systemctl stop patroni
Patroni will automatically promote a replica within 30 seconds (TTL setting).
Check cluster status:
patronictl -c /etc/patroni/patroni.yml list
The former replica should now be the leader.
Restore Failed Node
Start Patroni again:
sudo systemctl start patroni
The node will automatically rejoin as a replica.
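If the old leader has diverged too far for pg_rewind to repair, it may refuse to rejoin. In that case you can rebuild it from the current leader; note that this wipes the member's local data directory:
# Reinitialize a diverged member (here node1) from the current leader
patronictl -c /etc/patroni/patroni.yml reinit postgres-cluster node1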
Part 6: Monitoring
Check Cluster Status
# List cluster members
patronictl -c /etc/patroni/patroni.yml list
# Show detailed history
patronictl -c /etc/patroni/patroni.yml history
# Check configuration
patronictl -c /etc/patroni/patroni.yml show-config
Check etcd Cluster
# Member list
etcdctl member list
# Cluster health
etcdctl endpoint health --cluster
# Get cluster configuration
etcdctl get --prefix /db/
Monitor HAProxy
View HAProxy stats:
http://192.168.1.50:7000
Check HAProxy logs:
sudo tail -f /var/log/haproxy.log
PostgreSQL Monitoring
On any node:
# Check replication status
psql -U postgres -c "SELECT * FROM pg_stat_replication;"
# Check if node is in recovery
psql -U postgres -c "SELECT pg_is_in_recovery();"
# Check replication lag
psql -U postgres -c "SELECT client_addr, state, sync_state,
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
FROM pg_stat_replication;"
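As a starting point for alerting, the same query can be wrapped in a small check that fails when any replica lags beyond a threshold; the 16 MB value below is only an example:
#!/bin/bash
# Exit non-zero if any replica lags more than ~16 MB behind (run on the leader)
LAG=$(psql -U postgres -At -c "SELECT COALESCE(MAX(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)), 0)::bigint FROM pg_stat_replication;")
if [ "$LAG" -gt 16777216 ]; then
    echo "Replication lag is ${LAG} bytes"
    exit 1
fi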
Automation with autobase
For automated deployment of PostgreSQL HA clusters, consider using the autobase project by Vitaly Kukharik.
About autobase
Repository: https://github.com/vitabaks/autobase
autobase is an Ansible-based automation tool for deploying PostgreSQL high-availability clusters with:
- Patroni for automatic failover
- etcd/Consul/ZooKeeper for DCS
- HAProxy for load balancing and PgBouncer for connection pooling
- Optional monitoring with Prometheus/Grafana
Features
- One-command cluster deployment
- Support for cloud providers (AWS, GCP, Azure, DigitalOcean)
- Automatic SSL certificate generation
- Backup configuration (pgBackRest, WAL-G)
- Monitoring stack setup
- Rolling updates
Quick Start with autobase
# Clone repository
git clone https://github.com/vitabaks/autobase.git
cd autobase
# Install requirements
pip3 install -r requirements.txt
# Edit inventory
nano inventory
# Configure variables
nano vars/main.yml
# Deploy cluster
ansible-playbook deploy_pgcluster.yml
Troubleshooting
Patroni Won't Start
Check logs:
sudo journalctl -u patroni -f
Common issues:
- PostgreSQL data directory already exists (remove it for a fresh install)
- Wrong permissions on /etc/patroni/patroni.yml
- etcd not accessible
- Port conflicts (5432, 8008); quick checks for these last two follow below
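# Is etcd reachable from this node?
etcdctl endpoint health
# Is something else already listening on the PostgreSQL or Patroni ports?
sudo ss -tlnp | grep -E ':(5432|8008)\b'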
etcd Connection Failed
Check etcd is running:
sudo systemctl status etcd
etcdctl endpoint health
Test connectivity:
curl http://192.168.1.51:2379/health
Split-Brain Scenario
If cluster shows multiple leaders:
- Stop Patroni on all nodes:
sudo systemctl stop patroni
- Clear etcd cluster data:
etcdctl del --prefix /db/
- Start Patroni one node at a time:
# Node 1
sudo systemctl start patroni
# Wait 30s
# Node 2
sudo systemctl start patroni
# Wait 30s
# Node 3
sudo systemctl start patroni
HAProxy Shows All Nodes Down
Check Patroni REST API is accessible:
curl http://192.168.1.51:8008/
Should return HTTP 200 for leader, 503 for replica.
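To check all three nodes at once from the HAProxy host:
# Print the HTTP status each Patroni REST API returns (200 = leader, 503 = replica)
for ip in 192.168.1.51 192.168.1.52 192.168.1.53; do
    echo -n "$ip: "
    curl -s -o /dev/null -w '%{http_code}\n' http://$ip:8008/
done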
Verify HAProxy can reach nodes:
sudo tcpdump -i any port 8008
Best Practices
- Use Odd Number of Nodes: 3 or 5 nodes for quorum
- Monitor etcd Health: Critical for cluster consensus
- Set Appropriate TTL: Balance between fast failover and stability
- Use Synchronous Replication: For zero data loss (a configuration sketch follows this list)
- Regular Backup Testing: Patroni is not a backup solution
- Monitor Replication Lag: Alert on excessive lag
- Document Failover Procedures: Train your team
- Test Failover Regularly: In development/staging
- Use Watchdog: Prevent split-brain scenarios (advanced)
- Secure Communications: Use SSL/TLS in production
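Synchronous replication in a Patroni cluster is enabled through the DCS configuration rather than by editing postgresql.conf directly. One way to turn it on, sketched here and assuming a patronictl version whose edit-config supports the -s/--set option:
# Enable synchronous replication cluster-wide via the DCS
patronictl -c /etc/patroni/patroni.yml edit-config --set synchronous_mode=true --force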
Security Enhancements
Enable SSL for PostgreSQL
Generate certificates:
# On all nodes
sudo -u postgres openssl req -new -x509 -days 365 -nodes \
-text -out /var/lib/postgresql/15/main/server.crt \
-keyout /var/lib/postgresql/15/main/server.key \
-subj "/CN=postgres"
sudo chmod 600 /var/lib/postgresql/15/main/server.key
Update patroni.yml:
postgresql:
  parameters:
    ssl: "on"
    ssl_cert_file: '/var/lib/postgresql/15/main/server.crt'
    ssl_key_file: '/var/lib/postgresql/15/main/server.key'
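After updating patroni.yml on each node, ask Patroni to re-read its configuration; ssl is a reload-level PostgreSQL parameter, so a restart should not be needed:
# Reload Patroni (and PostgreSQL) configuration on all members
patronictl -c /etc/patroni/patroni.yml reload postgres-cluster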
Secure etcd with TLS
Generate certificates for etcd:
# Create CA and certificates
# See etcd security documentation
Update etcd service to use TLS.
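As a sketch of what that involves, the TLS settings map to these environment variables in the etcd unit file; the certificate paths are placeholders, and the http:// URLs in the existing Environment= lines must be switched to https:// as well:
# Example TLS settings for /etc/systemd/system/etcd.service (paths are placeholders)
Environment=ETCD_CERT_FILE=/etc/etcd/pki/server.crt
Environment=ETCD_KEY_FILE=/etc/etcd/pki/server.key
Environment=ETCD_TRUSTED_CA_FILE=/etc/etcd/pki/ca.crt
Environment=ETCD_CLIENT_CERT_AUTH=true
Environment=ETCD_PEER_CERT_FILE=/etc/etcd/pki/peer.crt
Environment=ETCD_PEER_KEY_FILE=/etc/etcd/pki/peer.key
Environment=ETCD_PEER_TRUSTED_CA_FILE=/etc/etcd/pki/ca.crt
Environment=ETCD_PEER_CLIENT_CERT_AUTH=true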
Restrict Network Access
Use firewall rules:
# Allow only cluster nodes
sudo ufw allow from 192.168.1.0/24 to any port 2379 # etcd client
sudo ufw allow from 192.168.1.0/24 to any port 2380 # etcd peer
sudo ufw allow from 192.168.1.0/24 to any port 5432 # PostgreSQL
sudo ufw allow from 192.168.1.0/24 to any port 8008 # Patroni REST API
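If HAProxy runs on its own host (192.168.1.50 in this guide), also allow the client-facing ports there before enabling the firewall; adjust the source network to wherever your applications actually connect from, and make sure SSH stays allowed:
# On the HAProxy host
sudo ufw allow OpenSSH
sudo ufw allow from 192.168.1.0/24 to any port 5000   # writes (primary)
sudo ufw allow from 192.168.1.0/24 to any port 5001   # reads (replicas)
sudo ufw allow from 192.168.1.0/24 to any port 7000   # HAProxy stats
sudo ufw enable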