PostgreSQL Replication: Streaming and Logical

Database replication is one of the most important techniques for ensuring high availability, scalability, and data durability. PostgreSQL offers two primary replication methods: streaming replication and logical replication. Understanding the differences between them is crucial for designing a robust database architecture.

What is Database Replication?

Database replication is the process of copying data from a primary database to one or more secondary databases. The primary database is the source of truth, while the secondary databases maintain copies of the data. This architecture provides several benefits:

High Availability: If the primary database fails, a secondary can take over, minimizing downtime.
Read Scalability: Secondary databases can handle read queries, offloading the primary.
Data Durability: Multiple copies of data protect against data loss from hardware failures.
Disaster Recovery: Secondary databases can be used for backup and recovery purposes.

Streaming Replication: The Basics

Streaming replication is the most common form of replication in PostgreSQL. It's a physical replication method that copies the actual data files from the primary database to the standby server.

How Streaming Replication Works

Streaming replication works by continuously copying the write-ahead log (WAL) from the primary database to the standby server. The WAL contains all changes made to the database, and the standby server applies these changes in real-time.

When a transaction is committed on the primary, the WAL records the changes, and the standby server reads and applies these changes. This happens continuously, keeping the standby server synchronized with the primary.

Key Characteristics

Synchronous vs Asynchronous: In synchronous replication, the primary waits for the standby to acknowledge receipt of WAL records before committing. In asynchronous replication, the primary commits immediately and the standby catches up in the background.
One-Way Replication: Streaming replication is typically one-way, from primary to standby. Standby servers cannot accept write operations.
Physical Copy: The standby server receives the actual data files, making it a near-exact copy of the primary.

Configuration Example

Here's how to configure streaming replication in PostgreSQL.

Primary Server Configuration (postgresql.conf):

# Enable replication
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10
wal_keep_size = 1GB
 
# Listen on all interfaces
listen_addresses = '*'
 
# Authentication for replication user
# pg_hba.conf
host replication replicator 192.168.1.100/32 md5

Standby Server Configuration (postgresql.conf):

# Set the primary server address
primary_conninfo = 'host=192.168.1.50 user=replicator password=secret'
 
# Enable hot standby
hot_standby = on

Create Replication User (on primary):

CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'secret';

Start Standby Server:

pg_basebackup -h 192.168.1.50 -D /var/lib/postgresql/data -U replicator -P -v -R

The -R flag automatically creates the standby.signal file and configures the connection information.

Logical Replication: The Basics

Logical replication is a different approach that replicates individual transactions rather than the entire database. It's a row-based replication method that allows you to replicate specific tables or even specific rows.

How Logical Replication Works

Logical replication works by capturing changes at the transaction level and applying them to the target database. The primary database sends changes to a replication slot, and the standby server reads and applies these changes.

Unlike streaming replication, logical replication can be configured to replicate only specific tables or even specific rows. It also supports bidirectional replication in some configurations.

Key Characteristics

Table-Level Control: You can choose which tables to replicate.
Row-Level Granularity: You can replicate specific rows based on conditions.
Bidirectional Support: Some configurations allow both primary and standby to accept write operations.
Logical Decoding: Uses logical decoding to extract changes from the WAL.

Configuration Example

Enable Logical Replication (on primary):

-- Enable logical decoding
wal_level = logical
max_replication_slots = 10
max_wal_senders = 10
 
-- Create publication
CREATE PUBLICATION my_publication FOR TABLE users, orders;

Create Subscription (on standby):

CREATE SUBSCRIPTION my_subscription
CONNECTION 'host=192.168.1.50 user=replicator password=secret dbname=postgres'
PUBLICATION my_publication;

Streaming vs Logical Replication: Comparison

Understanding the differences between these two methods is essential for choosing the right approach for your use case.

Feature	Streaming Replication	Logical Replication
Replication Type	Physical	Logical
Granularity	Database-level	Table-level
Synchronization	Near real-time	Near real-time
Write Support	Read-only on standby	Can be read-write
Configuration	Simpler	More complex
Performance	Higher for bulk data	Lower for bulk data
Use Case	High availability	Selective replication

Choosing the Right Replication Method

Use Streaming Replication When:

You need high availability and failover
You want a simple, reliable replication setup
You need near real-time synchronization
You're replicating the entire database

Use Logical Replication When:

You need to replicate only specific tables
You want to avoid replicating large tables
You need bidirectional replication
You want to replicate to different database systems

Practical Example: Setting Up Streaming Replication

Let's walk through a complete example of setting up streaming replication for high availability.

Step 1: Prepare the Primary Server

# Install PostgreSQL
sudo apt-get install postgresql
 
# Edit postgresql.conf
sudo nano /etc/postgresql/14/main/postgresql.conf
 
# Add these settings:
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10
wal_keep_size = 1GB
listen_addresses = '*'
 
# Edit pg_hba.conf
sudo nano /etc/postgresql/14/main/pg_hba.conf
 
# Add this line:
host replication replicator 192.168.1.100/32 md5
 
# Restart PostgreSQL
sudo systemctl restart postgresql

Step 2: Create Replication User

sudo -u postgres psql
 
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'strong_password';

Step 3: Create Standby Server

# On the standby server
pg_basebackup -h 192.168.1.50 -D /var/lib/postgresql/14/main -U replicator -P -v -R

Step 4: Start Standby Server

sudo systemctl start postgresql

Step 5: Verify Replication

-- On the primary server
SELECT * FROM pg_stat_replication;
 
-- On the standby server
SELECT * FROM pg_stat_wal_receiver;

Monitoring Replication

Monitoring is crucial for ensuring your replication setup is working correctly.

Check Replication Status:

-- On primary
SELECT * FROM pg_stat_replication;
 
-- On standby
SELECT * FROM pg_stat_wal_receiver;

Check Lag:

-- On primary
SELECT
  client_addr,
  state,
  sync_state,
  replay_lag
FROM pg_stat_replication;

Check Replication Slot Status:

SELECT * FROM pg_replication_slots;

Common Issues and Solutions

Replication Lag

If you notice replication lag, consider:

Increasing wal_keep_size
Reducing the number of concurrent transactions
Optimizing query performance on the primary
Using synchronous replication if appropriate

Connection Failures

If you can't connect to the primary for replication:

Check firewall rules
Verify listen_addresses is set correctly
Ensure the replication user exists
Check pg_hba.conf for correct authentication rules

Standby Not Starting

If the standby server won't start:

Check the standby.signal file exists
Verify the primary_conninfo is correct
Ensure the primary is running and accepting connections
Check the pg_basebackup output for errors

Best Practices

Test Failover: Regularly test your failover process to ensure it works when needed.
Monitor Continuously: Set up alerts for replication lag, connection failures, and other issues.
Use Synchronous Replication for Critical Data: For the most important data, use synchronous replication to ensure data consistency.
Keep Replication Slots: Don't drop replication slots unless you're sure you won't need them.
Regular Backups: Even with replication, maintain regular backups as a safety net.

Conclusion

PostgreSQL offers powerful replication capabilities through both streaming and logical replication. Streaming replication is ideal for high availability and disaster recovery, while logical replication provides flexibility for selective data replication.

By understanding the differences between these methods and implementing the right approach for your use case, you can build a robust database architecture that provides high availability, scalability, and data durability.

For teams looking to simplify database management, ServerlessBase provides managed PostgreSQL solutions with built-in replication and high availability features, handling the complex infrastructure so you can focus on your application.

Next Steps

After mastering PostgreSQL replication, you might want to explore:

PostgreSQL High Availability - Building a complete high availability setup with multiple standbys
PostgreSQL Backup Strategies - Complementing replication with regular backups
PostgreSQL Performance Tuning - Optimizing your database for high performance