PostgreSQL Replication: Streaming and Logical
Database replication is one of the most important techniques for ensuring high availability, scalability, and data durability. PostgreSQL offers two primary replication methods: streaming replication and logical replication. Understanding the differences between them is crucial for designing a robust database architecture.
What is Database Replication?
Database replication is the process of copying data from a primary database to one or more secondary databases. The primary database is the source of truth, while the secondary databases maintain copies of the data. This architecture provides several benefits:
- High Availability: If the primary database fails, a secondary can take over, minimizing downtime.
- Read Scalability: Secondary databases can handle read queries, offloading the primary.
- Data Durability: Multiple copies of data protect against data loss from hardware failures.
- Disaster Recovery: Secondary databases can be used for backup and recovery purposes.
Streaming Replication: The Basics
Streaming replication is the most common form of replication in PostgreSQL. It's a physical replication method that copies the actual data files from the primary database to the standby server.
How Streaming Replication Works
Streaming replication works by continuously copying the write-ahead log (WAL) from the primary database to the standby server. The WAL contains all changes made to the database, and the standby server applies these changes in real-time.
When a transaction is committed on the primary, the WAL records the changes, and the standby server reads and applies these changes. This happens continuously, keeping the standby server synchronized with the primary.
Key Characteristics
- Synchronous vs Asynchronous: In synchronous replication, the primary waits for the standby to acknowledge receipt of WAL records before committing. In asynchronous replication, the primary commits immediately and the standby catches up in the background.
- One-Way Replication: Streaming replication is typically one-way, from primary to standby. Standby servers cannot accept write operations.
- Physical Copy: The standby server receives the actual data files, making it a near-exact copy of the primary.
Configuration Example
Here's how to configure streaming replication in PostgreSQL.
Primary Server Configuration (postgresql.conf):
Standby Server Configuration (postgresql.conf):
Create Replication User (on primary):
Start Standby Server:
The -R flag automatically creates the standby.signal file and configures the connection information.
Logical Replication: The Basics
Logical replication is a different approach that replicates individual transactions rather than the entire database. It's a row-based replication method that allows you to replicate specific tables or even specific rows.
How Logical Replication Works
Logical replication works by capturing changes at the transaction level and applying them to the target database. The primary database sends changes to a replication slot, and the standby server reads and applies these changes.
Unlike streaming replication, logical replication can be configured to replicate only specific tables or even specific rows. It also supports bidirectional replication in some configurations.
Key Characteristics
- Table-Level Control: You can choose which tables to replicate.
- Row-Level Granularity: You can replicate specific rows based on conditions.
- Bidirectional Support: Some configurations allow both primary and standby to accept write operations.
- Logical Decoding: Uses logical decoding to extract changes from the WAL.
Configuration Example
Enable Logical Replication (on primary):
Create Subscription (on standby):
Streaming vs Logical Replication: Comparison
Understanding the differences between these two methods is essential for choosing the right approach for your use case.
| Feature | Streaming Replication | Logical Replication |
|---|---|---|
| Replication Type | Physical | Logical |
| Granularity | Database-level | Table-level |
| Synchronization | Near real-time | Near real-time |
| Write Support | Read-only on standby | Can be read-write |
| Configuration | Simpler | More complex |
| Performance | Higher for bulk data | Lower for bulk data |
| Use Case | High availability | Selective replication |
Choosing the Right Replication Method
Use Streaming Replication When:
- You need high availability and failover
- You want a simple, reliable replication setup
- You need near real-time synchronization
- You're replicating the entire database
Use Logical Replication When:
- You need to replicate only specific tables
- You want to avoid replicating large tables
- You need bidirectional replication
- You want to replicate to different database systems
Practical Example: Setting Up Streaming Replication
Let's walk through a complete example of setting up streaming replication for high availability.
Step 1: Prepare the Primary Server
Step 2: Create Replication User
Step 3: Create Standby Server
Step 4: Start Standby Server
Step 5: Verify Replication
Monitoring Replication
Monitoring is crucial for ensuring your replication setup is working correctly.
Check Replication Status:
Check Lag:
Check Replication Slot Status:
Common Issues and Solutions
Replication Lag
If you notice replication lag, consider:
- Increasing
wal_keep_size - Reducing the number of concurrent transactions
- Optimizing query performance on the primary
- Using synchronous replication if appropriate
Connection Failures
If you can't connect to the primary for replication:
- Check firewall rules
- Verify
listen_addressesis set correctly - Ensure the replication user exists
- Check
pg_hba.conffor correct authentication rules
Standby Not Starting
If the standby server won't start:
- Check the
standby.signalfile exists - Verify the
primary_conninfois correct - Ensure the primary is running and accepting connections
- Check the
pg_basebackupoutput for errors
Best Practices
- Test Failover: Regularly test your failover process to ensure it works when needed.
- Monitor Continuously: Set up alerts for replication lag, connection failures, and other issues.
- Use Synchronous Replication for Critical Data: For the most important data, use synchronous replication to ensure data consistency.
- Keep Replication Slots: Don't drop replication slots unless you're sure you won't need them.
- Regular Backups: Even with replication, maintain regular backups as a safety net.
Conclusion
PostgreSQL offers powerful replication capabilities through both streaming and logical replication. Streaming replication is ideal for high availability and disaster recovery, while logical replication provides flexibility for selective data replication.
By understanding the differences between these methods and implementing the right approach for your use case, you can build a robust database architecture that provides high availability, scalability, and data durability.
For teams looking to simplify database management, ServerlessBase provides managed PostgreSQL solutions with built-in replication and high availability features, handling the complex infrastructure so you can focus on your application.
Next Steps
After mastering PostgreSQL replication, you might want to explore:
- PostgreSQL High Availability - Building a complete high availability setup with multiple standbys
- PostgreSQL Backup Strategies - Complementing replication with regular backups
- PostgreSQL Performance Tuning - Optimizing your database for high performance