ServerlessBase Blog
  • PostgreSQL Replication: Streaming and Logical

    A comprehensive guide to PostgreSQL replication methods including streaming and logical replication for high availability and scalability

    PostgreSQL Replication: Streaming and Logical

    Database replication is one of the most important techniques for ensuring high availability, scalability, and data durability. PostgreSQL offers two primary replication methods: streaming replication and logical replication. Understanding the differences between them is crucial for designing a robust database architecture.

    What is Database Replication?

    Database replication is the process of copying data from a primary database to one or more secondary databases. The primary database is the source of truth, while the secondary databases maintain copies of the data. This architecture provides several benefits:

    • High Availability: If the primary database fails, a secondary can take over, minimizing downtime.
    • Read Scalability: Secondary databases can handle read queries, offloading the primary.
    • Data Durability: Multiple copies of data protect against data loss from hardware failures.
    • Disaster Recovery: Secondary databases can be used for backup and recovery purposes.

    Streaming Replication: The Basics

    Streaming replication is the most common form of replication in PostgreSQL. It's a physical replication method that copies the actual data files from the primary database to the standby server.

    How Streaming Replication Works

    Streaming replication works by continuously copying the write-ahead log (WAL) from the primary database to the standby server. The WAL contains all changes made to the database, and the standby server applies these changes in real-time.

    When a transaction is committed on the primary, the WAL records the changes, and the standby server reads and applies these changes. This happens continuously, keeping the standby server synchronized with the primary.

    Key Characteristics

    • Synchronous vs Asynchronous: In synchronous replication, the primary waits for the standby to acknowledge receipt of WAL records before committing. In asynchronous replication, the primary commits immediately and the standby catches up in the background.
    • One-Way Replication: Streaming replication is typically one-way, from primary to standby. Standby servers cannot accept write operations.
    • Physical Copy: The standby server receives the actual data files, making it a near-exact copy of the primary.

    Configuration Example

    Here's how to configure streaming replication in PostgreSQL.

    Primary Server Configuration (postgresql.conf):

    # Enable replication
    wal_level = replica
    max_wal_senders = 10
    max_replication_slots = 10
    wal_keep_size = 1GB
     
    # Listen on all interfaces
    listen_addresses = '*'
     
    # Authentication for replication user
    # pg_hba.conf
    host replication replicator 192.168.1.100/32 md5

    Standby Server Configuration (postgresql.conf):

    # Set the primary server address
    primary_conninfo = 'host=192.168.1.50 user=replicator password=secret'
     
    # Enable hot standby
    hot_standby = on

    Create Replication User (on primary):

    CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'secret';

    Start Standby Server:

    pg_basebackup -h 192.168.1.50 -D /var/lib/postgresql/data -U replicator -P -v -R

    The -R flag automatically creates the standby.signal file and configures the connection information.

    Logical Replication: The Basics

    Logical replication is a different approach that replicates individual transactions rather than the entire database. It's a row-based replication method that allows you to replicate specific tables or even specific rows.

    How Logical Replication Works

    Logical replication works by capturing changes at the transaction level and applying them to the target database. The primary database sends changes to a replication slot, and the standby server reads and applies these changes.

    Unlike streaming replication, logical replication can be configured to replicate only specific tables or even specific rows. It also supports bidirectional replication in some configurations.

    Key Characteristics

    • Table-Level Control: You can choose which tables to replicate.
    • Row-Level Granularity: You can replicate specific rows based on conditions.
    • Bidirectional Support: Some configurations allow both primary and standby to accept write operations.
    • Logical Decoding: Uses logical decoding to extract changes from the WAL.

    Configuration Example

    Enable Logical Replication (on primary):

    -- Enable logical decoding
    wal_level = logical
    max_replication_slots = 10
    max_wal_senders = 10
     
    -- Create publication
    CREATE PUBLICATION my_publication FOR TABLE users, orders;

    Create Subscription (on standby):

    CREATE SUBSCRIPTION my_subscription
    CONNECTION 'host=192.168.1.50 user=replicator password=secret dbname=postgres'
    PUBLICATION my_publication;

    Streaming vs Logical Replication: Comparison

    Understanding the differences between these two methods is essential for choosing the right approach for your use case.

    FeatureStreaming ReplicationLogical Replication
    Replication TypePhysicalLogical
    GranularityDatabase-levelTable-level
    SynchronizationNear real-timeNear real-time
    Write SupportRead-only on standbyCan be read-write
    ConfigurationSimplerMore complex
    PerformanceHigher for bulk dataLower for bulk data
    Use CaseHigh availabilitySelective replication

    Choosing the Right Replication Method

    Use Streaming Replication When:

    • You need high availability and failover
    • You want a simple, reliable replication setup
    • You need near real-time synchronization
    • You're replicating the entire database

    Use Logical Replication When:

    • You need to replicate only specific tables
    • You want to avoid replicating large tables
    • You need bidirectional replication
    • You want to replicate to different database systems

    Practical Example: Setting Up Streaming Replication

    Let's walk through a complete example of setting up streaming replication for high availability.

    Step 1: Prepare the Primary Server

    # Install PostgreSQL
    sudo apt-get install postgresql
     
    # Edit postgresql.conf
    sudo nano /etc/postgresql/14/main/postgresql.conf
     
    # Add these settings:
    wal_level = replica
    max_wal_senders = 10
    max_replication_slots = 10
    wal_keep_size = 1GB
    listen_addresses = '*'
     
    # Edit pg_hba.conf
    sudo nano /etc/postgresql/14/main/pg_hba.conf
     
    # Add this line:
    host replication replicator 192.168.1.100/32 md5
     
    # Restart PostgreSQL
    sudo systemctl restart postgresql

    Step 2: Create Replication User

    sudo -u postgres psql
     
    CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'strong_password';

    Step 3: Create Standby Server

    # On the standby server
    pg_basebackup -h 192.168.1.50 -D /var/lib/postgresql/14/main -U replicator -P -v -R

    Step 4: Start Standby Server

    sudo systemctl start postgresql

    Step 5: Verify Replication

    -- On the primary server
    SELECT * FROM pg_stat_replication;
     
    -- On the standby server
    SELECT * FROM pg_stat_wal_receiver;

    Monitoring Replication

    Monitoring is crucial for ensuring your replication setup is working correctly.

    Check Replication Status:

    -- On primary
    SELECT * FROM pg_stat_replication;
     
    -- On standby
    SELECT * FROM pg_stat_wal_receiver;

    Check Lag:

    -- On primary
    SELECT
      client_addr,
      state,
      sync_state,
      replay_lag
    FROM pg_stat_replication;

    Check Replication Slot Status:

    SELECT * FROM pg_replication_slots;

    Common Issues and Solutions

    Replication Lag

    If you notice replication lag, consider:

    • Increasing wal_keep_size
    • Reducing the number of concurrent transactions
    • Optimizing query performance on the primary
    • Using synchronous replication if appropriate

    Connection Failures

    If you can't connect to the primary for replication:

    • Check firewall rules
    • Verify listen_addresses is set correctly
    • Ensure the replication user exists
    • Check pg_hba.conf for correct authentication rules

    Standby Not Starting

    If the standby server won't start:

    • Check the standby.signal file exists
    • Verify the primary_conninfo is correct
    • Ensure the primary is running and accepting connections
    • Check the pg_basebackup output for errors

    Best Practices

    1. Test Failover: Regularly test your failover process to ensure it works when needed.
    2. Monitor Continuously: Set up alerts for replication lag, connection failures, and other issues.
    3. Use Synchronous Replication for Critical Data: For the most important data, use synchronous replication to ensure data consistency.
    4. Keep Replication Slots: Don't drop replication slots unless you're sure you won't need them.
    5. Regular Backups: Even with replication, maintain regular backups as a safety net.

    Conclusion

    PostgreSQL offers powerful replication capabilities through both streaming and logical replication. Streaming replication is ideal for high availability and disaster recovery, while logical replication provides flexibility for selective data replication.

    By understanding the differences between these methods and implementing the right approach for your use case, you can build a robust database architecture that provides high availability, scalability, and data durability.

    For teams looking to simplify database management, ServerlessBase provides managed PostgreSQL solutions with built-in replication and high availability features, handling the complex infrastructure so you can focus on your application.


    Next Steps

    After mastering PostgreSQL replication, you might want to explore:

    Leave comment