Back
December 18, 2023
6
min read

Logical Replication From Postgres 16 Stand-By Servers—Debezium and Failover Slots

By
Gunnar Morling
Share this post

Welcome back to this series about logical replication from Postgres 16 stand-by servers, in which we’ll discuss how to use this feature with Debezium—a popular open-source platform for Change Data Capture (CDC) for a wide range of databases—as well as how to manage logical replication in case of failover scenarios, i.e. a situation where your primary Postgres server becomes unavailable and a stand-by server needs to take over.

If you want to learn more about logical replication in general, and why and how to use it with stand-by servers in Postgres 16, then I suggest heading over to part one of this blog series before continuing here.

Stand-By Logical Replication With Debezium

The beauty of Postgres logical replication is that not only other Postgres instances can serve as the consumer of a replication stream, but that also other clients can subscribe to a replication slot for ingesting realtime change event feeds from Postgres. Debezium, offering CDC support for many databases, including Postgres, makes use of that for exposing change event streams via Apache Kafka, but also—via its Debezium Server component—to other kinds of messaging and streaming platforms such as AWS Kinesis, Apache Pulsar, NATS, and others. So let’s quickly test how to stream changes from a Postgres stand-by with Debezium.

Note that we’ll need to use the latest Debezium release—2.5.0.Beta 1, released last week—in order to stream changes from a stand-by server. When I first had tested this, things wouldn’t quite work, as the connector made use of the function <span class="inline-code">pg_current_wal_lsn()</span> in order to obtain the current WAL position. This is only available on primary servers, though. So I took the opportunity to make my first little Debezium contribution for quite a while, changing it to invoke <span class="inline-code">pg_last_wal_receive_lsn()</span> instead when connecting against a stand-by. Thanks a lot to the team for the quick merge and inclusion in the Beta1 release!

As a playground for this experiment, I’ve created a simple Docker Compose file which launches Kafka, and Kafka Connect as the runtime environment for Debezium. Here’s an overview of all the involved components:

Fig. 1: Solution Overview (click to enlarge)

Fun fact: this uses Kafka in KRaft mode, i.e. no ZooKeeper process is needed. Good times! If you want to follow along, make sure you have Docker installed and you have a Postgres primary and stand-by node set up on Amazon RDS as described in part one. Then clone the decodable/examples repository and launch the demo environment like so:

git clone git@github.com:decodableco/examples.git decodable-examples
cd decodable-examples/postgres-logical-replication-standby
docker compose up

In order to use Debezium with Postgres on RDS, it is recommended to use the <span class="inline-code">pgoutput</span> logical decoding plug-in. It is the standard decoding plug-in, also used for logical replication to other Postgres instances. This plug-in requires a publication to be set up, which configures which kinds of changes should be published for which tables. Usually, Debezium will set up the publication—similar to the logical replication slot—automatically. Unfortunately, this is not supported when ingesting changes from a stand-by server, as publications (unlike replication slots) can only be created on a primary server. Debezium doesn’t know about the primary, so you’ll need to create that publication manually before setting up the connector:

primary> CREATE PUBLICATION my_publication FOR ALL TABLES;
CREATE PUBLICATION

Having to create publications for stand-by replication slots on the primary seems somewhat inconsistent and it’s also not ideal in terms of operations, but there may be a good reason for that requirement.

<div class="side-note">Note that instead of the <span class="inline-code">ALL TABLES</span> publication, you could also narrow this further down and only expose the changes for specific tables, omit certain columns (e.g. with PII data) or rows (e.g. logically deleted rows), and more. See the docs for the <span class="inline-code">CREATE PUBLICATION</span> command for more details.</div>

Let’s take a look at the connector configuration then. This is done via a JSON-based configuration file looking like this:

{
 "name" : "inventory-connector",
 "config" : {
   "connector.class" : "io.debezium.connector.postgresql.PostgresConnector",
   "tasks.max" : "1",
   "database.hostname" : "<Your Postgres Stand-By>",
   "database.port" : "5432",
   "database.user" : "<User>",
   "database.password" : "<Password>",
   "database.dbname" : "<Database>",
   "topic.prefix" : "dbserver1",
   "plugin.name" : "pgoutput",
   "publication.name" : "my_publication",
   "poll.interval.ms" : "100"
 }
}

Adjust database host, user name, password, and database name as needed when applying this to your own environment. To apply this configuration, the REST API of Kafka Connect can be invoked. But if you’re like me and tend to forget all the exact endpoint URLs, then take a look at kcctl 🧸 (yes, the teddy bear emoji is part of the name), which I am going to use in the following. It is  a command line client for Kafka Connect, which makes it very easy to create connectors, restart and stop them, etc. Following the semantics of kubectl, a configuration file is applied like this:

kcctl apply -f postgres-connector.json
Created connector inventory-connector

Let’s take a look at the connector and its status:

kcctl describe connector inventory-connector
Name:       inventory-connector
Type:       source
State:      RUNNING
Worker ID:  172.21.0.3:8083
Config:
  connector.class:    io.debezium.connector.postgresql.PostgresConnector
  database.dbname:    <Database>
  database.hostname:  <Host>
  database.password:  <Password>
  database.port:      5432
  database.user:      <User>
  name:               inventory-connector
  plugin.name:        pgoutput
  poll.interval.ms:   100
  publication.name:   my_publication
  tasks.max:          1
  topic.prefix:       dbserver1
Tasks:
  0:
    State:      RUNNING
    Worker ID:  172.21.0.3:8083
Topics:
  dbserver1.public.some_data

Having confirmed that the connector is running, let’s do a quick update in the primary database and examine the corresponding change events in Kafka, ingested from the stand-by instance:

docker run --tty --rm \
     --network postgres-logical-replication-standby_default \
     quay.io/debezium/tooling:1.2 \
     kcat -b kafka:9092 -C -o beginning -q -t dbserver1.public.some_data | jq .payload
{
  "before": null,
  "after": {
    "id": 1,
    "short_text": "c4ca4",
    "long_text": "3a3c3274941c83e253ebf8d2438ea5a2"
  },
  "source": {
    "version": "2.5.0.Beta1",
    "connector": "postgresql",
    "name": "dbserver1",
    "ts_ms": 1702469846436,
    "snapshot": "first_in_data_collection",
    "db": "inventory",
    "sequence": "[null,\"406746824704\"]",
    "schema": "public",
    "table": "some_data",
    "txId": null,
    "lsn": 406746824704,
    "xmin": null
  },
  "op": "r",
  "ts_ms": 1702469849341,
  "transaction": null
}
...

These are the snapshot events emitted by the connector when starting up. Let’s do an update to one of the records on the primary:

primary> UPDATE some_data SET short_text='hello' WHERE id = 1;

And shortly thereafter, the corresponding change event should show up in the Kafka topic:

...
  "before": {
    "id": 1,
    "short_text": "c4ca4",
    "long_text": "3a3c3274941c83e253ebf8d2438ea5a2"
  },
  "after": {
    "id": 1,
    "short_text": "hello",I
    "long_text": "3a3c3274941c83e253ebf8d2438ea5a2"
  },
...

At this point, you could hook up that change event stream with Apache Flink, or the Decodable Kafka source connector for feeding it into a real-time stream processing pipeline, but I’ll leave that for another day 🙂.

Towards Fail-Over Slots

Postgres’ support for logical replication has been built out significantly over the last few years. One thing still is missing, though: failover slots. Logical replication slots on the primary are not propagated to stand-bys. This means that when the primary unexpectedly goes down, any slots must be recreated on the new primary after promotion. Unfortunately, this can cause gaps in the change event stream, as any data change occurring before the new slot has been created would be missed. Clients would be forced to backfill the entire data set (i.e. take a snapshot in Debezium terminology) to be sure that no data is missing.

Discussions around adding support for fail-over slots go back to Postgres versions as old as 9.6. More recently, Patroni added their own solution to the problem, and EnterpriseDB released pg_failover_slots, a Postgres extension for slot failover. It remains to be seen when Postgres itself adds this feature (as hinted at in this presentation, it may happen with Postgres 17). Until then, in cases where the pg_failover_slots extension isn’t available—such as on Amazon RDS—logical replication slots on stand-bys let you build your own version of failover slots. The idea is to create two corresponding slots on primary and stand-by, and use the <span class="inline-code">pg_replication_slot_advance()</span> function (added in Postgres 11) to keep the two in sync. The replication consumer would connect to the slot on the primary at first. After a fail-over, when the stand-by server has been promoted to primary, it would reconnect to that slot.

For this to work, it is critical to periodically move the stand-by slot forward by calling <span class="inline-code">pg_replication_slot_advance()</span> with the confirmed flush LSN from the primary, i.e. the latest position in the WAL which has been processed and acknowledged by the consumer of the primary slot. Otherwise, the stand-by slot would retain larger and larger amounts of WAL, also the consumer would receive lots of duplicated events after a fail-over.

This could for instance be implemented using a cron job, or, when running on AWS, with a scheduled Lambda function. This job would periodically retrieve the confirmed flush LSN for the slot on the primary via the <span class="inline-code">pg_replication_slots</span> view:

primary> SELECT
  slot_name,
  confirmed_flush_lsn
FROM pg_replication_slots;
+--------------+---------------------+
| slot_name    | confirmed_flush_lsn |
|--------------+---------------------|
| primary_slot | 5F/C501098          |
+--------------+---------------------+

The slot on the stand-by would then be advanced to that LSN:

standby> SELECT * FROM pg_replication_slot_advance('failover_slot', '5F/C501098');
+---------------+------------+
| slot_name     | end_lsn    |
|---------------+------------|
| failover_slot | 5F/C501098 |
+---------------+------------+

How often you should advance the stand-by slot depends on the amount of duplication you are willing to accept after a failover: the closer the stand-by slot follows the primary slot, the fewer duplicates there will be when switching from one slot to the other. Note that the stand-by slot must never be advanced beyond the confirmed LSN of the primary slot. Otherwise, events would be lost when reading from the stand-by slot after a failover. Specifically, when setting up the stand-by slot, it will in all likelihood be on a newer LSN than what has been confirmed by the primary and it is vital to synchronize the two slots at first. To do so, wait for the next LSN to be confirmed by the primary slot, make sure this LSN has been replicated to the stand-by, and then advance the stand-by slot to that LSN.

Wrap-Up

Logical replication from Postgres stand-by servers has been a long awaited functionality, and it finally shipped with Postgres 16. Not only does it allow you to build chains of Postgres replicas (one stand-by server subscribing to another), but also non-Postgres clients, such as Debezium are not limited any longer to solely being able to connect to primary Postgres instances. This can be very useful for the purposes of load distribution or in situations where you prefer a CDC tool not to connect directly to your primary database.

The last missing piece in the puzzle here is full support for fail-over slots, for which you still need either a separate extension (pg_failover_slots) or implement your own approach by manually keeping two slots on primary and stand-by in sync. It would be great to see official support for this in a future Postgres release.

Finally, if you’d like learn more about logical replication from stand-bys, check out these posts from some fine folks in the Postgres community:

Many thanks to Bertrand Drouvot, Robert Metzger, Robin Moffatt, Gwen Shapira, and Sharon Xie for their feedback while writing this post.

📫 Email signup 👇

Did you enjoy this issue of Checkpoint Chronicle? Would you like the next edition delivered directly to your email to read from the comfort of your own home?

Simply enter your email address here and we'll send you the next issue as soon as it's published—and nothing else, we promise!

👍 Got it!
Oops! Something went wrong while submitting the form.
Gunnar Morling

Gunnar is an open-source enthusiast at heart, currently working on Apache Flink-based stream processing. In his prior role as a software engineer at Red Hat, he led the Debezium project, a distributed platform for change data capture. He is a Java Champion and has founded multiple open source projects such as JfrUnit, kcctl, and MapStruct.

Table of contents

Let's Get Decoding

Welcome back to this series about logical replication from Postgres 16 stand-by servers, in which we’ll discuss how to use this feature with Debezium—a popular open-source platform for Change Data Capture (CDC) for a wide range of databases—as well as how to manage logical replication in case of failover scenarios, i.e. a situation where your primary Postgres server becomes unavailable and a stand-by server needs to take over.

If you want to learn more about logical replication in general, and why and how to use it with stand-by servers in Postgres 16, then I suggest heading over to part one of this blog series before continuing here.

Stand-By Logical Replication With Debezium

The beauty of Postgres logical replication is that not only other Postgres instances can serve as the consumer of a replication stream, but that also other clients can subscribe to a replication slot for ingesting realtime change event feeds from Postgres. Debezium, offering CDC support for many databases, including Postgres, makes use of that for exposing change event streams via Apache Kafka, but also—via its Debezium Server component—to other kinds of messaging and streaming platforms such as AWS Kinesis, Apache Pulsar, NATS, and others. So let’s quickly test how to stream changes from a Postgres stand-by with Debezium.

Note that we’ll need to use the latest Debezium release—2.5.0.Beta 1, released last week—in order to stream changes from a stand-by server. When I first had tested this, things wouldn’t quite work, as the connector made use of the function <span class="inline-code">pg_current_wal_lsn()</span> in order to obtain the current WAL position. This is only available on primary servers, though. So I took the opportunity to make my first little Debezium contribution for quite a while, changing it to invoke <span class="inline-code">pg_last_wal_receive_lsn()</span> instead when connecting against a stand-by. Thanks a lot to the team for the quick merge and inclusion in the Beta1 release!

As a playground for this experiment, I’ve created a simple Docker Compose file which launches Kafka, and Kafka Connect as the runtime environment for Debezium. Here’s an overview of all the involved components:

Fig. 1: Solution Overview (click to enlarge)

Fun fact: this uses Kafka in KRaft mode, i.e. no ZooKeeper process is needed. Good times! If you want to follow along, make sure you have Docker installed and you have a Postgres primary and stand-by node set up on Amazon RDS as described in part one. Then clone the decodable/examples repository and launch the demo environment like so:

git clone git@github.com:decodableco/examples.git decodable-examples
cd decodable-examples/postgres-logical-replication-standby
docker compose up

In order to use Debezium with Postgres on RDS, it is recommended to use the <span class="inline-code">pgoutput</span> logical decoding plug-in. It is the standard decoding plug-in, also used for logical replication to other Postgres instances. This plug-in requires a publication to be set up, which configures which kinds of changes should be published for which tables. Usually, Debezium will set up the publication—similar to the logical replication slot—automatically. Unfortunately, this is not supported when ingesting changes from a stand-by server, as publications (unlike replication slots) can only be created on a primary server. Debezium doesn’t know about the primary, so you’ll need to create that publication manually before setting up the connector:

primary> CREATE PUBLICATION my_publication FOR ALL TABLES;
CREATE PUBLICATION

Having to create publications for stand-by replication slots on the primary seems somewhat inconsistent and it’s also not ideal in terms of operations, but there may be a good reason for that requirement.

<div class="side-note">Note that instead of the <span class="inline-code">ALL TABLES</span> publication, you could also narrow this further down and only expose the changes for specific tables, omit certain columns (e.g. with PII data) or rows (e.g. logically deleted rows), and more. See the docs for the <span class="inline-code">CREATE PUBLICATION</span> command for more details.</div>

Let’s take a look at the connector configuration then. This is done via a JSON-based configuration file looking like this:

{
 "name" : "inventory-connector",
 "config" : {
   "connector.class" : "io.debezium.connector.postgresql.PostgresConnector",
   "tasks.max" : "1",
   "database.hostname" : "<Your Postgres Stand-By>",
   "database.port" : "5432",
   "database.user" : "<User>",
   "database.password" : "<Password>",
   "database.dbname" : "<Database>",
   "topic.prefix" : "dbserver1",
   "plugin.name" : "pgoutput",
   "publication.name" : "my_publication",
   "poll.interval.ms" : "100"
 }
}

Adjust database host, user name, password, and database name as needed when applying this to your own environment. To apply this configuration, the REST API of Kafka Connect can be invoked. But if you’re like me and tend to forget all the exact endpoint URLs, then take a look at kcctl 🧸 (yes, the teddy bear emoji is part of the name), which I am going to use in the following. It is  a command line client for Kafka Connect, which makes it very easy to create connectors, restart and stop them, etc. Following the semantics of kubectl, a configuration file is applied like this:

kcctl apply -f postgres-connector.json
Created connector inventory-connector

Let’s take a look at the connector and its status:

kcctl describe connector inventory-connector
Name:       inventory-connector
Type:       source
State:      RUNNING
Worker ID:  172.21.0.3:8083
Config:
  connector.class:    io.debezium.connector.postgresql.PostgresConnector
  database.dbname:    <Database>
  database.hostname:  <Host>
  database.password:  <Password>
  database.port:      5432
  database.user:      <User>
  name:               inventory-connector
  plugin.name:        pgoutput
  poll.interval.ms:   100
  publication.name:   my_publication
  tasks.max:          1
  topic.prefix:       dbserver1
Tasks:
  0:
    State:      RUNNING
    Worker ID:  172.21.0.3:8083
Topics:
  dbserver1.public.some_data

Having confirmed that the connector is running, let’s do a quick update in the primary database and examine the corresponding change events in Kafka, ingested from the stand-by instance:

docker run --tty --rm \
     --network postgres-logical-replication-standby_default \
     quay.io/debezium/tooling:1.2 \
     kcat -b kafka:9092 -C -o beginning -q -t dbserver1.public.some_data | jq .payload
{
  "before": null,
  "after": {
    "id": 1,
    "short_text": "c4ca4",
    "long_text": "3a3c3274941c83e253ebf8d2438ea5a2"
  },
  "source": {
    "version": "2.5.0.Beta1",
    "connector": "postgresql",
    "name": "dbserver1",
    "ts_ms": 1702469846436,
    "snapshot": "first_in_data_collection",
    "db": "inventory",
    "sequence": "[null,\"406746824704\"]",
    "schema": "public",
    "table": "some_data",
    "txId": null,
    "lsn": 406746824704,
    "xmin": null
  },
  "op": "r",
  "ts_ms": 1702469849341,
  "transaction": null
}
...

These are the snapshot events emitted by the connector when starting up. Let’s do an update to one of the records on the primary:

primary> UPDATE some_data SET short_text='hello' WHERE id = 1;

And shortly thereafter, the corresponding change event should show up in the Kafka topic:

...
  "before": {
    "id": 1,
    "short_text": "c4ca4",
    "long_text": "3a3c3274941c83e253ebf8d2438ea5a2"
  },
  "after": {
    "id": 1,
    "short_text": "hello",I
    "long_text": "3a3c3274941c83e253ebf8d2438ea5a2"
  },
...

At this point, you could hook up that change event stream with Apache Flink, or the Decodable Kafka source connector for feeding it into a real-time stream processing pipeline, but I’ll leave that for another day 🙂.

Towards Fail-Over Slots

Postgres’ support for logical replication has been built out significantly over the last few years. One thing still is missing, though: failover slots. Logical replication slots on the primary are not propagated to stand-bys. This means that when the primary unexpectedly goes down, any slots must be recreated on the new primary after promotion. Unfortunately, this can cause gaps in the change event stream, as any data change occurring before the new slot has been created would be missed. Clients would be forced to backfill the entire data set (i.e. take a snapshot in Debezium terminology) to be sure that no data is missing.

Discussions around adding support for fail-over slots go back to Postgres versions as old as 9.6. More recently, Patroni added their own solution to the problem, and EnterpriseDB released pg_failover_slots, a Postgres extension for slot failover. It remains to be seen when Postgres itself adds this feature (as hinted at in this presentation, it may happen with Postgres 17). Until then, in cases where the pg_failover_slots extension isn’t available—such as on Amazon RDS—logical replication slots on stand-bys let you build your own version of failover slots. The idea is to create two corresponding slots on primary and stand-by, and use the <span class="inline-code">pg_replication_slot_advance()</span> function (added in Postgres 11) to keep the two in sync. The replication consumer would connect to the slot on the primary at first. After a fail-over, when the stand-by server has been promoted to primary, it would reconnect to that slot.

For this to work, it is critical to periodically move the stand-by slot forward by calling <span class="inline-code">pg_replication_slot_advance()</span> with the confirmed flush LSN from the primary, i.e. the latest position in the WAL which has been processed and acknowledged by the consumer of the primary slot. Otherwise, the stand-by slot would retain larger and larger amounts of WAL, also the consumer would receive lots of duplicated events after a fail-over.

This could for instance be implemented using a cron job, or, when running on AWS, with a scheduled Lambda function. This job would periodically retrieve the confirmed flush LSN for the slot on the primary via the <span class="inline-code">pg_replication_slots</span> view:

primary> SELECT
  slot_name,
  confirmed_flush_lsn
FROM pg_replication_slots;
+--------------+---------------------+
| slot_name    | confirmed_flush_lsn |
|--------------+---------------------|
| primary_slot | 5F/C501098          |
+--------------+---------------------+

The slot on the stand-by would then be advanced to that LSN:

standby> SELECT * FROM pg_replication_slot_advance('failover_slot', '5F/C501098');
+---------------+------------+
| slot_name     | end_lsn    |
|---------------+------------|
| failover_slot | 5F/C501098 |
+---------------+------------+

How often you should advance the stand-by slot depends on the amount of duplication you are willing to accept after a failover: the closer the stand-by slot follows the primary slot, the fewer duplicates there will be when switching from one slot to the other. Note that the stand-by slot must never be advanced beyond the confirmed LSN of the primary slot. Otherwise, events would be lost when reading from the stand-by slot after a failover. Specifically, when setting up the stand-by slot, it will in all likelihood be on a newer LSN than what has been confirmed by the primary and it is vital to synchronize the two slots at first. To do so, wait for the next LSN to be confirmed by the primary slot, make sure this LSN has been replicated to the stand-by, and then advance the stand-by slot to that LSN.

Wrap-Up

Logical replication from Postgres stand-by servers has been a long awaited functionality, and it finally shipped with Postgres 16. Not only does it allow you to build chains of Postgres replicas (one stand-by server subscribing to another), but also non-Postgres clients, such as Debezium are not limited any longer to solely being able to connect to primary Postgres instances. This can be very useful for the purposes of load distribution or in situations where you prefer a CDC tool not to connect directly to your primary database.

The last missing piece in the puzzle here is full support for fail-over slots, for which you still need either a separate extension (pg_failover_slots) or implement your own approach by manually keeping two slots on primary and stand-by in sync. It would be great to see official support for this in a future Postgres release.

Finally, if you’d like learn more about logical replication from stand-bys, check out these posts from some fine folks in the Postgres community:

Many thanks to Bertrand Drouvot, Robert Metzger, Robin Moffatt, Gwen Shapira, and Sharon Xie for their feedback while writing this post.

📫 Email signup 👇

Did you enjoy this issue of Checkpoint Chronicle? Would you like the next edition delivered directly to your email to read from the comfort of your own home?

Simply enter your email address here and we'll send you the next issue as soon as it's published—and nothing else, we promise!

👍 Got it!
Oops! Something went wrong while submitting the form.
Gunnar Morling

Gunnar is an open-source enthusiast at heart, currently working on Apache Flink-based stream processing. In his prior role as a software engineer at Red Hat, he led the Debezium project, a distributed platform for change data capture. He is a Java Champion and has founded multiple open source projects such as JfrUnit, kcctl, and MapStruct.

Let's Get Decoding