Amazon RDS for PostgreSQL now supports replication delay using the recovery_min_apply_delay parameter
I'm Iwasa.
Amazon RDS for PostgreSQL supports a feature to generate read-only instances called read replicas and perform replication.
Meanwhile, there is a standby instance parameter related to PostgreSQL replication called recovery_min_apply_delay
. While standby instances typically replicate data from the primary instance as quickly as possible, specifying this parameter allows you to explicitly delay replication by a certain amount of time.
I wasn't aware of this, but it seems Amazon RDS for PostgreSQL had not supported recovery_min_apply_delay
until now, meaning it wasn't possible to configure delayed replication. However, according to a recent update, this feature is now available.
As for use cases, when faster recovery than traditional PITR (Point-In-Time Recovery) is required, this can be achieved by promoting an instance with delayed replication enabled.## Preparing a read replica and observing default replication behavior
For this feature, it's available in the following minor versions and later, so please make sure you're using a supported minor version:
- 14.19
- 15.14
- 16.10
- 17.6
In this case, I'll try with 17.6, which is the latest minor version of 17 at the time of writing.
First, let's create a primary instance.
Since we're just creating a normal read replica, I'll create a single AZ instance.
Once the primary instance is created, let's create a read replica.
Since both the primary instance and the read replica instance have separate endpoints, I'll connect to each with psql
to verify functionality.
Let's create some tables and records to test.
% psql -h hoge0824psql.cpnu9ipu74g4.ap-northeast-1.rds.amazonaws.com -U postgres -d hogedb
Password for user postgres:
psql (14.15 (Homebrew), server 17.6)
WARNING: psql major version 14, server major version 17.
Some psql features might not work.
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.
hogedb=> create table hogetbl (id integer, val varchar(10));
CREATE TABLE
hogedb=> insert into hogetbl values (1, 'a1');
INSERT 0 1
hogedb=> insert into hogetbl values (2, 'a2');
INSERT 0 1
hogedb=> insert into hogetbl values (3, 'a3');
INSERT 0 1
hogedb=>
When I run a query on the read replica side, I can immediately confirm that the records have been replicated.
% psql -h hoge0824psql2.cpnu9ipu74g4.ap-northeast-1.rds.amazonaws.com -U postgres -d hogedb
Password for user postgres:
psql (14.15 (Homebrew), server 17.6)
WARNING: psql major version 14, server major version 17.
Some psql features might not work.
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.
hogedb=> select * from hogetbl;
id | val
----+-----
1 | a1
2 | a2
3 | a3
(3 rows)
hogedb=>
```Amazon RDS for PostgreSQL has metrics to observe replication lag, which typically showed replication occurring within about 100-300 ms.

## Parameter Change to Enable Delayed Replication
Now for today's update.
Let's set the `recovery_min_apply_delay` in the parameter group. By default, nothing is set, and the default behavior is equivalent to `0`.

This time I set it to `100000`.
The delay is specified in milliseconds, so this means 100 seconds. You can specify up to `86400000` (24 hours), allowing adjustment according to your RPO requirements.

Since the parameter type is Dynamic, it takes effect immediately without requiring an instance restart.
Reattaching the parameter group would require a restart.
### Behavior During Delayed Replication
Let's add a record after enabling the parameter.
Here is the primary instance:
```bash
hogedb=> insert into hogetbl values (4, 'a4');
INSERT 0 1
hogedb=> select * from hogetbl;
id | val
----+-----
1 | a1
2 | a2
3 | a3
4 | a4
(4 rows)
hogedb=>
Now let's check the read replica instance immediately after:
hogedb=> select * from hogetbl;
id | val
----+-----
1 | a1
2 | a2
3 | a3
(3 rows)
Oh! Unlike before, the changes haven't been replicated yet.
After waiting about 100 seconds, the changes were reflected as follows:
hogedb=> select * from hogetbl;
id | val
----+-----
1 | a1
2 | a2
3 | a3
4 | a4
(4 rows)
ReplicaLag During Delayed Replication
I thought the ReplicaLag metric would increase by 100,000 ms during this delayed replication, but it seems the metric is unrelated.
It showed similar behavior to a normal replica lag as seen below:
It seems we don't need to consider this value when setting up alarms based on replica lag.
Conclusion
Today I tried out the newly supported replication delay using the recovery_min_apply_delay parameter in Amazon RDS for PostgreSQL.
This could be a good option when PITR alone doesn't quite meet your RTO requirements. Worth remembering.
It might also be useful for checking application behavior when replication lag occurs.