Postgresql pg_basebackup to Repair Replication Streaming

This article is aimed to help someone that needs to run pg_basebackup. The Postgresql documentation is excellent, and hopefully this will serve as an example of successful usage.

In this article, we will walk through a very real scenario in which renaming some PostgreSQL Databases on the primary of a primary, replica database server pair caused replication streaming to get “Ugly.”

Problem

I was responsible for several well-spec’d Linux database machines serving hundreds of thousands of users in a production environment.
Linux OS: RHEL 7 on bare metal
Postgresql version: 10.4
I was restoring 3 databases from a separate primary, second server pair running RHEL 6 and Postgresql version 9.3. The pg_restore attempts were all successful, then I decided I wanted to rename the databases. After running “ALTER DATABASE db_name RENAME TO new_db_name;” from psql terminal, the replica server acted very strangely. One of the three databases disappeared on the replica. The other two did not update their names to match the primary.

Solution

Thankfully this all happened during planned downtime. The reason pg_basebackup gets called “The Ugly Option,” is because it requires downtime. This pg_baseback utility is fantastic though. It can fix the entire replica from scratch in very little time. The total size of all the databases on the primary server was more than 1 terabyte. The total time it took to run the pg_basebackup was less than 30 minutes.

I ran the pg_basebackup utility from the replica, to reach out to the primary and pull in the necessary data. Bear in mind, this utility re-initializes the entire database cluster. This means all of the databases in your $PGDATA directory. Generally, a standard Postgresql installation only has one of these, even though it’s possible to have more than one on a single server.

Stop Postgresql on the Secondary / Replica

-- stop postgres on slave omdb2
$ systemctl stop postgresql-10.service
-- verify dead
$ systemctl status -l postgresql-10.service

Rename the current Postgresql data directory

-- confirm PGDATA (in my case it was "/data/pgsql/data"
-- find yours with "find / -name postgresql.conf"
-- the directory containing postgresql.conf should be your $PGDATA
$ echo $PGDATA
-- on slave, rename $PGDATA directory
-- this clears the way for a new data directory without trashing the prior one
-- use your $PGDATA directory path (not mine (below))
$ mv /data/pgsql/data /data/pgsql/data-backup
-- create the fresh new data directory for $PGDATA & chmod & chown
-- use your $PGDATA directory path (not mine (below))
$ mkdir -p /data/pgsql/data
$ chown -R postgres:postgres /data/pgsql/data
$ chmod 700 /data/pgsql/data

Initiate pg_basebackup from Secondary / Replica

-- start pg_basebackup from slave$ /usr/pgsql-10/bin/pg_basebackup -h <host-of-master> -D $PGDATA -R -P -v -U replica-- this will prompt you for the replica password, if you don't know it, it is likely stored in $PGDATA/recovery.conf, assuming you had streaming replication working sometime in the past

Confirm Ownership of $PGDATA

After this, you may need to recursively chown $PGDATA.

-- make sure the ownership is good
-- use your $PGDATA directory path (not mine (below))
$ chown -R postgres:postgres /data/pgsql/data

Start up the Secondary / Replica

-- start pg
$ systemctl start postgresql-10.service
-- check it
$ systemctl status -l postgresql-10.service

Final Thoughts

You might be thinking, “But this is SOO BORRRING,” and that may be true, but I hope it helps someone in a bind.
We used pg_basebackup to restore the entire Postgresql database cluster on the Secondary server of a Primary, Secondary streaming replica pair.
The total size of the databases on the Primary was > 1 terabyte. The time it took to execute the pg_basebackup utility was > 30 minutes. Your mileage may vary. I hope you don’t have to go into an unplanned outage to fix something like this, but if you’re already offline, this solution works brilliantly.

BR,
Timothy D Beach

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store