Version: 3.8

A Guide on How to Back Up and Restore Data in ScalarDL

ScalarDL uses ScalarDB, which non-invasively provides transaction capability on top of non-transactional (and possibly transactional) databases. Because of this, you need to take special care to back up and restore the databases in a transactionally-consistent way. This guide shows you how to create and restore transactionally-consistent ScalarDL backups.

We first describe how to back up and restore the databases of ScalarDL Ledger. We then describe how the process is extended to cover the case where Auditor is used.

Create Backups of Ledger Databases

For Transactional Databases

JDBC databases

You can back up JDBC databases with whatever method you prefer. The one requirement for ScalarDL on JDBC databases is that backups of all the ScalarDL managed tables (including the coordinator and scalardb tables) must be transactionally-consistent or automatically recoverable to a transactionally-consistent state. That means you need to create a consistent snapshot by dumping all tables in a single transaction. For example, you can use the mysqldump command with the --single-transaction option in MySQL, or the pg_dump command in PostgreSQL. Alternatively, if you use Amazon RDS (Relational Database Service) or Azure Database for MySQL/PostgreSQL, the automated backup feature lets you restore to any point within the backup retention period, which satisfies the requirement.
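As a minimal sketch of the dump commands mentioned above, the following Python helpers only assemble the argument lists for mysqldump and pg_dump; they do not run anything. The database names, user name, and output file names are hypothetical placeholders, and credentials are assumed to come from an option file or environment rather than the command line.

```python
import shlex
from typing import List


def mysqldump_command(databases: List[str], user: str, output_file: str) -> List[str]:
    """Build a mysqldump invocation that dumps all given databases
    inside a single transaction (a consistent snapshot for InnoDB tables)."""
    return [
        "mysqldump",
        "--single-transaction",
        f"--user={user}",
        f"--result-file={output_file}",
        "--databases",
        *databases,
    ]


def pg_dump_command(database: str, user: str, output_file: str) -> List[str]:
    """Build a pg_dump invocation; pg_dump dumps one database
    from a single consistent snapshot."""
    return [
        "pg_dump",
        f"--username={user}",
        f"--file={output_file}",
        database,
    ]


if __name__ == "__main__":
    # Hypothetical names: adjust to the namespaces your deployment actually uses.
    mysql_cmd = mysqldump_command(
        ["scalar", "coordinator", "scalardb"], "backup_user", "ledger.sql"
    )
    pg_cmd = pg_dump_command("ledger", "backup_user", "ledger.sql")
    print(shlex.join(mysql_cmd))
    print(shlex.join(pg_cmd))
```

The resulting lists can be passed to subprocess.run once the names match your environment. Listing every ScalarDL managed database in one mysqldump call is what keeps them in the same snapshot.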

For Non-transactional Databases

Basic strategy to create a transactionally-consistent backup

One way to create a transactionally-consistent backup is to take a backup while the ScalarDL cluster has no outstanding transactions. If the underlying database supports a point-in-time snapshot/backup mechanism, you can take a snapshot during that period. If the underlying database supports a point-in-time restore/recovery mechanism, you can set the restore point to a specific time (preferably the midpoint) within that period.

To make this easy, ScalarDL exposes a pause API that makes ScalarDL drain outstanding transactions and stop accepting new ones. We also provide a simple client program called scalar-admin that sends a pause request (and an unpause request) to a ScalarDL cluster and reports the paused duration.

Note that when you use a point-in-time restore/recovery mechanism, it is recommended to minimize clock drift between nodes (the ScalarDL nodes and the client node that requests the pause) by using clock synchronization such as NTP. Otherwise, the time reported as the paused duration might differ too much from the time at which the pause actually took place, and you could restore to a point where ongoing transactions exist. Also, because clock synchronization cannot perfectly synchronize clocks between nodes, it is recommended to pause for a long enough time (e.g., 10 seconds) and use the midpoint of the paused duration.
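To illustrate the midpoint recommendation, here is a small sketch that computes a restore point from the start and end of a paused duration. The function name and the 10-second minimum are assumptions taken from the example value above; how you obtain the two timestamps depends on your tooling.

```python
from datetime import datetime, timedelta, timezone

# Assumed minimum pause length, based on the "e.g., 10 seconds" guidance.
MIN_PAUSE = timedelta(seconds=10)


def restore_point(pause_start: datetime, pause_end: datetime,
                  min_pause: timedelta = MIN_PAUSE) -> datetime:
    """Return the midpoint of the paused duration.

    Raises ValueError if the window is inverted or shorter than min_pause,
    since a short window leaves little margin for clock drift.
    """
    if pause_end <= pause_start:
        raise ValueError("pause_end must be later than pause_start")
    duration = pause_end - pause_start
    if duration < min_pause:
        raise ValueError("paused duration is shorter than the minimum")
    return pause_start + duration / 2


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
    end = start + timedelta(seconds=12)
    print(restore_point(start, end).isoformat())
```

Choosing the midpoint leaves equal margin on both sides of the restore point, so a small clock offset in either direction still lands inside the pause.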

Database-specific ways to create a transactionally-consistent backup

Cassandra

Cassandra has a built-in replication mechanism, so you do not always have to create a transactionally-consistent backup. For example, if the replication factor is set to 3 and only the data of one node in a cluster is lost, you do not need a transactionally-consistent backup because the node can be recovered with a normal (transactionally-inconsistent) snapshot and the repair mechanism. However, if a quorum of nodes in a cluster loses its data, you need a transactionally-consistent backup to restore the cluster to a certain transactionally-consistent point.

If you want to create a transactionally-consistent cluster-wide backup, either follow the basic strategy section, or stop the Cassandra cluster, take copies of the data on all the nodes of the cluster, and then start the cluster again.

To avoid mistakes, it is recommended to use Cassy. Cassy is also integrated with scalar-admin, so it can issue a pause request to the application of a Cassandra cluster. Please see the doc for more details.

Cosmos DB

You must create a Cosmos DB account with a Continuous backup policy enabled to use the point-in-time restore (PITR) feature. Backups are created continuously once the policy is enabled. To specify a transactionally-consistent restore point, pause the ScalarDL service as described in the basic strategy.

DynamoDB

You must enable the point-in-time recovery (PITR) feature for DynamoDB tables. If you use ScalarDL Schema Loader, it enables PITR by default. To specify a transactionally-consistent restore point, pause the ScalarDL service as described in the basic strategy.

Restore Backups of Ledger Databases

To restore backups, you must follow the Restore Backup section. You must stop ScalarDL Ledger services before restoring database backups and start the ScalarDL Ledger services after restoring the backups.

Create/Restore Backups of Auditor Databases

When you use Auditor, you also need to take backups of Auditor databases in addition to Ledger databases. To make the backups of Ledger and Auditor databases consistent, you always need to pause a Ledger cluster regardless of whether you use transactional databases or non-transactional databases for Ledger and Auditor.

Here are the steps to take backups:

  1. Pause a Ledger cluster
  2. Take backups of Ledger databases (as described above)
  3. Take backups of Auditor databases (as described above)
  4. Unpause the Ledger cluster
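The steps above can be sketched as a small orchestration function. The four callables are hypothetical placeholders for whatever you use to pause, back up, and unpause (for example, wrappers around scalar-admin and your database tooling); this sketch only fixes the ordering and makes sure the Ledger cluster is unpaused even if a backup step fails.

```python
from typing import Callable


def backup_ledger_and_auditor(
    pause_ledger: Callable[[], None],
    backup_ledger: Callable[[], None],
    backup_auditor: Callable[[], None],
    unpause_ledger: Callable[[], None],
) -> None:
    """Run the four backup steps in order.

    All arguments are caller-supplied (hypothetical) hooks.
    """
    pause_ledger()          # 1. Pause a Ledger cluster
    try:
        backup_ledger()     # 2. Take backups of Ledger databases
        backup_auditor()    # 3. Take backups of Auditor databases
    finally:
        unpause_ledger()    # 4. Unpause the Ledger cluster, even on failure


if __name__ == "__main__":
    steps = []
    backup_ledger_and_auditor(
        lambda: steps.append("pause"),
        lambda: steps.append("ledger backup"),
        lambda: steps.append("auditor backup"),
        lambda: steps.append("unpause"),
    )
    print(steps)
```

Using try/finally here is a design choice of this sketch: a failed backup should not leave the cluster paused, but the exception still propagates so the caller knows the backup set is incomplete.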

Note that, even if Ledger is paused, Auditor still accepts requests and updates its data (i.e., lock tables). However, the updated data is lazily recovered once Ledger is unpaused. To reduce the lazy recovery overhead, it is good practice to take backups while there are no requests to ScalarDL. We are planning to provide a more efficient scheme as future work.

When restoring backups, make sure you use the backups that are created in the same pause period. You must stop ScalarDL Ledger and Auditor services before restoring database backups and start the ScalarDL Ledger and Auditor services after restoring the backups.
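One way to guard against mixing backups from different pause periods is a simple timestamp check before restoring. The sketch below assumes you have recorded the start and end of the pause period together with the snapshot time (or chosen restore point) of each backup; the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def in_same_pause_period(pause_start: datetime, pause_end: datetime,
                         *backup_times: datetime) -> bool:
    """Return True only if every backup time lies within the pause period."""
    if pause_end <= pause_start:
        raise ValueError("pause_end must be later than pause_start")
    if not backup_times:
        raise ValueError("at least one backup time is required")
    return all(pause_start <= t <= pause_end for t in backup_times)


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
    end = start + timedelta(seconds=30)
    ledger_backup = start + timedelta(seconds=10)
    auditor_backup = start + timedelta(seconds=20)
    print(in_same_pause_period(start, end, ledger_backup, auditor_backup))
```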