Backup and restore planning
There are two main reasons for backing up your Neo4j databases and storing them in a safe, off-site location:
-
to be able to quickly recover your data in case of failure, for example related to hardware, human error, or natural disaster.
-
to be able to perform routine administrative operations, such as moving a database from one instance to another, upgrading, or reclaiming space.
Backup and restore strategy
Depending on your particular deployment and environment, it is important to design an appropriate backup and restore strategy.
There are various factors to consider when deciding on your strategy, such as:
-
Type of environment – development, test, or production.
-
Data volumes.
-
Number of databases.
-
Available system resources.
-
Downtime tolerance during backup and restore.
-
Demands on Neo4j performance during backup and restore. If these operations would compete with production load, consider running them during an off-peak period.
-
Tolerance for data loss in case of failure.
-
Tolerance for downtime in case of failure. If you have zero tolerance for downtime and data loss, you might want to consider performing an online or even a scheduled backup.
-
Frequency of updates to the database.
-
Type of backup and restore method (online or offline), which may depend on whether you want to:
-
perform full backups (online or offline).
-
perform differential backups (online only).
-
use SSL/TLS for the backup network communication (online only).
-
keep your databases as archive files (online or offline).
-
How many backups you want to keep.
-
Where the backups will be stored — drive or remote server, cloud storage, different data center, different location, etc.
It is recommended to store your database backups on a separate off-site server (drive or remote) from the database files. This ensures that if for some reason your Neo4j DBMS crashes, you will be able to access the backups and perform a restore.
-
How you will test recovery routines, and how often.
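As an illustration of the "how many backups you want to keep" factor above, the following is a minimal sketch of a retention rule. It is not part of Neo4j; the function name `backups_to_delete` and the parameters `keep_last` and `max_age_days` are hypothetical, and the policy (keep the newest N backups, plus anything younger than a cutoff) is only one of many possible choices.

```python
from datetime import datetime, timedelta


def backups_to_delete(backup_times, keep_last=7, max_age_days=30, now=None):
    """Return the timestamps a simple retention policy would remove.

    A backup is kept if it is among the `keep_last` most recent backups
    OR if it is younger than `max_age_days`. Everything else is returned,
    newest first. All names and defaults here are illustrative assumptions.
    """
    now = now or datetime.now()
    newest_first = sorted(backup_times, reverse=True)
    protected = set(newest_first[:keep_last])
    cutoff = now - timedelta(days=max_age_days)
    return [t for t in newest_first if t not in protected and t < cutoff]
```

A rule like this is best run only after the newest backup has been verified, so that a failed backup never causes the last good one to be removed.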
Backup and restore options
Neo4j supports backing up and restoring both online and offline databases.
It uses Neo4j Admin tool commands, which can be run from a live, as well as from an offline Neo4j DBMS.
All neo4j-admin commands must be invoked as the neo4j user to ensure the appropriate file permissions.
-
neo4j-admin database backup/restore (Enterprise only) – used for performing online backup (full and differential) and restore operations. The database to be backed up must be in online mode. The command produces an immutable artifact, which has an inspectable API to aid management and operability. This command is suitable for production environments, where you cannot afford downtime.
The command can also be invoked over the network if access is enabled using server.backup.listen_address.
Make sure to limit access to the backup server port to fully trusted, specific devices. Firewall policies should be considered. For more information, refer to the Server configurations section.
When using neo4j-admin database backup in a cluster, it is recommended to back up from an external instance rather than reusing instances that form part of the cluster.
-
neo4j-admin database dump/load – used for performing offline dump and load operations. The database to be dumped must be in offline mode. The dump command can only be invoked from the server command line and is suitable for environments where downtime is not a factor. The command produces an archive file that follows the format <databasename><timestamp>.dump.
-
neo4j-admin database copy – used for copying an offline database or backup. This command can be used for cleaning up database inconsistencies and reclaiming unused space.
File system copy-and-paste of databases is not supported and may result in unwanted behavior, such as corrupt stores.
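To make the difference between the online and offline commands above more concrete, here is a small sketch that only builds the command lines as argument lists and does not execute anything. The helper names are hypothetical. The options `--to-path`, `--from`, and `--type` are assumed to match the Neo4j 5 command-line help; verify them with `neo4j-admin database backup --help` for your version. The host name and paths are placeholders.

```python
import shlex


def backup_command(database, to_path, from_address=None, backup_type=None):
    """Build (but do not run) an online backup command as an argument list.

    `from_address` is a host:port on which the backup server listens
    (see server.backup.listen_address); option names are assumptions
    based on the Neo4j 5 CLI help and may differ in other versions.
    """
    cmd = ["neo4j-admin", "database", "backup", f"--to-path={to_path}"]
    if from_address:
        cmd.append(f"--from={from_address}")
    if backup_type:
        cmd.append(f"--type={backup_type}")
    cmd.append(database)
    return cmd


def dump_command(database, to_path):
    """Build (but do not run) an offline dump command as an argument list."""
    return ["neo4j-admin", "database", "dump", f"--to-path={to_path}", database]


if __name__ == "__main__":
    # Placeholder host and directories; print the commands for review only.
    print(shlex.join(backup_command("neo4j", "/backups", "backup-host:6362", "FULL")))
    print(shlex.join(dump_command("neo4j", "/dumps")))
```

Printing the command before running it makes it easier to review in a scheduled job. Remember that the commands themselves must be run as the neo4j user.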
Considerations for backing up and restoring databases in a cluster
Backing up a database in a clustered environment is essentially the same as a standalone backup, except that you must know which server in the cluster to connect to.
Use SHOW DATABASE <database> to learn which servers are hosting the database you want to back up.
See Listing a single database for more information.
However, restoring a database in a cluster is different since it is not known in advance how a database is going to be allocated to the servers in a cluster. This method relies on the seed already existing on one of the servers. The recommended way to restore a database in a cluster is to seed from URI.
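The lookup step described above (finding which servers host a database) can be sketched as a small filter over the rows returned by SHOW DATABASE. This sketch assumes each row is available as a dictionary with the columns `name`, `address`, and `currentStatus`; the function name and the sample rows are hypothetical, and how you fetch the rows (driver, Cypher Shell, etc.) is left out.

```python
def online_hosts(rows, database):
    """Return the addresses of servers that report `database` as online.

    `rows` is assumed to be a list of dicts shaped like SHOW DATABASE
    output, with at least the keys: name, address, currentStatus.
    """
    return [
        row["address"]
        for row in rows
        if row["name"] == database and row["currentStatus"] == "online"
    ]


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    sample = [
        {"name": "neo4j", "address": "server1:7687", "currentStatus": "online"},
        {"name": "neo4j", "address": "server2:7687", "currentStatus": "offline"},
        {"name": "system", "address": "server1:7687", "currentStatus": "online"},
    ]
    print(online_hosts(sample, "neo4j"))
```

Note that the address reported for a database is typically the server's Bolt address. The backup port is configured separately through server.backup.listen_address, so only the host part is likely to be reusable.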
The Neo4j Admin commands
Capability/Usage | backup/restore | dump/load | copy |
---|---|---|---|
Neo4j Edition | Enterprise | all | Enterprise |
Run from an online Neo4j DBMS | | | |
Run from an offline Neo4j DBMS | | | |
Run against a user database | | | |
Run against the | | | |
Run against a Composite database | | | |
Perform full backups | n/a | | |
Perform differential backups | n/a | | |
Applied to an online database | | | |
Applied to an offline database | only | | |
Can be run remotely | only | | |
Command input | database/archive (.backup) | database/archive (.dump) | database |
Command output | archive (.backup)/database | archive (.dump)/database | database; no schema store |
Clean up database inconsistencies | | | |
Compact data store | | | |
Databases to back up
A Neo4j DBMS can host multiple databases.
Both Neo4j Community and Enterprise Editions have a default user database, called neo4j, and a system database, which contains configurations, e.g., operational states of databases, security configuration, schema definitions, login credentials, and roles.
In the Enterprise Edition, you can also create additional user databases.
Each of these databases is backed up independently of one another.
It is very important to store a recent backup of your databases, including the |
Additional files to back up
The following files must be backed up separately from the databases:
-
The neo4j.conf file. If you have a cluster deployment, you should back up the configuration file for each cluster member.
-
All the files used for encryption, i.e., private key, public certificate, and the contents of the trusted and revoked directories. The locations of these are described in SSL framework. If you have a cluster, you should back up these files for each cluster member.
-
If using custom plugins, make sure that you have the plugins in a safe location.
-
If using Bloom or GDS Enterprise, back up license key files for these products as well.
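The files listed above can be collected alongside the database backups with a short script. The following is a minimal sketch using only the Python standard library; the function name `archive_support_files` is hypothetical, and the paths you pass in (configuration file, certificate directories, plugins, license keys) depend entirely on your installation.

```python
import tarfile
from pathlib import Path


def archive_support_files(paths, archive_path):
    """Pack existing files and directories into a gzip-compressed tar archive.

    Each entry is stored under its base name, so inputs that share a base
    name would collide; this sketch does not handle that case.
    Returns the list of input paths that were missing and therefore skipped.
    """
    missing = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for p in map(Path, paths):
            if p.exists():
                # Directories are added recursively by default.
                tar.add(str(p), arcname=p.name)
            else:
                missing.append(str(p))
    return missing
```

Returning the skipped paths rather than raising lets a scheduled job log what was absent on a given cluster member while still archiving the rest. Because the archive contains private keys, it should be stored with the same access restrictions as the originals.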
Storage considerations
For any backup, it is important that you store your data separately from the production system, where there are no common dependencies, and preferably off-site. If you are running Neo4j in the cloud, you may use a different availability zone or even a separate cloud provider. Since backups are kept for a long time, the longevity of archival storage should be considered as part of backup planning.