Restoring Neo4j Containers

This approach assumes you have credentials and wish to store your backups in Google Cloud Storage, AWS S3, or Azure Blob Storage. If this is not the case, you will need to adjust the restore script for your desired cloud storage method, but the overall approach will work for any backup location.
This approach works only for Neo4j 4.0+. The tools and the DBMS itself changed quite a lot between 3.5 and 4.0, and the approach here will likely not work for older databases without substantial modification.

Approach

The restore container is used as an initContainer in the main cluster. Prior to a node in the Neo4j cluster starting, the restore container copies down the backup set, and restores it into place. When the initContainer terminates, the regular Neo4j docker instance starts, and picks up where the backup left off.
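
Conceptually, the wiring looks roughly like the sketch below. This is illustrative only: the image tag, names, and environment variables here are placeholders, and the chart's templates generate the real initContainer from the restore values described later in this chapter.

initContainers:
  - name: restore-from-backup             # completes before Neo4j starts
    image: gcr.io/neo4j-helm/restore:4.1  # illustrative image tag
    env:
      - name: CLOUD_PROVIDER
        value: gcp
      - name: BUCKET
        value: gs://my-neo4j-backups      # illustrative bucket
      - name: DATABASE
        value: neo4j,system
    volumeMounts:
      - name: datadir                     # the same volume the Neo4j container uses
        mountPath: /data
      - name: creds                       # the service key secret
        mountPath: /auth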

This container is primarily tested against the backup .tar.gz archives produced by the backup container in this same code repository. We recommend you use that approach. If you tar/gz your own backups using a different approach, be careful to inspect the restore.sh script, because it needs to make certain assumptions about directory structure that come out of archived backups in order to restore properly.
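
If you do produce archives yourself, a quick sanity check is to list the archive's contents and compare the layout against what restore.sh expects (the archive name below is illustrative):

# List the entries at the top of the archive to confirm its directory structure
tar tzf neo4j-backup.tar.gz | head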

In order to read the backup files from cloud storage, the container needs to provide credentials or some other form of authentication. The mechanisms available depend on the cloud service provider.

Use a Service Account to access cloud storage (Google Cloud only)

GCP

Workload Identity is the recommended way to access Google Cloud services from applications running within GKE due to its improved security properties and manageability.

Follow the GCP instructions to:

  • Enable Workload Identity on your GKE cluster

  • Create a Google Cloud IAMServiceAccount that has read permissions for your backup location

  • Bind the IAMServiceAccount to the Neo4j deployment’s Kubernetes ServiceAccount*

[*] You can configure the name of the Kubernetes ServiceAccount that a Neo4j deployment uses by setting serviceAccountName in values.yaml. To check which Kubernetes ServiceAccount a Neo4j deployment is using, run: kubectl get pods <your neo4j pod name> -o=jsonpath='{.spec.serviceAccountName}{"\n"}'
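
For reference, the binding step looks like this (a sketch; PROJECT_ID, backup-reader, default, and neo4j-sa are placeholder project, IAM service account, namespace, and ServiceAccount names):

# Allow the Kubernetes ServiceAccount to impersonate the IAM service account
gcloud iam service-accounts add-iam-policy-binding \
    backup-reader@PROJECT_ID.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:PROJECT_ID.svc.id.goog[default/neo4j-sa]"

# Annotate the Kubernetes ServiceAccount so GKE knows which identity to use
kubectl annotate serviceaccount neo4j-sa \
    --namespace default \
    iam.gke.io/gcp-service-account=backup-reader@PROJECT_ID.iam.gserviceaccount.com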

If you are unable to use Workload Identity with GKE then you can create a service key secret instead as described in the next section.

Create a service key secret to access cloud storage

First, create a Kubernetes secret that contains the content of your service account key. This key must have permission to access the bucket and backup set that you're trying to restore.

AWS

  • Create a credentials file that looks like this:

[default]
region=
aws_access_key_id=
aws_secret_access_key=
  • Then create a Kubernetes secret from this file:

kubectl create secret generic neo4j-aws-credentials \
    --from-file=credentials=aws-credentials

GCP

You do NOT need to follow the steps in this section if you are using Workload Identity for GCP.

  • Create a service key JSON file that looks like this:

{
  "type": "",
  "project_id": "",
  "private_key_id": "",
  "private_key": "",
  "client_email": "",
  "client_id": "",
  "auth_uri": "",
  "token_uri": "",
  "auth_provider_x509_cert_url": "",
  "client_x509_cert_url": ""
}
  • Then create a Kubernetes secret from this file:

kubectl create secret generic neo4j-gcp-credentials \
    --from-file=credentials=gcp-credentials.json

Azure

  • Create a credentials file that looks like this:

export ACCOUNT_NAME=<NAME_STORAGE_ACCOUNT>
export ACCOUNT_KEY=<STORAGE_ACCOUNT_KEY>
  • Then create a Kubernetes secret from this file:

kubectl create secret generic neo4j-azure-credentials \
    --from-file=credentials=azure-credentials.sh
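
Whichever provider you use, it is worth confirming that the secret exists and contains a credentials key before deploying (substitute the secret name you created; note this prints the credentials in plain text):

kubectl get secret neo4j-gcp-credentials \
    -o jsonpath='{.data.credentials}' | base64 -d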

If this service key secret is not in place, the credentials cannot be mounted as a volume in the initContainer, and your pods may get stuck in the ContainerCreating phase.
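
If that happens, the pod's events will usually point at the missing secret. For example:

# Look for FailedMount warnings referencing the secret name
kubectl describe pod <your neo4j pod name>
kubectl get events --field-selector involvedObject.name=<your neo4j pod name>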

Configure the initContainer for Core and Read Replica Nodes

Refer to the single instance restore deploy scenario to see how the initContainers are configured.

What you will need to customize:

  • Ensure you have created the appropriate secret and set its name.

  • Ensure that the volume mount to /auth matches the secret name you created above (see the sketch after this list).

  • Ensure that your BUCKET and credentials are set correctly given the way you created your secret.
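
For illustration, the pairing between the secret and the /auth mount looks roughly like this. This is a sketch only; the volume and container names are placeholders, and your deploy scenario's actual manifest may differ:

volumes:
  - name: creds
    secret:
      secretName: neo4j-gcp-credentials   # must match the secret created earlier
initContainers:
  - name: restore-from-backup
    volumeMounts:
      - name: creds
        mountPath: /auth                  # the restore script reads credentials from here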

The example scenario above creates the initContainer just for core nodes. It is strongly recommended you do the same for readReplica.initContainers if you are using read replicas. If you restore only to core nodes and not to read replicas, the core nodes will replicate the data to the read replicas when they start. This will work just fine, but it may result in longer startup times and much more bandwidth usage.

Restore Environment Variables for the Init Container

  • To restore, add the necessary parameters to values.yaml, so that it looks like this:

...
core:
  ...
  restore:
    enabled: true
    secretName: (neo4j-gcp-credentials|neo4j-aws-credentials|neo4j-azure-credentials|NULL) #required. Set NULL if using Workload Identity in GKE.
    database: neo4j,system #required
    cloudProvider: (gcp|aws|azure) #required
    bucket: (gs://|s3://)test-neo4j #required
    timestamp: "latest" #optional #default:"latest"
    forceOverwrite: true #optional #default:true
    purgeOnComplete: true #optional #default:true
readReplica:
  ...
  restore:
    enabled: true
    secretName: (neo4j-gcp-credentials|neo4j-aws-credentials|neo4j-azure-credentials|NULL) #required. Set NULL if using Workload Identity in GKE.
    database: neo4j,system #required
    cloudProvider: (gcp|aws|azure) #required
    bucket: (gs://|s3://)test-neo4j #required
    timestamp: "2020-06-16-12:32:57" #optional #default:"latest"
    forceOverwrite: true #optional #default:true
    purgeOnComplete: true #optional #default:true
...
  • Then run a standard Neo4j installation from the root of neo4j-helm (not tools/restore). This can be the same command you ran to create the cores/replicas originally:

helm install \
    neo4j neo4j/neo4j \
    -f values.yaml \
    --set acceptLicenseAgreement=yes
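
Once the pods are scheduled, you can follow the restore's progress in the initContainer's logs. The initContainer name below is a guess; list the actual names first if it does not match:

# Find the initContainer's name, then tail its logs
kubectl get pod <your neo4j pod name> -o jsonpath='{.spec.initContainers[*].name}'
kubectl logs <your neo4j pod name> -c restore-from-backup -f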

Warnings & Indications

A common way to deploy Neo4j is to restore from the last backup when a container initializes. This is a good fit for a cluster, because it minimizes how much catch-up is needed when a node launches. Any difference between the last backup and the rest of the cluster is then provided via catch-up.

For single nodes, take extreme care here.

If a node crashes and you automatically restore from backup, force-overwriting what was previously on the disk, you will lose any data that the database captured between when the last backup was taken and when the crash happened. As a result, for single-node instances of Neo4j you should either perform restores manually when you need them, or keep a very regular backup schedule to minimize this data loss. If data loss is unacceptable under any circumstances, do not automate restores for single-node deploys.

Special notes for Azure Storage: the parameters require a "bucket", but for Azure Storage the naming is slightly different. There is no protocol scheme as there would be for AWS or Google. The bucket specified is the "blob container name" where the backup placed the files, not the account name; it is the same blob container name you used in the Backup chapter, assuming you followed the examples there. Relative paths will be respected: if you set bucket to container/path/to/directory, the restore expects your backup files to be stored in container at the path /path/to/directory/db/db-TIMESTAMP.tar.gz, where "db" is the name of the database being backed up (i.e. neo4j and system).
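
To double-check that your backup archives are where the restore will look for them, you can list the blob container with the Azure CLI (a sketch; the container and path are placeholders, and ACCOUNT_NAME/ACCOUNT_KEY are the values from your credentials file):

# List backup archives under the expected relative path
az storage blob list \
    --account-name $ACCOUNT_NAME \
    --account-key $ACCOUNT_KEY \
    --container-name container \
    --prefix path/to/directory/neo4j/ \
    --output table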

Running the Restore

With the initContainer in place and properly configured, simply deploy a new cluster using the regular approach. The restore happens before Neo4j starts, and when the cluster comes up it will be populated with the restored data.

Limitations

  • If you want usernames, passwords, and permissions to be restored, you must include a restore of the system graph (see the check after this list).

  • The container has not yet been tested with incremental backups.
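
If you restored the system database, a quick way to confirm that users and permissions came across is to list users once the cluster is up (a sketch; the pod name and password are placeholders, and the password is the one from the backed-up system database, not a freshly generated one):

kubectl exec <your neo4j pod name> -- \
    cypher-shell -u neo4j -p <password from backup> "SHOW USERS;"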