Back up, aggregate, and restore (online)

For performing backups, Neo4j uses the Admin Service, which is available only inside the Kubernetes cluster, and access to it should be guarded. For more information, see Accessing Neo4j.

Prepare to back up a database(s) to a cloud provider (AWS, GCP, and Azure) bucket

You can perform a backup of a Neo4j database(s) to any cloud provider (AWS, GCP, and Azure) bucket using the neo4j/neo4j-admin Helm chart. From Neo4j 5.10, the chart supports backing up multiple databases; from Neo4j 5.13, it supports workload identity integration for GCP, AWS, and Azure; and from Neo4j 5.14, it supports MinIO (an AWS S3-compatible object storage API) for non-TLS/SSL endpoints.

Prerequisites

Before you can back up a database and upload it to your bucket, verify that you have an existing bucket in your cloud provider (for Azure, this is a container in your storage account) and credentials with read and write access to it or, from Neo4j 5.13, a Kubernetes service account with workload identity integration.

Create a Kubernetes secret

You can create a Kubernetes secret with the credentials that can access the cloud provider bucket using one of the following options:

GCP

Create the secret named gcpcreds using your GCP service account JSON key file. The JSON key file contains all the details of the service account that has access to the bucket.

kubectl create secret generic gcpcreds --from-file=credentials=/path/to/gcpcreds.json

AWS

  1. Create a credentials file in the following format:

    [ default ]
    region = us-east-1
    aws_access_key_id = <your-aws_access_key_id>
    aws_secret_access_key = <your-aws_secret_access_key>
  2. Create the secret named awscreds via the credentials file:

    kubectl create secret generic awscreds --from-file=credentials=/path/to/your/credentials

Azure

  1. Create a credentials file in the following format:

    AZURE_STORAGE_ACCOUNT_NAME=<your-azure-storage-account-name>
    AZURE_STORAGE_ACCOUNT_KEY=<your-azure-storage-account-key>
  2. Create the secret named azurecreds via the credentials file:

    kubectl create secret generic azurecreds --from-file=credentials=/path/to/your/credentials
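
Whichever cloud provider you use, you can confirm that the secret exists and contains the expected credentials key before continuing. A quick check, shown here for the awscreds secret (values are not displayed):

# Show the secret's keys and data sizes without revealing the credentials
kubectl describe secret awscreds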

Configure the backup parameters

You can configure the backup parameters in the backup-values.yaml file either by using the secretName and secretKeyName parameters or by mapping the Kubernetes service account to the workload identity integration.

The following examples show the minimum configuration required to perform a backup to a cloud provider bucket. For more information about the available backup parameters, see Backup parameters.

Configure the backup-values.yaml file using the secretName and secretKeyName parameters

GCP

neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.10.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3

backup:
  bucketName: "my-bucket"
  databaseAdminServiceName:  "standalone-admin" #This is the Neo4j Admin Service name.
  database: "neo4j,system"
  cloudProvider: "gcp"
  secretName: "gcpcreds"
  secretKeyName: "credentials"

consistencyCheck:
  enabled: true

AWS

neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.10.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3

backup:
  bucketName: "my-bucket"
  databaseAdminServiceName:  "standalone-admin"
  database: "neo4j,system"
  cloudProvider: "aws"
  secretName: "awscreds"
  secretKeyName: "credentials"

consistencyCheck:
  enabled: true

Azure

neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.10.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3

backup:
  bucketName: "my-bucket"
  databaseAdminServiceName:  "standalone-admin"
  database: "neo4j,system"
  cloudProvider: "azure"
  secretName: "azurecreds"
  secretKeyName: "credentials"

consistencyCheck:
  enabled: true

Configure the backup-values.yaml file using service account workload identity integration

In certain situations, it may be useful to assign a Kubernetes Service Account with workload identity integration to the Neo4j backup pod. This is particularly relevant when you want to improve security and have more precise access control for the pod. Doing so ensures that secure access to resources is granted based on the pod’s identity within the cloud ecosystem. For more information on setting up a service account with workload identity, see Google Kubernetes Engine (GKE) → Use Workload Identity, Amazon EKS → Configuring a Kubernetes service account to assume an IAM role, and Microsoft Azure → Use Microsoft Entra Workload ID with Azure Kubernetes Service (AKS).

To configure the Neo4j backup pod to use a Kubernetes service account with workload identity, set serviceAccountName to the name of the service account to use. For Azure deployments, you also need to set the azureStorageAccountName parameter to the name of the Azure storage account, where the backup files will be uploaded. For example:

GCP

neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.13.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3

backup:
  bucketName: "my-bucket"
  databaseAdminServiceName:  "standalone-admin" #This is the Neo4j Admin Service name.
  database: "neo4j,system"
  cloudProvider: "gcp"
  secretName: ""
  secretKeyName: ""

consistencyCheck:
  enabled: true

serviceAccountName: "demo-service-account"

AWS

neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.13.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3

backup:
  bucketName: "my-bucket"
  databaseAdminServiceName:  "standalone-admin"
  database: "neo4j,system"
  cloudProvider: "aws"
  secretName: ""
  secretKeyName: ""

consistencyCheck:
  enabled: true

serviceAccountName: "demo-service-account"

Azure

neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.13.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3

backup:
  bucketName: "my-bucket"
  databaseAdminServiceName:  "standalone-admin"
  database: "neo4j,system"
  cloudProvider: "azure"
  azureStorageAccountName: "storageAccountName"

consistencyCheck:
  enabled: true

serviceAccountName: "demo-service-account"
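
For illustration only, the Kubernetes service account referenced by serviceAccountName typically carries a provider-specific annotation that binds it to a cloud identity. The names below are placeholders, and the full setup (IAM bindings, trust policies, federated credentials) is described in the cloud provider guides linked above. A minimal sketch for GKE:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: demo-service-account
  annotations:
    # GKE Workload Identity: bind this Kubernetes service account to a Google service account
    iam.gke.io/gcp-service-account: demo-gsa@my-project.iam.gserviceaccount.com
    # On EKS (IRSA) the equivalent annotation is eks.amazonaws.com/role-arn,
    # and on AKS workload identity it is azure.workload.identity/client-id.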

The /backups mount created by default is an emptyDir volume, which means the data stored in it is not persistent and is lost when the pod is deleted. To use a persistent volume for backups, add the following section to the backup-values.yaml file:

tempVolume:
  persistentVolumeClaim:
    claimName: backup-pvc

You need to create the persistent volume and persistent volume claim before installing the neo4j-admin Helm chart. For more information, see Volume mounts and persistent volumes.
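
As a rough sketch, a claim matching the claimName above could look like the following; the storage class and size are assumptions and should be adjusted to your environment (or replaced by a manually provisioned persistent volume):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: backup-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard   # assumed storage class
  resources:
    requests:
      storage: 10Gi            # assumed size; leave room for backups of all databases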

Configure the backup-values.yaml file for using MinIO

This feature is available from Neo4j 5.14.

MinIO is an AWS S3-compatible object storage API. You can specify the minioEndpoint parameter in the backup-values.yaml file to push your backups to a MinIO bucket. The endpoint must be an S3 API endpoint, otherwise the backup Helm chart fails, and only non-TLS/SSL endpoints are supported. For example:

neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.14.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3

backup:
  bucketName: "my-bucket"
  databaseAdminServiceName:  "standalone-admin"
  minioEndpoint: "http://demo.minio.svc.cluster.local:9000"
  database: "neo4j,system"
  cloudProvider: "aws"
  secretName: "awscreds"
  secretKeyName: "credentials"

consistencyCheck:
  enabled: true
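
Because MinIO exposes an S3-compatible API, the awscreds secret referenced in this example can be created in the same AWS credentials format shown earlier, but with your MinIO access key and secret key. The values and paths below are placeholders:

# credentials file
[ default ]
region = us-east-1
aws_access_key_id = <your-minio-access-key>
aws_secret_access_key = <your-minio-secret-key>

# create the secret from the file
kubectl create secret generic awscreds --from-file=credentials=/path/to/your/minio-credentials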

Prepare to back up a database(s) to on-premises storage

This feature is available from Neo4j 5.16.

You can perform a backup of a Neo4j database(s) to on-premises storage using the neo4j/neo4j-admin Helm chart. When configuring the backup-values.yaml file, leave the cloudProvider field empty and provide a persistent volume in the tempVolume section so that the backup files persist if the pod is deleted.

You need to create the persistent volume and persistent volume claim before installing the neo4j-admin Helm chart. For more information, see Volume mounts and persistent volumes.

For example:

neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.16.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3

backup:
  bucketName: "my-bucket"
  databaseAdminServiceName:  "standalone-admin"
  database: "neo4j,system"
  cloudProvider: ""

consistencyCheck:
  enabled: true

tempVolume:
  persistentVolumeClaim:
    claimName: backup-pvc

Backup parameters

To see which options are configurable on the Helm chart, use helm show values with the neo4j/neo4j-admin Helm chart.
From Neo4j 5.10, the neo4j/neo4j-admin Helm chart also supports assigning the backup pods to specific nodes using nodeSelector labels, and from Neo4j 5.11, using affinity/anti-affinity rules or tolerations. For more information, see Assigning backup pods to specific nodes and the Kubernetes official documentation on Affinity and anti-affinity rules and Taints and Tolerations.

For example:

helm show values neo4j/neo4j-admin
## @param nameOverride String to partially override common.names.fullname
nameOverride: ""
## @param fullnameOverride String to fully override common.names.fullname
fullnameOverride: ""
# disableLookups will disable all the lookups done in the helm charts
# This should be set to true when using ArgoCD since ArgoCD uses helm template and the helm lookups will fail
# You can enable this when executing helm commands with --dry-run command
disableLookups: false

neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.21.0"
  podLabels: {}
#    app: "demo"
#    acac: "dcdddc"
  podAnnotations: {}
#    ssdvvs: "svvvsvs"
#    vfsvswef: "vcfvgb"
  # define the backup job schedule . default is * * * * *
  jobSchedule: ""
  # default is 3
  successfulJobsHistoryLimit:
  # default is 1
  failedJobsHistoryLimit:
  # default is 3
  backoffLimit:
  #add labels if required
  labels: {}

backup:
  # Ensure the bucket is already existing in the respective cloud provider
  # In case of azure the bucket is the container name in the storage account
  # bucket: azure-storage-container
  bucketName: ""

  #address details of the neo4j instance from which backup is to be done (serviceName or ip either one is required)

  #ex: standalone-admin.default.svc.cluster.local:6362
  # admin service name -  standalone-admin
  # namespace - default
  # cluster domain - cluster.local
  # port - 6362

  #ex: 10.3.3.2:6362
  # admin service ip - 10.3.3.2
  # port - 6362

  databaseAdminServiceName: ""
  databaseAdminServiceIP: ""
  #default name is 'default'
  databaseNamespace: ""
  #default port is 6362
  databaseBackupPort: ""
  #default value is cluster.local
  databaseClusterDomain: ""
  # specify minio endpoint ex: http://demo.minio.svc.cluster.local:9000
  # please ensure this endpoint is the s3 api endpoint or else the backup helm chart will fail
  # as of now it works only with non tls endpoints
  # to be used only when aws is used as cloudProvider
  minioEndpoint: ""

  #name of the database to backup ex: neo4j or neo4j,system (You can provide comma-separated database names)
  # In case of comma separated databases failure of any single database will lead to failure of complete operation
  database: ""
  # cloudProvider can be either gcp, aws, or azure
  # if cloudProvider is empty then the backup will be done to the /backups mount.
  # the /backups mount can point to a persistentVolume based on the definition set in tempVolume
  cloudProvider: ""



  # name of the kubernetes secret containing the respective cloud provider credentials
  # Ensure you have read,write access to the mentioned bucket
  # For AWS :
  # add the below in a file and create a secret via
  # 'kubectl create secret generic awscred --from-file=credentials=/demo/awscredentials'

  #  [ default ]
  #  region = us-east-1
  #  aws_access_key_id = XXXXX
  #  aws_secret_access_key = XXXX

  # For AZURE :
  # add the storage account name and key in below format in a file create a secret via
  # 'kubectl create secret generic azurecred --from-file=credentials=/demo/azurecredentials'

  #  AZURE_STORAGE_ACCOUNT_NAME=XXXX
  #  AZURE_STORAGE_ACCOUNT_KEY=XXXX

  # For GCP :
  # create the secret via the gcp service account json key file.
  # ex: 'kubectl create secret generic gcpcred --from-file=credentials=/demo/gcpcreds.json'
  secretName: ""
  # provide the keyname used in the above secret
  secretKeyName: ""
  # provide the azure storage account name
  # this to be provided when you are using workload identity integration for azure
  azureStorageAccountName: ""
  #setting this to true will not delete the backup files generated at the /backup mount
  keepBackupFiles: true

  #Below are all neo4j-admin database backup flags / options
  #To know more about the flags read here : https://neo4j.com/docs/operations-manual/current/backup-restore/online-backup/
  pageCache: ""
  includeMetadata: "all"
  type: "AUTO"
  keepFailed: false
  parallelRecovery: false
  verbose: true
  heapSize: ""

  # https://neo4j.com/docs/operations-manual/current/backup-restore/aggregate/
  # Performs aggregate backup. If enabled, NORMAL BACKUP WILL NOT BE DONE only aggregate backup
  # fromPath supports only s3 or local mount. For s3 , please set cloudProvider to aws and use either serviceAccount or creds
  aggregate:
    enabled: false
    verbose: true
    keepOldBackup: false
    parallelRecovery: false
    # Only AWS S3 or local mount paths are supported
    # For S3 provide the complete path , Ex: s3://bucket1/bucket2
    fromPath: ""
    # database name to aggregate. Can contain * and ? for globbing.
    database: ""

#Below are all neo4j-admin database check flags / options
#To know more about the flags read here : https://neo4j.com/docs/operations-manual/current/tools/neo4j-admin/consistency-checker/
consistencyCheck:
  enable: false
  checkIndexes: true
  checkGraph: true
  checkCounts: true
  checkPropertyOwners: true
  #The database name for which consistency check needs to be done.
  #Defaults to the backup.database values if left empty
  #The database name here should match with one of the database names present in backup.database. If not , the consistency check will be ignored
  database: ""
  maxOffHeapMemory: ""
  threads: ""
  verbose: true

# Set to name of an existing Service Account to use if desired
# Follow the following links for setting up a service account with workload identity
# Azure - https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview?tabs=go
# GCP - https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity
# AWS - https://docs.aws.amazon.com/eks/latest/userguide/associate-service-account-role.html
serviceAccountName: ""

# Volume to use as temporary storage for files before they are uploaded to cloud. For large databases local storage may not have sufficient space.
# In that case set an ephemeral or persistent volume with sufficient space here
# The chart defaults to an emptyDir, use this to overwrite default behavior
#tempVolume:
#  persistentVolumeClaim:
#    claimName: backup-pvc

# securityContext defines privilege and access control settings for a Pod. Making sure that we don't run Neo4j as root user.
securityContext:
  runAsNonRoot: true
  runAsUser: 7474
  runAsGroup: 7474
  fsGroup: 7474
  fsGroupChangePolicy: "Always"

# default ephemeral storage of backup container
resources:
  requests:
    ephemeralStorage: "4Gi"
    cpu: ""
    memory: ""
  limits:
    ephemeralStorage: "5Gi"
    cpu: ""
    memory: ""

# nodeSelector labels
# please ensure the respective labels are present on one of nodes or else helm charts will throw an error
nodeSelector: {}
#  label1: "true"
#  label2: "value1"

# set backup pod affinity
affinity: {}
#  podAffinity:
#    requiredDuringSchedulingIgnoredDuringExecution:
#      - labelSelector:
#          matchExpressions:
#            - key: security
#              operator: In
#              values:
#                - S1
#        topologyKey: topology.kubernetes.io/zone
#  podAntiAffinity:
#    preferredDuringSchedulingIgnoredDuringExecution:
#      - weight: 100
#        podAffinityTerm:
#          labelSelector:
#            matchExpressions:
#              - key: security
#                operator: In
#                values:
#                  - S2
#          topologyKey: topology.kubernetes.io/zone

#Add tolerations to the Neo4j pod
tolerations: []
#  - key: "key1"
#    operator: "Equal"
#    value: "value1"
#    effect: "NoSchedule"
#  - key: "key2"
#    operator: "Equal"
#    value: "value2"
#    effect: "NoSchedule"

Back up your database(s)

To back up your database(s), you install the neo4j-admin Helm chart using the configured backup-values.yaml file.

  1. Install neo4j-admin Helm chart using the backup-values.yaml file:

    helm install backup-name neo4j-admin -f /path/to/your/backup-values.yaml

    The neo4j/neo4j-admin Helm chart installs a cronjob that launches a pod based on the job schedule. This pod performs a backup of one or multiple databases, a consistency check of the backup file(s), and uploads them to the cloud provider bucket.

  2. Monitor the backup pod logs using kubectl logs pod/<neo4j-backup-pod-name> to check the progress of the backup (see the example commands after this list).

  3. Check that the backup files and the consistency check reports have been uploaded to the cloud provider bucket or on-premises storage.
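
The backup job can be monitored with standard kubectl commands, for example (pod names will differ in your deployment):

# The chart creates a CronJob; list it and the pods it has launched
kubectl get cronjobs
kubectl get pods

# Follow the logs of the most recent backup pod
kubectl logs -f pod/<neo4j-backup-pod-name>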

Aggregate a database backup chain

The aggregate backup command turns a backup chain into a single backup file. This is useful when you have a backup chain that you want to restore to a different cluster, or when you want to archive a backup chain. For more information on the benefits of the aggregate backup chain operation, its syntax and available options, see Aggregate a database backup chain.

The neo4j-admin Helm chart supports aggregating a backup chain stored in an AWS S3 bucket or a local mount. If aggregation is enabled, a normal backup is not performed; only the aggregate operation runs.

  1. To aggregate a backup chain stored in an AWS S3 bucket or a local mount, you need to provide the following information in your backup-values.yaml file:

    If your backup chain is stored on AWS S3, you need to set cloudProvider to aws and use either creds or serviceAccount to connect to your AWS S3 bucket. For example:

    Connect to your AWS S3 bucket using the awscreds secret
    neo4j:
      image: "neo4j/helm-charts-backup"
      imageTag: "5.21.0"
      jobSchedule: "* * * * *"
      successfulJobsHistoryLimit: 3
      failedJobsHistoryLimit: 1
      backoffLimit: 3
    
    backup:
    
      cloudProvider: "aws"
      secretName: "awscreds"
      secretKeyName: "credentials"
    
      aggregate:
        enabled: true
        verbose: false
        keepOldBackup: false
        parallelRecovery: false
        fromPath: "s3://bucket1/bucket2"
        # Database name to aggregate. Can contain * and ? for globbing.
        database: "neo4j"
    
    resources:
      requests:
        ephemeralStorage: "4Gi"
      limits:
        ephemeralStorage: "5Gi"
    Connect to your AWS S3 bucket using serviceAccount
    neo4j:
      image: "neo4j/helm-charts-backup"
      imageTag: "5.21.0"
      jobSchedule: "* * * * *"
      successfulJobsHistoryLimit: 3
      failedJobsHistoryLimit: 1
      backoffLimit: 3
    
    backup:
    
        cloudProvider: "aws"
    
        aggregate:
          enabled: true
          verbose: false
          keepOldBackup: false
          parallelRecovery: false
          fromPath: "s3://bucket1/bucket2"
          # Database name to aggregate. Can contain * and ? for globbing.
          database: "neo4j"
    
    #The service account must already exist in your cloud provider account and have the necessary permissions to manage your S3 bucket, as well as to download and upload files. See the example policy below.
    #{
    #   "Version": "2012-10-17",
    #    "Id": "Neo4jBackupAggregatePolicy",
    #    "Statement": [
    #        {
    #            "Sid": "Neo4jBackupAggregateStatement",
    #            "Effect": "Allow",
    #            "Action": [
    #                "s3:ListBucket",
    #                "s3:GetObject",
    #                "s3:PutObject",
    #                "s3:DeleteObject"
    #            ],
    #            "Resource": [
    #                "arn:aws:s3:::mybucket/*",
    #                "arn:aws:s3:::mybucket"
    #            ]
    #        }
    #    ]
    #}
    serviceAccountName: "my-service-account"
    
    resources:
      requests:
        ephemeralStorage: "4Gi"
      limits:
        ephemeralStorage: "5Gi"
    If your backup chain is stored on a local mount, point fromPath to the mount path and provide a persistent volume via tempVolume. For example:

    neo4j:
      image: "neo4j/helm-charts-backup"
      imageTag: "5.21.0"
      successfulJobsHistoryLimit: 1
      failedJobsHistoryLimit: 1
      backoffLimit: 1
    
    backup:
    
      aggregate:
        enabled: true
        verbose: false
        keepOldBackup: false
        parallelRecovery: false
        fromPath: "/backups"
        # Database name to aggregate. Can contain * and ? for globbing.
        database: "neo4j"
    
    tempVolume:
      persistentVolumeClaim:
        claimName: aggregate-pv-pvc
    
    resources:
      requests:
        ephemeralStorage: "4Gi"
      limits:
        ephemeralStorage: "5Gi"
  2. Install the neo4j-admin Helm chart using the configured backup-values.yaml file:

    helm install backup-name neo4j-admin -f /path/to/your/backup-values.yaml
  3. Monitor the pod logs using kubectl logs pod/<neo4j-aggregate-backup-pod-name> to check the progress of the aggregate backup operation.

  4. Verify that the aggregated backup file has replaced your backup chain in the cloud provider bucket or on-premises storage (see the example below).
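
For example, if your chain was stored in an AWS S3 bucket, you can list its contents with the AWS CLI, assuming the CLI is configured with access to the bucket and using the fromPath from the example above:

# List the contents of the aggregation path to confirm the single aggregated backup file
aws s3 ls s3://bucket1/bucket2/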

Restore a single database

To restore a single offline database or a database backup, you first need to delete the database that you want to replace unless you want to restore the backup as an additional database in your DBMS. Then, use the restore command of neo4j-admin to restore the database backup. Finally, use the Cypher command CREATE DATABASE name to create the restored database in the system database.

Delete the database that you want to replace

Before you restore the database backup, you have to delete the database that you want to replace with that backup using the Cypher command DROP DATABASE name against the system database. If you want to restore the backup as an additional database in your DBMS, then you can proceed to the next section.

For Neo4j cluster deployments, you run the Cypher command DROP DATABASE name only on one of the cluster servers. The command is automatically routed from there to the other cluster members.

  1. Connect to the Neo4j DBMS:

    kubectl exec -it <release-name>-0 -- bash
  2. Connect to the system database using cypher-shell:

    cypher-shell -u neo4j -p <password> -d system
  3. Drop the database you want to replace with the backup:

    DROP DATABASE neo4j;
  4. Exit the Cypher Shell command-line console:

    :exit;
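
The steps above can also be condensed into a single non-interactive command by passing the Cypher statement directly to cypher-shell. A sketch, with the release name and password replaced by your own values:

kubectl exec <release-name>-0 -- cypher-shell -u neo4j -p <password> -d system "DROP DATABASE neo4j;"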

Restore the database backup

You use the neo4j-admin database restore command to restore the database backup, and then the Cypher command CREATE DATABASE name to create the restored database in the system database. For information about the command syntax, options, and usage, see Restore a database backup.

For Neo4j cluster deployments, restore the database backup on each cluster server.

  1. Run the neo4j-admin database restore command to restore the database backup:

    neo4j-admin database restore neo4j --from-path=/backups/neo4j --expand-commands
  2. Connect to the system database using cypher-shell:

    cypher-shell -u neo4j -p <password> -d system
  3. Create the neo4j database.

    For Neo4j cluster deployments, you run the Cypher command CREATE DATABASE name only on one of the cluster servers.

    CREATE DATABASE neo4j;
  4. Open the browser at http://<external-ip>:7474/browser/ and check that all data has been successfully restored.

  5. Execute a Cypher command against the neo4j database, for example:

    MATCH (n) RETURN n

    If you have backed up your database with the option --include-metadata, you can manually restore the users and roles metadata. For more information, see Restore a database backup → Example.

To restore the system database, follow the steps described in Dump and load databases (offline).