Consistency checker

You can use the neo4j-admin database check command to check the consistency of a database, a dump, or a backup. The neo4j-admin tool is located in the /bin directory.

Syntax

The neo4j-admin database check command has the following syntax:

neo4j-admin database check [-h] [--expand-commands] [--force] [--verbose]
                           [--check-counts[=true|false]] [--check-graph[=true|false]]
                           [--check-indexes[=true|false]] [--check-property-owners[=true|false]]
                           [--additional-config=<file>] [--max-off-heap-memory=<size>]
                           [--report-path=<path>] [--threads=<number of threads>]
                           [[--from-path-data=<path> --from-path-txn=<path>] | [--from-path=<path> [--temp-path=<path>]]]
                           <database>

Description

This command allows for checking the consistency of a database, a dump, or a backup. It cannot be used with a database that is currently in use.

Some checks can be quite expensive, so for very large databases it may be useful to turn some of them off. Increasing the heap size may also help.

Checking the consistency of a database, a dump, or a backup located on an NFS mount is not recommended, as this slows the process down significantly.
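As an illustration of turning off some checks for a very large database, the following sketch wraps the command in a small shell function that disables the index checks and leaves the property-owner checks off. The function name check_without_indexes and the NEO4J_ADMIN variable are assumptions made for this example; the flags are the ones listed under Options.

```shell
# Hypothetical helper: run the check with the index checks turned off.
# NEO4J_ADMIN is an assumed variable; it defaults to bin/neo4j-admin.
check_without_indexes() {
  local database="$1"
  "${NEO4J_ADMIN:-bin/neo4j-admin}" database check \
    --check-indexes=false \
    --check-property-owners=false \
    "$database"
}

# Usage (the database must not be in use):
#   check_without_indexes neo4j
```

Setting NEO4J_ADMIN=echo prints the arguments instead of running the check, which is one way to review the command first.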

Parameters

Table 1. neo4j-admin database check parameters
Parameter Description

<database>

Name of the database to check.

Options

The neo4j-admin database check command has the following options:

Table 2. neo4j-admin database check options
Option Description Default

--verbose

Enable verbose output.

-h, --help

Show this help message and exit.

--expand-commands

Allow command expansion in config value evaluation.

--additional-config=<file>[1]

Configuration file with additional configuration.

--force

Force a consistency check to run regardless of available resources. This may also run a more thorough check.

--check-indexes[=true|false]

Perform consistency checks on indexes.

Default: true

--check-graph[=true|false]

Perform consistency checks between nodes, relationships, properties, types, and tokens.

Default: true

--check-counts[=true|false]

Perform consistency checks on the counts. Requires <check-graph>, and may implicitly enable <check-graph> if it was not explicitly disabled.

Default: <check-graph>

--check-property-owners[=true|false]

Perform consistency checks on the ownership of properties. Requires <check-graph>, and may implicitly enable <check-graph> if it was not explicitly disabled.

Default: false

--report-path=<path>

Path to where a consistency report will be written. Interpreted as a directory, unless it has an extension of .report.

Default: .

--max-off-heap-memory=<size>

Maximum memory that neo4j-admin can use for page cache and various caching data structures to improve performance. The value can be a plain number, such as 10000000, a size with a unit, such as 20G for 20 gigabytes, or a percentage, such as 70%, which amounts to 70% of the currently free memory on the machine.

Default: 90%

--threads=<number of threads>

Number of threads used to check the consistency.

Default: the number of CPUs on the machine.

--from-path-data=<path>

Path to the databases directory, containing the database directory to source from.

Default: server.directories.data/databases

--from-path-txn=<path>

Path to the transactions directory, containing the transaction directory for the database to source from.

Default: server.directories.transaction.logs.root

--from-path=<path>

Path to the directory containing dump/backup artifacts that need to be checked for consistency. If the directory contains multiple backups, it will select the most recent backup chain, based on the transaction IDs found, to perform the consistency check.

--temp-path=<path>

Path to directory to be used as a staging area to extract dump/backup artifacts, if needed.

Default: <from-path>

1. See Tools → Configuration for details.

The --from-path=<path> option can also check database artifacts in AWS S3 buckets (from Neo4j 5.19) and Google Cloud storage buckets (from Neo4j 5.21). For more information, see Check the consistency of a backup/dump stored in a cloud storage.
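To show how several of these options combine, here is a minimal sketch that checks store files kept outside the default directories while capping memory and threads. The function name check_store_files, the NEO4J_ADMIN variable, and the values 20G and 4 are illustrative assumptions, not recommendations.

```shell
# Hypothetical helper: check store files that live outside the default directories.
# Arguments: <databases dir> <transactions dir> <report dir> <database>
check_store_files() {
  local data_dir="$1" txn_dir="$2" report_dir="$3" database="$4"
  "${NEO4J_ADMIN:-bin/neo4j-admin}" database check \
    --from-path-data="$data_dir" \
    --from-path-txn="$txn_dir" \
    --report-path="$report_dir" \
    --max-off-heap-memory=20G \
    --threads=4 \
    "$database"
}

# Usage with placeholder paths:
#   check_store_files /mnt/store/databases /mnt/store/transactions /tmp/reports neo4j
```

Note that --from-path-data and --from-path-txn are given together, and are an alternative to --from-path, as shown in the syntax above.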

Output

If the consistency checker does not find errors, it exits cleanly and does not produce a report. If it finds errors, it exits with a non-zero exit code and writes a report file with a name in the format inconsistencies-YYYY-MM-DD.HH24.MI.SS.report. The report file is written to the current working directory, or to the location specified by the --report-path option.
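A script can act on this behaviour by inspecting the exit code and then looking for the newest report file. The sketch below assumes a hypothetical helper named run_and_report that receives the report directory followed by the full check command; it relies only on the exit code and the report file name format described above.

```shell
# Hypothetical helper: run a check command and point at the newest report.
# Arguments: <report dir> <command...>
run_and_report() {
  local report_dir="$1" status=0 latest="" f
  shift
  # Capture the exit code without aborting scripts that use `set -e`.
  "$@" || status=$?
  if [ "$status" -eq 0 ]; then
    echo "consistent"
    return 0
  fi
  # Report names embed a sortable timestamp, so the last glob match is the newest.
  for f in "$report_dir"/inconsistencies-*.report; do
    if [ -e "$f" ]; then latest="$f"; fi
  done
  echo "check exited with $status; report: ${latest:-not found}"
  return "$status"
}

# Usage:
#   run_and_report . bin/neo4j-admin database check neo4j
```

A non-zero exit code can also have other causes, such as a database that is still in use, so the message reports the code rather than assuming inconsistencies.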

Examples

The following are examples of how to check the consistency of a database, a dump, or a backup.

neo4j-admin database check cannot be applied to Composite databases. It must be run directly on the databases that are associated with that Composite database.

Check the consistency of a local database

Note that the database must be stopped first.

bin/neo4j-admin database check neo4j

The output will look similar to the following:

Running consistency check with max off-heap:618.6MiB
  Store size:160.0KiB
  Allocated page cache:160.0KiB
  Off-heap memory for caching:618.5MiB
ID Generator consistency check
....................  10%
....................  20%
....................  30%
....................  40%
....................  50%
....................  60%
....................  70%
....................  80%
....................  90%
.................... 100%
Index structure consistency check
....................  10%
....................  20%
....................  30%
....................  40%
....................  50%
....................  60%
....................  70%
....................  80%
....................  90%
.................... 100%
Consistency check
....................  10%
....................  20%
....................  30%
....................  40%
....................  50%
....................  60%
....................  70%
....................  80%
....................  90%
.................... 100%

Check the consistency of a backup/dump

Run with the --from-path option to check the consistency of a backup or a dump:

bin/neo4j-admin database check --from-path=<directory-with-backup-or-dump> neo4j
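When one directory holds artifacts for several databases, the same command can be repeated per database. The following is a sketch under assumptions: check_backups and NEO4J_ADMIN are hypothetical names, and the staging directory passed to --temp-path is expected to exist already.

```shell
# Hypothetical helper: check several databases from one backup/dump directory.
# Arguments: <backup dir> <staging dir> <database>...
check_backups() {
  local backup_dir="$1" temp_dir="$2" failed=0 db
  shift 2
  for db in "$@"; do
    # Keep going after a failure so that every database gets checked.
    "${NEO4J_ADMIN:-bin/neo4j-admin}" database check \
      --from-path="$backup_dir" --temp-path="$temp_dir" "$db" || failed=1
  done
  return "$failed"
}

# Usage with placeholder paths:
#   check_backups /backups /tmp/check-staging neo4j mydatabase
```

The function returns a non-zero status if any of the individual checks did.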

Check the consistency of a backup/dump stored in a cloud storage

The following examples show how to check the consistency of a backup or a dump stored in a cloud storage bucket using the --from-path option.

AWS S3

Neo4j uses the AWS SDK v2 to call the APIs on AWS using AWS URLs. Alternatively, you can override the endpoints so that the AWS SDK can communicate with alternative storage systems, such as Ceph, Minio, or LocalStack, using the system properties aws.endpointUrls3, aws.endpointUrlS3, or aws.endpointUrl, or the environment variables AWS_ENDPOINT_URL_S3 or AWS_ENDPOINT_URL.
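As a sketch of the endpoint override, the function below sets AWS_ENDPOINT_URL_S3 only for the duration of one check. The function name check_with_endpoint, the NEO4J_ADMIN variable, and the endpoint URL in the usage comment are hypothetical; substitute the address of your own S3-compatible service.

```shell
# Hypothetical helper: run a check against an S3-compatible endpoint.
# Arguments: <endpoint URL> <arguments for "database check"...>
check_with_endpoint() {
  local endpoint="$1"
  shift
  # The variable is set for this one command only, not for the whole shell.
  AWS_ENDPOINT_URL_S3="$endpoint" \
    "${NEO4J_ADMIN:-bin/neo4j-admin}" database check "$@"
}

# Usage (http://localhost:9000 is a placeholder, for example a local MinIO):
#   check_with_endpoint http://localhost:9000 mydatabase --from-path=s3://myBucket/myDirectory/
```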

  1. Install the AWS CLI by following the instructions in the AWS official documentation — Install the AWS CLI version 2.

  2. Create an S3 bucket and a directory to store the backup files using the AWS CLI:

    aws s3 mb --region=us-east-1 s3://myBucket
    aws s3api put-object --bucket myBucket --key myDirectory/

    For more information on how to create a bucket and use the AWS CLI, see the AWS official documentation — Use Amazon S3 with the AWS CLI and Use high-level (s3) commands with the AWS CLI.

  3. Verify that the ~/.aws/config file is correct by running the following command:

    cat ~/.aws/config

    The output should look like this:

    [default]
    region=us-east-1

  4. Configure the access to your AWS S3 bucket by setting the aws_access_key_id and aws_secret_access_key in the ~/.aws/credentials file and, if needed, using a bucket policy. For example:

    1. Use the aws configure set command to set aws_access_key_id and aws_secret_access_key to your IAM credentials from AWS, and verify that the ~/.aws/credentials file is correct:

      cat ~/.aws/credentials

      The output should look like this:

      [default]
      aws_access_key_id=this.is.secret
      aws_secret_access_key=this.is.super.secret

    2. Additionally, you can use a resource-based policy to grant access permissions to your S3 bucket and the objects in it. Create a policy document with the following content and attach it to the bucket. Note that both resource entries are important to be able to download and upload files.

      {
          "Version": "2012-10-17",
          "Id": "Neo4jBackupAggregatePolicy",
          "Statement": [
              {
                  "Sid": "Neo4jBackupAggregateStatement",
                  "Effect": "Allow",
                  "Action": [
                      "s3:ListBucket",
                      "s3:GetObject",
                      "s3:PutObject",
                      "s3:DeleteObject"
                  ],
                  "Resource": [
                      "arn:aws:s3:::myBucket/*",
                      "arn:aws:s3:::myBucket"
                  ]
              }
          ]
      }

  5. Run the bin/neo4j-admin database check command to check the consistency of your database located in your AWS S3 storage bucket. The example assumes that you have backup or dump artifacts located in the myBucket/myDirectory folder in your bucket.

    bin/neo4j-admin database check mydatabase --from-path=s3://myBucket/myDirectory/
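Before starting a long check, it can help to confirm that the credentials and path from the steps above actually work. The sketch below lists the folder with aws s3 ls first and only then runs the check; check_s3_backup and NEO4J_ADMIN are hypothetical names introduced for this example.

```shell
# Hypothetical helper: verify access to the S3 folder, then run the check.
# Arguments: <s3 URL> <database>
check_s3_backup() {
  local url="$1" database="$2"
  # Listing fails early on a wrong path, an empty folder, or missing credentials.
  if ! aws s3 ls "$url" >/dev/null; then
    echo "Cannot list $url" >&2
    return 1
  fi
  "${NEO4J_ADMIN:-bin/neo4j-admin}" database check "$database" --from-path="$url"
}

# Usage:
#   check_s3_backup s3://myBucket/myDirectory/ mydatabase
```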

Google Cloud storage

  1. Ensure you have a Google account and a project created in the Google Cloud Platform (GCP).

    1. Install the gcloud CLI by following the instructions in the Google official documentation — Install the gcloud CLI.

    2. Create a service account and a service account key using Google official documentation — Create service accounts and Creating and managing service account keys.

    3. Download the JSON key file for the service account.

    4. Set the GOOGLE_APPLICATION_CREDENTIALS and GOOGLE_CLOUD_PROJECT environment variables to the path of the JSON key file and the project ID, respectively:

      export GOOGLE_APPLICATION_CREDENTIALS="/path/to/keyfile.json"
      export GOOGLE_CLOUD_PROJECT=YOUR_PROJECT_ID

    5. Authenticate the gcloud CLI with the e-mail address of the service account you have created, the path to the JSON key file, and the project ID:

      gcloud auth activate-service-account service-account@example.com --key-file=$GOOGLE_APPLICATION_CREDENTIALS --project=$GOOGLE_CLOUD_PROJECT

      For more information, see the Google official documentation — gcloud auth activate-service-account.

    6. Create a bucket in the Google Cloud Storage using Google official documentation — Create buckets.

    7. Verify that the bucket is created by running the following command:

      gcloud storage ls

      The output should list the created bucket.

  2. Run the bin/neo4j-admin database check command to check the consistency of your database located in your Google storage bucket. The example assumes that you have backup or dump artifacts located in the myBucket/myDirectory folder in your bucket.

    bin/neo4j-admin database check mydatabase --from-path=gs://myBucket/myDirectory/
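Since the steps above depend on GOOGLE_APPLICATION_CREDENTIALS pointing to the key file, a wrapper can verify that before the check starts. This is only a sketch: check_gcs_backup and NEO4J_ADMIN are hypothetical names, and the test covers the key file alone, not the permissions of the service account.

```shell
# Hypothetical helper: confirm the key file is readable, then run the check.
# Arguments: <gs URL> <database>
check_gcs_backup() {
  local url="$1" database="$2"
  if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ] ||
     [ ! -r "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS must point to a readable key file" >&2
    return 1
  fi
  "${NEO4J_ADMIN:-bin/neo4j-admin}" database check "$database" --from-path="$url"
}

# Usage:
#   check_gcs_backup gs://myBucket/myDirectory/ mydatabase
```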

Azure blob storage

  1. Ensure you have an Azure account, an Azure storage account, and a blob container.

    1. You can create a storage account using the Azure portal.
      For more information, see the Azure official documentation on Create a storage account.

    2. Create a blob container in the Azure portal.
      For more information, see the Azure official documentation on Quickstart: Upload, download, and list blobs with the Azure portal.

  2. Install the Azure CLI by following the instructions in the Azure official documentation.

  3. Authenticate the neo4j or neo4j-admin process against Azure using the default Azure credentials.
    See the Azure official documentation on default Azure credentials for more information.

    az login

    Then you should be ready to use Azure URLs in either neo4j or neo4j-admin.

  4. To validate that you have access to the container with your login credentials, run the following commands:

    # Upload a file:
    az storage blob upload --file someLocalFile --account-name accountName --container someContainer --name remoteFileName --auth-mode login
    
    # Download the file
    az storage blob download  --account-name accountName --container someContainer --name remoteFileName --file downloadedFile --auth-mode login
    
    # List container files
    az storage blob list --account-name accountName --container someContainer --auth-mode login

  5. Run the bin/neo4j-admin database check command to check the consistency of your database located in your Azure blob storage container. The example assumes that you have backup or dump artifacts located in the myStorageAccount/myContainer/myDirectory folder in Azure.

    bin/neo4j-admin database check mydatabase --from-path=azb://myStorageAccount/myContainer/myDirectory/
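The azb:// URL in this example combines the storage account, the container, and the folder. As a rough sketch of how those parts map onto the az commands used earlier, the hypothetical function list_azb below splits such a URL and lists the matching blobs, which can confirm that the artifacts are reachable before running the check. It assumes the URL names at least an account and a container.

```shell
# Hypothetical helper: list the blobs behind an azb://<account>/<container>/<path> URL.
list_azb() {
  local rest="${1#azb://}" account container prefix=""
  account="${rest%%/*}"           # text before the first slash
  rest="${rest#*/}"
  container="${rest%%/*}"         # text before the second slash
  case "$rest" in
    */*) prefix="${rest#*/}" ;;   # everything after the container, if any
  esac
  az storage blob list --account-name "$account" --container-name "$container" \
    --prefix "$prefix" --auth-mode login --output table
}

# Usage:
#   list_azb azb://myStorageAccount/myContainer/myDirectory/
```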