Copy a database store

You can use the neo4j-admin database copy command to copy a database, create a compacted/defragmented copy of a database, clean up database inconsistencies, or do a direct migration from Neo4j 4.4 to any 5.x version. neo4j-admin database copy reclaims the unused space, creates a defragmented copy of the data store, and creates the node label and relationship type lookup indexes.

Command limitations
  • neo4j-admin database copy preserves the node IDs (unless --compact-node-store is used), but the relationships get new IDs.

  • neo4j-admin database copy is not supported for use on the system database.

  • neo4j-admin database copy is not supported for use on Composite databases. It must be run directly on the databases that are associated with that Composite database.

  • neo4j-admin database copy is an IOPS-intensive process. For more information, see Estimating the processing time.

Command

neo4j-admin database copy copies the data store of an existing offline database to a new database.

Syntax

neo4j-admin database copy [-h] [--copy-schema] [--expand-commands] [--force] [--verbose] [--compact-node-store[=true|false]]
                             [--additional-config=<file>] [--from-pagecache=<size>] [--temp-path=<path>] [--to-format=<format>]
                             [--to-path-schema=<path>] [--copy-only-node-properties=<label.property>[,<label.property>...]]...
                             [--copy-only-nodes-with-labels=<label>[,<label>...]]... [--copy-only-relationship-properties=<relationship.
                             property>[,<relationship.property>...]]... [--copy-only-relationships-with-types=<type>[,<type>...]]...
                             [--ignore-nodes-with-labels=<label>[,<label>...]]... [--ignore-relationships-with-types=<type>[,<type>...]]...
                             [--skip-labels=<label>[,<label>...]]... [--skip-node-properties=<label.property>[,<label.property>...]]...
                             [--skip-properties=<property>[,<property>...]]... [--skip-relationship-properties=<relationship.property>[,
                             <relationship.property>...]]... [--from-path-data=<path> --from-path-txn=<path>] [--to-path-data=<path>
                             --to-path-txn=<path>] <fromDatabase> <toDatabase>

Description

This command will create a copy of a database. If your labels, properties, or relationships contain dots or commas, you can use backticks to quote them, e.g. My,label, My.property. A file named <database-name>-schema.cypher, containing the schema commands needed to recreate indexes/constraints on the copy, will be created.

From Neo4j 5.20 onwards, you can use the --copy-schema option to automatically copy the schema. Indexes will be built the first time the database is started. This option can copy the schema from any 4.4 and 5.x version to 5.20 and later versions.

Parameters

Table 1. neo4j-admin database copy parameters
Parameter Description

<fromDatabase>

Name of the source database.

<toDatabase>

Name of the target database. If the same as <fromDatabase>, it is copied to a temporary location, by default the current working directory or the path as defined by --temp-path, before being moved to replace the original.

From Neo4j 5.5, you can use the same values for <fromDatabase> and <toDatabase> if you do not need an actual copy of the database. The command will replace the original database with the newly created copy.

Options

The neo4j-admin database copy command has the following options:

Table 2. neo4j-admin database copy options
Option Description Default

--additional-config=<file>[1]

Configuration file with additional configuration.

--compact-node-store[=true|false]

By default node store is not compacted on copy since that changes node ids. Please use this option to enforce node store compaction.

false

--copy-only-node-properties=<label.property>[,<label.property>…​]

A comma-separated list of property keys to include in the copy for nodes with the specified label. Any labels not explicitly mentioned will have all their properties included in the copy. Cannot be combined with --skip-properties or --skip-node-properties.

--copy-only-nodes-with-labels=<label>[,<label>…​]

A comma-separated list of labels. All nodes that have ANY of the specified labels will be included in the copy. Cannot be combined with --ignore-nodes-with-labels.

--copy-only-relationship-properties=<relationship.property>[,<relationship.property>…​]

A comma-separated list of property keys to include in the copy for relationships with the specified type. Any relationship types not explicitly mentioned will have all their properties included in the copy. Cannot be combined with --skip-properties or --skip-relationship-properties.

--copy-only-relationships-with-types=<type>[,<type>…​]

A comma-separated list of relationship types. All relationships with any of the specified types will be included in the copy. Cannot be combined with --ignore-relationships-with-types.

--copy-schema

Introduced in 5.20 Copy the schema instead of generating schema statements, meaning index and constraint definitions. The indexes will be built the first time the database is started.

--expand-commands

Allow command expansion in config value evaluation.

--force

Force the command to run even if the integrity of the database cannot be verified.

--from-pagecache=<size>

The size of the page cache to use for reading.

You can use the --from-pagecache option to speed up the copy operation by specifying how much cache to allocate when reading the source. The --from-pagecache should be assigned whatever memory you can spare since Neo4j does random reads from the source.

8m

--from-path-data=<path>

Path to the databases directory, containing the database directory to source from. It can be used to target databases outside of the installation.

server.directories.data/databases

--from-path-txn=<path>

Path to the transactions directory, containing the transaction directory for the database to source from.

server.directories.transaction.logs.root

-h, --help

Show this help message and exit.

--ignore-nodes-with-labels=<label>[,<label>…​]

A comma-separated list of labels. Nodes that have ANY of the specified labels will not be included in the copy. Cannot be combined with --copy-only-nodes-with-labels.

--ignore-relationships-with-types=<type>[,<type>…​]

A comma-separated list of relationship types. Relationships with any of the specified relationship types will not be included in the copy. Cannot be combined with --copy-only-relationships-with-types.

--skip-labels=<label>[,<label>…​]

A comma-separated list of labels to ignore.

--skip-node-properties=<label.property>[,<label.property>…​]

A comma-separated list of property keys to ignore for nodes with the specified label. Cannot be combined with --skip-properties or --copy-only-node-properties.

--skip-properties=<property>[,<property>…​]

A comma-separated list of property keys to ignore. Cannot be combined with --skip-node-properties, --copy-only-node-properties, --skip-relationship-properties or --copy-only-relationship-properties.

--skip-relationship-properties=<relationship.property>[,<relationship.property>…​]

A comma-separated list of property keys to ignore for relationships with the specified type. Cannot be combined with --skip-properties or --copy-only-relationship-properties.

--temp-path=<path>

Introduced in 5.24 Path to a directory to be used as a staging area when the source and target databases are the same. Default is the current directory.

--to-format=<format>

Set the format for the new database. Must be one of same, standard, high_limit, aligned, block. same will use the same format as the source.

If you go from high_limit to standard or aligned, there is no validation that the data will actually fit.

same

--to-path-data=<path>

Path to the databases directory, containing the database directory to target from.

server.directories.data/databases

--to-path-schema=<path>

Path to directory to create the schema commands file in. Default is the current directory.

--to-path-txn=<path>

Path to the transactions directory containing the transaction directory for the database to target from.

server.directories.transaction.logs.root

--verbose

Enable verbose output.

1. See Tools → Configuration for details.

The block format is introduced in Neo4j 5.14 and from Neo4j 5.22, is the default format for all newly-created databases as long as they do not have the db.format setting specified. For more information on the block format, see Store formats.

Examples

Copying the data store of a database

You can use neo4j-admin database copy to copy the data store of a database, for example, neo4j.

  1. Stop the database named neo4j:

    STOP DATABASE neo4j
  2. Copy the data store from neo4j to a new database called database-copy.

    If you do not need an actual copy of the database, you can use the same values for <fromDatabase> and <toDatabase>. The command replaces the original database with the newly created copy.

    From Neo4j 5.20 onwards, you can use the --copy-schema option to automatically copy the schema. Indexes will be built the first time the database is started. This option copies the schema from any 4.4 and 5.x version to 5.20 and later versions.

    For previous versions, you need to manually recreate the schema using the Cyher statements saved in the file <database-name>-schema.cypher.

    bin/neo4j-admin database copy neo4j database-copy
  3. Verify that the database has been successfully copied:

    ls -al ../data/databases

    Copying a database does not automatically create it. Therefore, it will not be visible if you do SHOW DATABASES at this point.

  4. Create the copied database.

    CREATE DATABASE database-copy
  5. Verify that the new database is online.

    SHOW DATABASES
  6. (For versions before Neo4j 5.20) If your original database has a schema defined, change your active database to the copied database and recreate the schema using the schema commands saved in the file <database-name>-schema.cypher.

Filtering data while copying a database

You can use neo4j-admin database copy to filter out any unwanted data while copying a database, for example, by removing nodes, labels, properties, and relationships.

bin/neo4j-admin database copy neo4j copy --ignore-nodes-with-labels="Cat,Dog"

The command creates a copy of the database neo4j but without the nodes with the labels :Cat and :Dog.

Labels are processed independently, i.e., the filter ignores any node with a label :Cat, :Dog, or both.

For a detailed example of how to use neo4j-admin database copy to filter out data for sharding a database, see Sharding data with the copy command.

Further compacting an existing database

You can use the command neo4j-admin database copy with the argument -compact-node-store to further compact the store of an existing database.
This example uses the same values for <toDatabase> and <fromDatabase>, which means that the command will compact the database in place by creating a new version of the database. After running the command, you need to recreate the indexes using the generated script. If the database belongs to a cluster, you also need to reseed the cluster from that server. For more information, see Designated seeder.

Note that even though there is only one database copy in the end, you still need double the space during the operation.

  1. Stop the database named neo4j:

    STOP DATABASE neo4j
  2. Compact the neo4j database using the command:

    bin/neo4j-admin database copy neo4j neo4j --compact-node-store --temp-path=<my-prefered-staging-area>

    --temp-path, introduced in Neo4j 5.24, can be used to specify a different directory to use as a temporary staging area. If omitted, the current working directory will be used.

    From Neo4j 5.20 onwards, you can use the --copy-schema option to automatically copy the schema. Indexes will be built the first time the database is started. This option can copy the schema from any 4.4 and 5.x to 5.20 and later versions.

    For previous versions, you need to manually recreate the schema using the Cyher statements saved in the file <database-name>-schema.cypher.

  3. Start the neo4j database. This is the newly created version of the database.

    START DATABASE neo4j
  4. (For versions before Neo4j 5.20) If your original database has a schema defined, recreate the schema using the schema commands saved in the file <database-name>-schema.cypher.

For a detailed example of how to reclaim unused space, see Reclaim unused space.

Estimating the processing time

Estimations for how long the neo4j-admin database copy command takes can be made based on the following:

  • Neo4j, like many other databases, does IO in 8K pages.

  • Your disc manufacturer will have a value for the maximum IOPS it can process.

For example, if your disc manufacturer has provided a maximum of 5000 IOPS, you can reasonably expect up to 5000 such page operations a second. Therefore, the maximal theoretical throughput you can expect is 40MB/s (or 144 GB/hour) on that disc. You may then assume that the best-case scenario for running neo4j-admin database copy on that 5000 IOPS disc is that it takes at least 1 hour to process a 144 GB database. [1]

However, it is important to remember that the process must read 144 GB from the source database, and must also write to the target store (assuming the target store is of comparable size). Additionally, there are internal processes during the copy that reads/modifies/writes the store multiple times. Therefore, with an additional 144 GB of both read and write, the best-case scenario for running neo4j-admin database copy on a 5000 IOPS disc is that it takes at least 3 hours to process a 144 GB database.

Finally, it is also important to consider that in almost all Cloud environments, the published IOPS value may not be the same as the actual value, or be able to continuously maintain the maximum possible IOPS. The real processing time for this example could be well above that estimation of 3 hours.


1. The calculations are based on MB/s = (IOPS * B) ÷ 10^6, where B is the block size in bytes; in the case of Neo4j, this is 8000. GB/hour can then be calculated from (MB/s * 3600) ÷ 1000.