Performance recommendations
Always specify the target database
Specify the target database on all queries with the .withDatabase() method, either in Driver.executableQuery() calls or when creating new sessions.
If no database is provided, the driver has to send an extra request to the server to figure out what the default database is.
The overhead is minimal for a single query, but becomes significant over hundreds of queries.
Good practices
driver.executableQuery("<QUERY>")
    .withConfig(QueryConfig.builder().withDatabase("<DB NAME>").build())
    .execute();
driver.session(SessionConfig.builder().withDatabase("<DB NAME>").build());
Bad practices
driver.executableQuery("<QUERY>")
    .execute();
driver.session();
Be aware of the cost of transactions
When submitting queries through .executableQuery() or through .executeRead/Write(), the server automatically wraps them into a transaction.
This behavior ensures that the database always ends up in a consistent state, regardless of what happens during the execution of a transaction (power outages, software crashes, etc).
Creating a safe execution context around a number of queries yields an overhead that is not present if the driver just shoots queries at the server and hopes they will get through.
The overhead is small, but can add up as the number of queries increases.
For this reason, if your use case values throughput more than data integrity, you may extract further performance by running queries as auto-commit transactions, which the driver submits without the managed-transaction machinery (for example, without automatic retries).
You do this by creating a single session and using session.run() to run as many queries as needed.
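// Favors throughput: every query runs as an auto-commit transaction in one session
try (var session = driver.session(SessionConfig.builder().withDatabase("neo4j").build())) {
    for (int i=0; i<1000; i++) {
        session.run("<QUERY>");
    }
}
// Favors data integrity: every query runs in its own managed transaction
for (int i=0; i<1000; i++) {
    driver.executableQuery("<QUERY>").execute();
    // or session.executeRead/Write() calls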
}
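Whether this overhead matters depends on the workload, so it is worth measuring before giving up transactional guarantees. The following is a minimal sketch of a timing helper in plain Java; it is not part of the driver. The names QueryTiming and timeMillis are hypothetical, and the Runnable is a stand-in for the body of either loop above.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not part of the Neo4j driver): times an arbitrary block of work.
final class QueryTiming {

    // Runs the task `repetitions` times and returns the total wall-clock time in milliseconds.
    static long timeMillis(Runnable task, int repetitions) {
        long start = System.nanoTime();
        for (int i = 0; i < repetitions; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Stand-in workload. Replace the task with a call such as
        // () -> session.run("<QUERY>") or () -> driver.executableQuery("<QUERY>").execute()
        // to compare the two approaches on your own data.
        long elapsed = timeMillis(calls::incrementAndGet, 1000);
        System.out.println(calls.get() + " calls took " + elapsed + " ms");
    }
}
```

Run both variants several times against the same database and compare the totals; a single run is easily skewed by warm-up and caching effects.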
Route read queries to cluster readers
In a cluster, route read queries to any reader node. You do this by:
- using the method .withRouting(RoutingControl.READ) in Driver.executableQuery() calls
- using Session.executeRead() instead of Session.executeWrite() (for managed transactions)
- using the method .withDefaultAccessMode(AccessMode.READ) when creating a new session (for explicit transactions).
Good practices
// import org.neo4j.driver.RoutingControl;
driver.executableQuery("MATCH (p:Person) RETURN p")
    .withConfig(QueryConfig.builder()
        .withDatabase("neo4j")
        .withRouting(RoutingControl.READ)
        .build())
    .execute();
try (var session = driver.session(SessionConfig.builder().withDatabase("neo4j").build())) {
    session.executeRead(tx -> {
        var result = tx.run("MATCH (p:Person) RETURN p");
        return result.list();
    });
}
Bad practices
// defaults to routing = writers
driver.executableQuery("MATCH (p:Person) RETURN p")
    .withConfig(QueryConfig.builder()
        .withDatabase("neo4j")
        .build())
    .execute();
// don't ask to write on a read-only operation
try (var session = driver.session(SessionConfig.builder().withDatabase("neo4j").build())) {
    session.executeWrite(tx -> {
        var result = tx.run("MATCH (p:Person) RETURN p");
        return result.list();
    });
}
Create indexes
Create indexes for properties that you often filter against.
For example, if you often look up Person nodes by the name property, it is beneficial to create an index on Person.name.
You can create indexes with the CREATE INDEX Cypher clause, for both nodes and relationships.
driver.executableQuery("CREATE INDEX person_name FOR (n:Person) ON (n.name)").execute();
For more information, see Indexes for search performance.
Profile queries
Profile your queries to locate queries whose performance can be improved.
You can profile queries by prepending them with PROFILE.
The server output is available through the .profile() method of the ResultSummary object.
var result = driver.executableQuery("PROFILE MATCH (p {name: $name}) RETURN p")
    .withParameters(Map.of("name", "Alice"))
    .withConfig(QueryConfig.builder().withDatabase("neo4j").build())
    .execute();
var queryPlan = result.summary().profile().arguments().get("string-representation");
System.out.println(queryPlan);
/*
Planner COST
Runtime PIPELINED
Runtime version 5.0
Batch size 128
+-----------------+----------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| Operator | Details | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline |
+-----------------+----------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| +ProduceResults | p | 1 | 1 | 3 | | | | |
| | +----------------+----------------+------+---------+----------------+ | | |
| +Filter | p.name = $name | 1 | 1 | 4 | | | | |
| | +----------------+----------------+------+---------+----------------+ | | |
| +AllNodesScan | p | 10 | 4 | 5 | 120 | 9160/0 | 108.923 | Fused in Pipeline 0 |
+-----------------+----------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
Total database accesses: 12, total allocated memory: 184
*/
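When comparing several variants of a query (for example, before and after adding a label or an index), it can help to pull the total number of database accesses out of the printed plan. The sketch below is plain Java and makes one assumption: that the plan text contains a summary line of the form shown in the sample output above ("Total database accesses: 12, ..."). That format may differ across server versions. PlanText and totalDatabaseAccesses are hypothetical names, not driver API.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the "Total database accesses" figure from a plan's text form.
final class PlanText {

    // Assumes the summary line format seen in the sample PROFILE output.
    private static final Pattern TOTAL_ACCESSES =
            Pattern.compile("Total database accesses: (\\d+)");

    // Returns the total if the line is present and numeric, otherwise an empty OptionalLong.
    static OptionalLong totalDatabaseAccesses(String planText) {
        Matcher matcher = TOTAL_ACCESSES.matcher(planText);
        return matcher.find()
                ? OptionalLong.of(Long.parseLong(matcher.group(1)))
                : OptionalLong.empty();
    }

    public static void main(String[] args) {
        String summaryLine = "Total database accesses: 12, total allocated memory: 184";
        System.out.println(totalDatabaseAccesses(summaryLine)); // prints OptionalLong[12]
    }
}
```

A lower figure for the same result generally indicates a cheaper plan. The helper returns an empty value when no number is present, which is the case when the plan reports "?" instead of a count.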
If some queries are too slow to run in a reasonable time, you can prepend them with EXPLAIN instead of PROFILE.
This returns the plan that the server would use to run the query, without executing it.
The server output is available through the .plan() method of the ResultSummary object.
var result = driver.executableQuery("EXPLAIN MATCH (p {name: $name}) RETURN p")
    .withParameters(Map.of("name", "Alice"))
    .withConfig(QueryConfig.builder().withDatabase("neo4j").build())
    .execute();
var queryPlan = result.summary().plan().arguments().get("string-representation");
System.out.println(queryPlan);
/*
Planner COST
Runtime PIPELINED
Runtime version 5.0
Batch size 128
+-----------------+----------------+----------------+---------------------+
| Operator | Details | Estimated Rows | Pipeline |
+-----------------+----------------+----------------+---------------------+
| +ProduceResults | p | 1 | |
| | +----------------+----------------+ |
| +Filter | p.name = $name | 1 | |
| | +----------------+----------------+ |
| +AllNodesScan | p | 10 | Fused in Pipeline 0 |
+-----------------+----------------+----------------+---------------------+
Total database accesses: ?
*/
Specify node labels
Specify node labels in all queries. This allows the query planner to work much more efficiently, and to leverage indexes where available. To learn how to combine labels, see Cypher → Label expressions.
Good practices
driver.executableQuery("MATCH (p:Person|Animal {name: $name}) RETURN p")
    .withParameters(Map.of("name", "Alice"))
    .withConfig(QueryConfig.builder().withDatabase("neo4j").build())
    .execute();
try (var session = driver.session(SessionConfig.builder().withDatabase("neo4j").build())) {
    session.run("MATCH (p:Person|Animal {name: $name}) RETURN p", Map.of("name", "Alice"));
}
Bad practices
driver.executableQuery("MATCH (p {name: $name}) RETURN p")
    .withParameters(Map.of("name", "Alice"))
    .withConfig(QueryConfig.builder().withDatabase("neo4j").build())
    .execute();
try (var session = driver.session(SessionConfig.builder().withDatabase("neo4j").build())) {
    session.run("MATCH (p {name: $name}) RETURN p", Map.of("name", "Alice"));
}
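Batch data creation
Batch queries when creating many records: pass the records as a list parameter and use the UNWIND Cypher clause to create them in a single query, rather than issuing one query per record.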
Good practice
// Generate a sequence of numbers
int start = 1;
int end = 10000;
List<Map<String, Object>> numbers = new ArrayList<>(end - start + 1);
for (int i=start; i<=end; i++) {
    numbers.add(Map.of("value", i));
}
// Create all nodes with a single query
driver.executableQuery("""
        UNWIND $numbers AS node
        CREATE (:Number {value: node.value})
        """)
    .withParameters(Map.of("numbers", numbers))
    .withConfig(QueryConfig.builder().withDatabase("neo4j").build())
    .execute();
Bad practice
// One query (and one transaction) per node
for (int i=1; i<=10000; i++) {
    driver.executableQuery("CREATE (:Number {value: $value})")
        .withParameters(Map.of("value", i))
        .withConfig(QueryConfig.builder().withDatabase("neo4j").build())
        .execute();
}
The most efficient way of performing a first import of large amounts of data into a new database is the neo4j-admin database import command.
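When the list of records is too large to send comfortably in one query, a common middle ground is to split it into fixed-size chunks and run one UNWIND query per chunk. The sketch below shows only the chunking step in plain Java. Batches and partition are hypothetical names, and the batch size of 1000 is an arbitrary assumption to tune for your data, not a recommendation from the driver.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of the Neo4j driver): splits rows into fixed-size batches.
final class Batches {

    // Splits `rows` into consecutive lists of at most `batchSize` elements, preserving order.
    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < rows.size(); from += batchSize) {
            int to = Math.min(from + batchSize, rows.size());
            batches.add(new ArrayList<>(rows.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> numbers = new ArrayList<>();
        for (int i = 1; i <= 10000; i++) {
            numbers.add(Map.of("value", i));
        }
        for (List<Map<String, Object>> batch : partition(numbers, 1000)) {
            // In real code, each batch would be passed as the $numbers parameter
            // of one UNWIND query, as in the good practice example.
            System.out.println("batch of " + batch.size() + " rows");
        }
    }
}
```

Each chunk still benefits from batching, while the size of any single request and transaction stays bounded.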
Use query parameters
Always use query parameters instead of hardcoding or concatenating values into queries. Besides protecting against Cypher injection, this lets the database make better use of its query cache.
Good practices
driver.executableQuery("MATCH (p:Person {name: $name}) RETURN p")
    .withParameters(Map.of("name", "Alice"))
    .withConfig(QueryConfig.builder().withDatabase("neo4j").build())
    .execute();
try (var session = driver.session(SessionConfig.builder().withDatabase("neo4j").build())) {
    session.run("MATCH (p:Person {name: $name}) RETURN p", Map.of("name", "Alice"));
}
Bad practices
driver.executableQuery("MATCH (p:Person {name: 'Alice'}) RETURN p")
    .withConfig(QueryConfig.builder().withDatabase("neo4j").build())
    .execute();
// or
String name = "Alice";
driver.executableQuery("MATCH (p:Person {name: '" + name + "'}) RETURN p")
    .withConfig(QueryConfig.builder().withDatabase("neo4j").build())
    .execute();
try (var session = driver.session(SessionConfig.builder().withDatabase("neo4j").build())) {
    session.run("MATCH (p:Person {name: 'Alice'}) RETURN p");
    // or
    String name = "Alice";
    session.run("MATCH (p:Person {name: '" + name + "'}) RETURN p");
}
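The difference between the two approaches shows up in the query text itself, without involving a database. The sketch below is plain Java with hypothetical names (QueryTextDemo, concatenated, parameters). It builds the text both ways: the concatenated form yields a different string for every value, so each one looks like a new query to the cache, and a crafted value can change the structure of the query. The parameterized text never changes.

```java
import java.util.List;
import java.util.Map;

// Hypothetical demo class: compares concatenated and parameterized query text.
final class QueryTextDemo {

    // Good practice: the text is constant and the value travels separately as a parameter.
    static final String PARAMETERIZED = "MATCH (p:Person {name: $name}) RETURN p";

    // Bad practice: the value becomes part of the query text.
    static String concatenated(String name) {
        return "MATCH (p:Person {name: '" + name + "'}) RETURN p";
    }

    // Parameter map to send alongside PARAMETERIZED.
    static Map<String, Object> parameters(String name) {
        return Map.of("name", name);
    }

    public static void main(String[] args) {
        // The third value is a crafted input that closes the string literal early.
        for (String name : List.of("Alice", "Bob", "x'}) DETACH DELETE p //")) {
            System.out.println("concatenated:  " + concatenated(name));
            System.out.println("parameterized: " + PARAMETERIZED + " with " + parameters(name));
        }
    }
}
```

With the crafted input, the concatenated text contains a DETACH DELETE clause that the application never intended, whereas the parameterized version treats the same input as an ordinary string value.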
Concurrency
Use asynchronous querying. This is likely to be more impactful on performance if you parallelize complex and time-consuming queries in your application, but not so much if you run many simple ones.
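As a rough illustration of why this helps for slow queries, the sketch below runs several independent tasks in parallel and collects their results in order. It uses only java.util.concurrent and is not the driver's asynchronous API. ParallelQueries and runAll are hypothetical names, and each Supplier is a stand-in for a blocking query call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical helper (plain JDK, not driver API): runs independent tasks in parallel.
final class ParallelQueries {

    // Starts every task on the executor, then waits for all of them.
    // Results are returned in the same order as the tasks were given.
    static <T> List<T> runAll(List<Supplier<T>> tasks, ExecutorService executor) {
        List<CompletableFuture<T>> futures = new ArrayList<>();
        for (Supplier<T> task : tasks) {
            futures.add(CompletableFuture.supplyAsync(task, executor));
        }
        List<T> results = new ArrayList<>();
        for (CompletableFuture<T> future : futures) {
            results.add(future.join()); // blocks until this task has finished
        }
        return results;
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            // Stand-ins for slow, independent queries.
            List<Supplier<String>> tasks = List.of(
                    () -> "result of slow query A",
                    () -> "result of slow query B");
            System.out.println(runAll(tasks, executor));
        } finally {
            executor.shutdown();
        }
    }
}
```

The total waiting time approaches that of the slowest task rather than the sum of all tasks, which is why the gain is larger for long-running queries than for many short ones.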
Use MERGE for creation only when needed
The Cypher clause MERGE is convenient for data creation, as it avoids creating duplicate data when an exact clone of the given pattern already exists.
However, it requires the database to run two queries: it first needs to MATCH the pattern, and only then can it CREATE it (if needed).
If you already know that the data you are inserting is new, avoid using MERGE and use CREATE directly instead; this practically halves the number of database queries.
Filter notifications
Filter the category and/or severity of notifications the server should raise.
Glossary
- LTS
-
A Long Term Support release is one guaranteed to be supported for a number of years. Neo4j 4.4 is LTS, and Neo4j 5 will also have an LTS version.
- Aura
-
Aura is Neo4j’s fully managed cloud service. It comes with both free and paid plans.
- Cypher
-
Cypher is Neo4j’s graph query language that lets you retrieve data from the database. It is like SQL, but for graphs.
- APOC
-
Awesome Procedures On Cypher (APOC) is a library of (many) functions that cannot be easily expressed in Cypher itself.
- Bolt
-
Bolt is the protocol used for interaction between Neo4j instances and drivers. It listens on port 7687 by default.
- ACID
-
Atomicity, Consistency, Isolation, Durability (ACID) are properties guaranteeing that database transactions are processed reliably. An ACID-compliant DBMS ensures that the data in the database remains accurate and consistent despite failures.
- eventual consistency
-
A database is eventually consistent if it provides the guarantee that all cluster members will, at some point in time, store the latest version of the data.
- causal consistency
-
A database is causally consistent if read and write queries are seen by every member of the cluster in the same order. This is stronger than eventual consistency.
- NULL
-
The null marker is not a type but a placeholder for absence of value. For more information, see Cypher → Working with null.
- transaction
-
A transaction is a unit of work that is either committed in its entirety or rolled back on failure. An example is a bank transfer: it involves multiple steps, but they must all succeed or be reverted, to avoid money being subtracted from one account but not added to the other.
- backpressure
-
Backpressure is a force opposing the flow of data. It ensures that the client is not being overwhelmed by data faster than it can handle.
- transaction function
-
A transaction function is a callback executed by an executeRead or executeWrite call. The driver automatically re-executes the callback in case of server failure.
- Driver
-
A Driver object holds the details required to establish connections with a Neo4j database.