Clause composition
This section describes the semantics of Cypher® when composing different read and write clauses.
A query is made up of several clauses chained together. These are discussed in more detail in the chapter on Clauses.
The semantics of a whole query is defined by the semantics of its clauses. Each clause has as input the state of the graph and a table of intermediate results consisting of the current variables. The output of a clause is a new state of the graph and a new table of intermediate results, serving as input to the next clause. The first clause takes as input the state of the graph before the query and an empty table of intermediate results. The output of the last clause is the result of the query.
Unless ORDER BY is used, Neo4j does not guarantee the row order of a query result.
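As a minimal sketch of how to obtain a deterministic row order, the query below adds ORDER BY to a friend lookup. It assumes Person nodes with a name property that are connected by FRIEND relationships, as in the queries used in this section.

```cypher
// Look up John's friends and return their names in ascending order.
// Without the ORDER BY clause, the rows could come back in any order.
MATCH (john:Person {name: 'John'})-[:FRIEND]->(friend)
RETURN friend.name AS friendName
ORDER BY friendName
```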
The following example graph is used throughout this section.
The table of intermediate results and the state of the graph after each clause are shown below for the following query:
MATCH (john:Person {name: 'John'})
MATCH (john)-[:FRIEND]->(friend)
RETURN friend.name AS friendName
The query only has read clauses, so the state of the graph remains unchanged and is therefore omitted below.
Clause | Table of intermediate results after the clause
---|---
MATCH (john:Person {name: 'John'}) | [table not shown]
MATCH (john)-[:FRIEND]->(friend) | [table not shown]
RETURN friend.name AS friendName | [table not shown]
The above example only looked at clauses that allow linear composition and omitted write clauses. The following sections explore write clauses and non-linear composition.
Read-write queries
In a Cypher query, read and write clauses can take turns. The most important aspect of read-write queries is that the state of the graph also changes between clauses.
A clause can never observe writes made by a later clause.
Using the same example graph as above, this example shows the table of intermediate results and the state of the graph after each clause for the following query:
MATCH (j:Person) WHERE j.name STARTS WITH "J"
CREATE (j)-[:FRIEND]->(jj:Person {name: "Jay-jay"})
The query finds all Person nodes whose name property starts with "J" and, for each such node, creates another Person node with the name property set to "Jay-jay", connected to it by a FRIEND relationship.
Clause | Table of intermediate results after the clause | State of the graph after the clause, changes in red
---|---|---
MATCH (j:Person) WHERE j.name STARTS WITH "J" | [table not shown] | [graph not shown]
CREATE (j)-[:FRIEND]->(jj:Person {name: "Jay-jay"}) | [table not shown] | [graph not shown]
It is important to note that the MATCH clause does not find the Person nodes that are created by the CREATE clause, even though the name "Jay-jay" starts with "J". This is because the CREATE clause comes after the MATCH clause, and thus the MATCH cannot observe any changes to the graph made by the CREATE.
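Conversely, a clause placed after the CREATE does see the new nodes, because it receives the graph state output by the CREATE. The following sketch is not part of the original example; it appends a second MATCH to the query above, and the aliases created and matchingAfterCreate are illustrative names only.

```cypher
// First MATCH: runs against the graph as it was before the CREATE.
MATCH (j:Person) WHERE j.name STARTS WITH "J"
CREATE (j)-[:FRIEND]->(jj:Person {name: "Jay-jay"})
// Collapse the rows into one, counting how many nodes were created.
WITH count(*) AS created
// Second MATCH: comes after the CREATE, so it also finds the new "Jay-jay" nodes.
MATCH (p:Person) WHERE p.name STARTS WITH "J"
RETURN created, count(p) AS matchingAfterCreate
```

Assuming the first MATCH finds at least one node, matchingAfterCreate exceeds the number of nodes that first MATCH found by exactly created.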
Queries with UNION
UNION queries are slightly different because the results of two or more queries are put together, but each query starts with an empty table of intermediate results.
In a query with a UNION clause, any clause before the UNION cannot observe writes made by a clause after the UNION. Any clause after the UNION can observe all writes made by a clause before the UNION. This means that the rule that a clause can never observe writes made by a later clause still applies in queries using UNION.
Using the same example graph as above, this example shows the table of intermediate results and the state of the graph after each clause for the following query:
CREATE (jj:Person {name: "Jay-jay"})
RETURN count(*) AS count
UNION
MATCH (j:Person) WHERE j.name STARTS WITH "J"
RETURN count(*) AS count
Clause | Table of intermediate results after the clause | State of the graph after the clause, changes in red
---|---|---
CREATE (jj:Person {name: "Jay-jay"}) | [table not shown] | [graph not shown]
RETURN count(*) AS count | [table not shown] | [graph not shown]
MATCH (j:Person) WHERE j.name STARTS WITH "J" | [table not shown] | [graph not shown]
RETURN count(*) AS count | [table not shown] | [graph not shown]
It is important to note that the MATCH clause finds the Person node that is created by the CREATE clause. This is because the CREATE clause comes before the MATCH clause, and thus the MATCH can observe any changes to the graph made by the CREATE.
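For contrast, the following sketch swaps the two parts of the query. It is an illustrative variation, not part of the original example. Here the MATCH comes before the UNION and the CREATE after it, so by the rule above the MATCH cannot observe the new node.

```cypher
// Part before the UNION: counts only the Person nodes that already exist.
MATCH (j:Person) WHERE j.name STARTS WITH "J"
RETURN count(*) AS count
UNION
// Part after the UNION: starts from an empty table and creates one node,
// so count(*) is computed over the single row produced by CREATE.
CREATE (jj:Person {name: "Jay-jay"})
RETURN count(*) AS count
```

Because UNION removes duplicate rows, the combined result has a single row if both parts happen to return the same count.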
Queries with CALL {} subqueries
Subqueries inside a CALL {} clause are evaluated for each incoming input row.
This means that write clauses inside a subquery can get executed more than once.
The different invocations of the subquery are executed in turn, in the order of the incoming input rows.
Later invocations of the subquery can observe writes made by earlier invocations of the subquery.
Using the same example graph as above, this example shows the table of intermediate results and the state of the graph after each clause for the following query:
The query below uses a variable scope clause (introduced in Neo4j 5.23) to import variables into the CALL subquery. If you are using an older version of Neo4j, use an importing WITH clause instead.
MATCH (john:Person {name: 'John'})
SET john.friends = []
WITH john
MATCH (john)-[:FRIEND]->(friend)
WITH john, friend
CALL (john, friend) {
  WITH john.friends AS friends
  SET john.friends = friends + friend.name
}
Clause | Table of intermediate results after the clause | State of the graph after the clause, changes in red
---|---|---
MATCH (john:Person {name: 'John'}) | [table not shown] | [graph not shown]
SET john.friends = [] | [table not shown] | [graph not shown]
MATCH (john)-[:FRIEND]->(friend) | [table not shown] | [graph not shown]
First invocation of WITH john.friends AS friends | [table not shown] | [graph not shown]
First invocation of SET john.friends = friends + friend.name | [table not shown] | [graph not shown]
Second invocation of WITH john.friends AS friends | [table not shown] | [graph not shown]
Second invocation of SET john.friends = friends + friend.name | [table not shown] | [graph not shown]
It is important to note that, in the subquery, the second invocation of the WITH clause can observe the writes made by the first invocation of the SET clause.
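For Neo4j versions that predate the variable scope clause, the same query can be sketched with an importing WITH clause. This is an assumed equivalent rather than an example taken from this section. The importing WITH must be the first clause of the subquery and may only list plain variables, so a second WITH is used to compute friends.

```cypher
MATCH (john:Person {name: 'John'})
SET john.friends = []
WITH john
MATCH (john)-[:FRIEND]->(friend)
WITH john, friend
CALL {
  // Importing WITH: brings the outer variables into the subquery.
  WITH john, friend
  // Keep the imported variables in scope and read the current friends list.
  WITH *, john.friends AS friends
  SET john.friends = friends + friend.name
}
```

As in the version above, each invocation reads the friends list written by the previous invocation.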
Notes on the implementation
An easy way to implement the semantics outlined above is to fully execute each clause and materialize the table of intermediate results before executing the next clause. This approach would consume a lot of memory for materializing the tables of intermediate results and would generally not perform well.
Instead, Cypher generally tries to interleave the execution of clauses and materializes intermediate results only when needed. This is called lazy evaluation.
In many read-write queries, interleaved execution is unproblematic. When it is not, Cypher must ensure that the table of intermediate results is materialized at the right time(s). This is done by inserting an Eager operator into the execution plan.
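One way to check whether a given query needs this materialization is to inspect its plan with EXPLAIN, which shows the execution plan without running the query. The sketch below reuses the read-write query from earlier in this section. Whether an Eager operator appears in the plan is an assumption here, since it depends on the Neo4j version and on the planner's analysis.

```cypher
// Show the execution plan only; no data is read or written.
// Look for an Eager operator between the part of the plan that matches
// Person nodes and the part that creates new ones.
EXPLAIN
MATCH (j:Person) WHERE j.name STARTS WITH "J"
CREATE (j)-[:FRIEND]->(jj:Person {name: "Jay-jay"})
```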