Knowledge Base

Post-UNION processing

Cypher does not allow further processing of UNION or UNION ALL results, since RETURN is required in all queries of the union.

Here are some workarounds.

Post-UNION processing in Neo4j 4.0

With Neo4j 4.0, post-UNION processing is now possible via subqueries.

An example of usage:

CALL {
  MATCH (movieOrPerson:Movie) RETURN movieOrPerson
  UNION
  MATCH (movieOrPerson:Person) RETURN movieOrPerson
}
WITH movieOrPerson
...

This allows us to keep working with the results of the UNION subquery.

However, with the initial 4.0 release only uncorrelated subqueries are supported, meaning the subquery cannot use variables from outside the call. This means that using a subquery for post-UNION processing in the middle of a more complex query may not be possible, as you cannot pass in variables from the outer query for use within the subquery.

In 4.0.x, correlated subqueries, which are more useful and can use variables from outside the subquery, are currently only available when using Neo4j Fabric.

Post-UNION processing enhancements in 4.1+

With the release of 4.1, the CALL subquery functionality has been enhanced to allow correlated subqueries. This lets us use existing variables mid-query within the subquery.

This requires the use of WITH as the first clause within the subquery CALL block, for the purpose of importing variables to the subquery.

When using UNION or UNION ALL, we can provide a similar importing WITH clause for each of the unioned queries:

MATCH (m:Movie {title:'The Matrix'})
CALL {
    WITH m
    MATCH (m)<-[:ACTED_IN]-(p)
    RETURN p
    UNION
    WITH m
    MATCH (m)<-[:DIRECTED]-(p)
    RETURN p
}
RETURN p.name as name
ORDER BY name ASC

This will correctly return the alphabetically-sorted names of all actors and directors for The Matrix.

This import usage has some special restrictions that do not normally apply to WITH usage:

  1. You may only include variables from the outer query and no others. + You cannot perform calculations, aggregations, or introduction of new variables in the initial WITH.

  2. You cannot alias any variables within this initial WITH.

  3. You cannot follow the initial WITH with a WHERE clause for filtering.

If you try any of these, you will be met with some kind of error, such as:

Importing WITH should consist only of simple references to outside variables. Aliasing or expressions are not supported.

or more cryptically, if you try to use a WHERE clause after the initial WITH

Variable `x` not defined

(where the variable is the first one present in the WITH clause)

You can get around all of these restrictions by simply introducing an additional WITH clause after the importing WITH, like so:

MATCH (m:Movie)
WHERE m.title CONTAINS 'Matrix'
CALL {
    WITH m
    WITH m as movie
    MATCH (m)<-[:DIRECTED]-(p)
    RETURN p.name as name
    UNION
    WITH m
    WITH m
    WHERE m.title CONTAINS 'Reloaded'
    MATCH (m)<-[:ACTED_IN]-(p)
    RETURN p.name as name
}
RETURN DISTINCT name
ORDER BY name ASC

This demonstrates both the ability to alias the imported variable as well as to filter the imported variable provided we use a second WITH clause, which is not restricted in the same way as the initial WITH used for the import into the subquery.

For 3.5.x and earlier

For earlier versions, native subqueries are not available, so other workarounds must be used.

Combine collections, then UNWIND back to rows and apply DISTINCT

MATCH (m:Movie)
WITH collect(m) AS movies
MATCH (p:Person)
WITH movies + collect(p) AS moviesAndPeople
UNWIND moviesAndPeople AS movieOrPerson
WITH DISTINCT movieOrPerson
...

DISTINCT isn’t really needed in the above query, but it will be needed if it’s possible for a result to be present in multiple collections being combined, provided you want distinct values.

Use apoc.cypher.run() to return UNION results from a subquery

Using APOC Procedures, you can use apoc.cypher.run() to execute a UNION within a subquery, and return its results.

CALL apoc.cypher.run('
 MATCH (movieOrPerson:Movie)
 RETURN movieOrPerson
 UNION
 MATCH (movieOrPerson:Person)
 RETURN movieOrPerson',
 {}) yield value
WITH value.movieOrPerson as movieOrPerson
...

Remember that procedure calls are executed per-row, so using this approach when multiple rows already exist may lead to unintended and unexpected results.