Read with a Cypher query
All the examples in this page assume that the |
If you need more flexibility, you can use the query
option to run a custom Cypher® query.
The query must include a RETURN
clause.
With stored procedures, you need to include a YIELD
clause before RETURN
.
query
option exampleval query = """
MATCH (n:Person)
WITH n
LIMIT 2
RETURN id(n) AS id, n.name AS name
"""
spark.read.format("org.neo4j.spark.DataSource")
.option("query", query)
.load()
.show()
id | name |
---|---|
0 |
John Doe |
1 |
Jane Doe |
DataFrame columns
The structure of the result DataFrame is defined by the query itself. See the Schema inference page for more details.
This option is best suited when you return individual properties rather than graph entities (nodes, relationships, paths). Returning a graph entity, anyway, does not cause an error.
Recommended | Not recommended |
---|---|
MATCH (p:Person) RETURN id(p) AS id, p.name AS name |
MATCH (p:Person) RETURN p |
If you need to return a graph entity, use the |
Limit the results
The connector uses SKIP
and LIMIT
internally to support partitioned reads; as a result, SKIP
and LIMIT
clauses are not allowed in a custom Cypher query.
Attempts to do this will cause execution errors.
A possible workaround is to use SKIP
and LIMIT
before the RETURN
clause.
For example, the following query fails:
MATCH (p:Person)
RETURN p.name AS name
ORDER BY name
LIMIT 10
The query can be rewritten with LIMIT
before the RETURN
to complete successfully:
MATCH (p:Person)
WITH p.name AS name
ORDER BY name
LIMIT 10
RETURN p.name
When you rewrite a query, make sure the new query is equivalent to your original query so that the result is the same. |
You can also use the query.count
option instead of rewriting your query.
See the Spark optimizations page for more details.
The script
option
The script
option allows to run a sequence of Cypher queries before executing the read operation.
The result of the script
can be used in a subsequent query
, for example to inject query parameters.
script
and query
exampleval script = "RETURN 'foo' AS val"
val query = """
UNWIND range(1, 2) as id
RETURN id AS val, scriptResult[0].val AS script
"""
spark.read.format("org.neo4j.spark.DataSource")
.option("script", script)
.option("query", query)
.load()
.show()
val | script |
---|---|
1 |
foo |
2 |
foo |