Read relationships
All the examples in this page assume that the SparkSession has been initialized with the appropriate authentication options.
You can read a relationship and its source and target nodes by specifying the relationship type, the source node labels, and the target node labels.
Scala
val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

df.show()

Python
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)

df.show()
Equivalent Cypher query

MATCH (source:Customer)
MATCH (target:Product)
MATCH (source)-[rel:BOUGHT]->(target)
RETURN ...

The exact RETURN clause depends on the value of the relationship.nodes.map option.
DataFrame columns
When reading data with this method, the DataFrame contains the following columns:

- <rel.id>: internal Neo4j ID
- <rel.type>: relationship type
- rel.[property name]: relationship properties

Additional columns are added depending on the value of the relationship.nodes.map option:

relationship.nodes.map set to false (default) | relationship.nodes.map set to true |
---|---|
<source.id>, <source.labels>, and source.[property name] columns for the source node; <target.id>, <target.labels>, and target.[property name] columns for the target node | A <source> column and a <target> column, each containing the node's <id>, <labels>, and properties as a map |
relationship.nodes.map set to false
Scala
val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  // It can be omitted, since `false` is the default
  .option("relationship.nodes.map", "false")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

df.show()

Python
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    # It can be omitted, since `false` is the default
    .option("relationship.nodes.map", "false")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)

df.show()
<rel.id> | <rel.type> | <source.id> | <source.labels> | source.surname | source.name | source.id | <target.id> | <target.labels> | target.name | rel.order | rel.quantity |
---|---|---|---|---|---|---|---|---|---|---|---|
3189 | BOUGHT | 1100 | [Customer] | Doe | John | 1 | 1040 | [Product] | Product 1 | ABC100 | 200 |
3190 | BOUGHT | 1099 | [Customer] | Doe | Jane | 2 | 1039 | [Product] | Product 2 | ABC200 | 100 |
relationship.nodes.map set to true
Scala
val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  .option("relationship.nodes.map", "true")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

// Use `false` to print the whole DataFrame
df.show(false)

Python
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.nodes.map", "true")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)

# Use `truncate=False` to print the whole DataFrame
df.show(truncate=False)
<rel.id> | <rel.type> | <source> | <target> | rel.order | rel.quantity |
---|---|---|---|---|---|
3189 | BOUGHT | {surname: "Doe", name: "John", id: 1, <labels>: ["Customer"], <id>: 1100} | {name: "Product 1", <labels>: ["Product"], <id>: 1040} | ABC100 | 200 |
3190 | BOUGHT | {surname: "Doe", name: "Jane", id: 2, <labels>: ["Customer"], <id>: 1099} | {name: "Product 2", <labels>: ["Product"], <id>: 1039} | ABC200 | 100 |
The schema for the node and relationship property columns is inferred as explained in Schema inference.
Filtering
You can use the where and filter functions in Spark to filter on properties of the relationship, the source node, or the target node.
The correct format of the filter depends on the value of the relationship.nodes.map option.

relationship.nodes.map set to false (default) | relationship.nodes.map set to true |
---|---|
Filter on the flat column names, for example "`source.id` > 1" | Filter on the nested fields, for example "`<source>`.`id` > 1" |
Examples:
relationship.nodes.map set to false
Scala
val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  .option("relationship.nodes.map", "false")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

df.where("`source.id` > 1").show()

Python
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.nodes.map", "false")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)

df.where("`source.id` > 1").show()
<rel.id> | <rel.type> | <source.id> | <source.labels> | source.surname | source.name | source.id | <target.id> | <target.labels> | target.name | rel.order | rel.quantity |
---|---|---|---|---|---|---|---|---|---|---|---|
3190 | BOUGHT | 1099 | [Customer] | Doe | Jane | 2 | 1039 | [Product] | Product 2 | ABC200 | 100 |
relationship.nodes.map set to true
Scala
val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  .option("relationship.nodes.map", "true")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

// Use `false` to print the whole DataFrame
df.where("`<source>`.`id` > 1").show(false)

Python
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.nodes.map", "true")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)

# Use `truncate=False` to print the whole DataFrame
df.where("`<source>`.`id` > 1").show(truncate=False)
<rel.id> | <rel.type> | <source> | <target> | rel.order | rel.quantity |
---|---|---|---|---|---|
3190 | BOUGHT | {surname: "Doe", name: "Jane", id: 2, <labels>: ["Customer"], <id>: 1099} | {name: "Product 2", <labels>: ["Product"], <id>: 1039} | ABC200 | 100 |