Writing to Neo4j
The connector provides three data source options to write data to a Neo4j database.
| Option | Description | Value | Default |
|---|---|---|---|
| `labels` | Use this if you only need to create or update nodes with their properties, or as a first step before adding relationships. | Colon-separated list of node labels to create or update. | (empty) |
| `relationship` | Use this if you need to create or update relationships along with their source and target nodes. | Relationship type to create or update. | (empty) |
| `query` | Use this if you need more flexibility and know how to write a Cypher® query. | Cypher query with a `CREATE` or `MERGE` clause. | (empty) |
Examples
All the examples on this page assume that the `SparkSession` has been initialized with the appropriate connection and authentication options. The Scala examples additionally assume that `import org.apache.spark.sql.SaveMode` and the session's `import spark.implicits._` (needed for `toDF`) are in scope.
You can run the read examples for each option to check the data after writing.
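For reference, one way to initialize the session is the following minimal sketch; it assumes the session-level `neo4j.*` configuration supported by recent connector versions, and the URL and credentials are placeholders for your own deployment.
import org.apache.spark.sql.SparkSession

// Minimal sketch: a SparkSession configured for the Neo4j connector.
// Replace the URL and credentials with your own values.
val spark = SparkSession.builder()
  .config("neo4j.url", "neo4j://localhost:7687")
  .config("neo4j.authentication.basic.username", "neo4j")
  .config("neo4j.authentication.basic.password", "password")
  .getOrCreate()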
`labels` option
Write the `:Person` nodes.
case class Person(name: String, surname: String, age: Int)
// Create example DataFrame
val peopleDF = List(
Person("John", "Doe", 42),
Person("Jane", "Doe", 40)
).toDF()
peopleDF.write
.format("org.neo4j.spark.DataSource")
.mode(SaveMode.Append)
.option("labels", ":Person")
.save()
# Create example DataFrame
peopleDF = spark.createDataFrame(
[
{"name": "John", "surname": "Doe", "age": 42},
{"name": "Jane", "surname": "Doe", "age": 40},
]
)
(
peopleDF.write.format("org.neo4j.spark.DataSource")
.mode("Append")
.option("labels", ":Person")
.save()
)
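To check the written nodes, a read with the same `labels` option (a minimal sketch of the connector's read mode) could look like this:
val readPeopleDF = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("labels", ":Person")
  .load()

readPeopleDF.show()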
See Write nodes for more information and examples.
`relationship` option
Write the `:BOUGHT` relationship with its source and target nodes and its properties.
// Create example DataFrame
val relDF = Seq(
("John", "Doe", 1, "Product 1", 200, "ABC100"),
("Jane", "Doe", 2, "Product 2", 100, "ABC200")
).toDF("name", "surname", "customerID", "product", "quantity", "order")
relDF.write
// Create new relationships
.mode("Append")
.format("org.neo4j.spark.DataSource")
// Assign a type to the relationships
.option("relationship", "BOUGHT")
// Use `keys` strategy
.option("relationship.save.strategy", "keys")
// Create source nodes and assign them a label
.option("relationship.source.save.mode", "Append")
.option("relationship.source.labels", ":Customer")
// Map the DataFrame columns to node properties
.option("relationship.source.node.properties", "name,surname,customerID:id")
// Create target nodes and assign them a label
.option("relationship.target.save.mode", "Append")
.option("relationship.target.labels", ":Product")
// Map the DataFrame columns to node properties
.option("relationship.target.node.properties", "product:name")
// Map the DataFrame columns to relationship properties
.option("relationship.properties", "quantity,order")
.save()
# Create example DataFrame
relDF = spark.createDataFrame(
[
{
"name": "John",
"surname": "Doe",
"customerID": 1,
"product": "Product 1",
"quantity": 200,
"order": "ABC100",
},
{
"name": "Jane",
"surname": "Doe",
"customerID": 2,
"product": "Product 2",
"quantity": 100,
"order": "ABC200",
},
]
)
(
relDF.write
# Create new relationships
.mode("Append")
.format("org.neo4j.spark.DataSource")
# Assign a type to the relationships
.option("relationship", "BOUGHT")
# Use `keys` strategy
.option("relationship.save.strategy", "keys")
# Create source nodes and assign them a label
.option("relationship.source.save.mode", "Append")
.option("relationship.source.labels", ":Customer")
# Map the DataFrame columns to node properties
.option("relationship.source.node.properties", "name,surname,customerID:id")
# Create target nodes and assign them a label
.option("relationship.target.save.mode", "Append")
.option("relationship.target.labels", ":Product")
# Map the DataFrame columns to node properties
.option("relationship.target.node.properties", "product:name")
# Map the DataFrame columns to relationship properties
.option("relationship.properties", "quantity,order")
.save()
)
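As with nodes, the written relationships can be read back for verification. A minimal sketch using the connector's `relationship` read options:
val boughtDF = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

boughtDF.show()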
See Write relationships for more information and examples.
`query` option
Use a Cypher query to write data. The connector wraps the query so that it runs once per DataFrame row, with the row's columns accessible through the `event` variable (as in `event.name` below).
case class Person(name: String, surname: String, age: Int)
// Create an example DataFrame
val queryDF = List(
Person("John", "Doe", 42),
Person("Jane", "Doe", 40)
).toDF()
// Define the Cypher query to use in the write
val writeQuery =
"CREATE (n:Person {fullName: event.name + ' ' + event.surname})"
queryDF.write
.format("org.neo4j.spark.DataSource")
.option("query", writeQuery)
.mode(SaveMode.Overwrite)
.save()
# Create example DataFrame
queryDF = spark.createDataFrame(
[
{"name": "John", "surname": "Doe", "age": 42},
{"name": "Jane", "surname": "Doe", "age": 40},
]
)
# Define the Cypher query to use in the write
write_query = "CREATE (n:Person {fullName: event.name + ' ' + event.surname})"
(
queryDF.write.format("org.neo4j.spark.DataSource")
.option("query", write_query)
.mode("Overwrite")
.save()
)
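For verification, the `query` option works on reads as well. A minimal sketch that returns the names created above:
val namesDF = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("query", "MATCH (n:Person) RETURN n.fullName AS fullName")
  .load()

namesDF.show()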
See Write with a Cypher query for more information and examples.
Save mode
Regardless of the write option, the connector supports two save modes for the data source `mode()` method:

- The `Append` mode creates new nodes or relationships by building a `CREATE` Cypher query.
- The `Overwrite` mode creates or updates nodes or relationships by building a `MERGE` Cypher query (see the sketch after this list). It requires:
  - the `node.keys` option when used with the `labels` option;
  - the `relationship.source.node.keys` and `relationship.target.node.keys` options when used with the `relationship` option.
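For example, a minimal sketch of an `Overwrite` write with the `labels` option, reusing the `peopleDF` DataFrame from the labels example and merging on the `name` and `surname` properties:
peopleDF.write
  .format("org.neo4j.spark.DataSource")
  .mode(SaveMode.Overwrite)
  .option("labels", ":Person")
  // Node key properties used by the generated MERGE
  .option("node.keys", "name,surname")
  .save()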
Type mapping
See Data type mapping for the full type mapping between Spark DataFrames and Neo4j.
Performance considerations
Since writing is typically an expensive operation, make sure you write only the DataFrame columns you need.
For example, if the columns from the data source are `name`, `surname`, `age`, and `livesIn`, but you only need `name` and `surname`, you can do the following:
df.select(df("name"), df("surname"))
.write
.format("org.neo4j.spark.DataSource")
.mode(SaveMode.Append)
.option("labels", ":Person")
.save()