Writing to Neo4j

The connector provides three data source options to write data to a Neo4j database.

Table 1. Write options

Option: labels
Description: Use this if you only need to create or update nodes with their properties, or as a first step before adding relationships.
Value: Colon-separated list of node labels to create or update.
Default: (empty)

Option: relationship
Description: Use this if you need to create or update relationships along with their source and target nodes.
Value: Relationship type to create or update.
Default: (empty)

Option: query
Description: Use this if you need more flexibility and know how to write a Cypher® query.
Value: Cypher query with a CREATE or MERGE clause.
Default: (empty)

Examples

All the examples on this page assume that the SparkSession has been initialized with the appropriate authentication options. See the Quickstart examples for more details.

You can run the read examples for each option to check the data after writing.
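
As a minimal sketch of such a check, the following helper loads the :Person nodes back into a DataFrame after the labels example in the next section has run. It assumes an initialized SparkSession and uses the labels option on the read side; the helper name read_person_nodes is illustrative and not part of the connector.

```python
def read_person_nodes(spark):
    """Load all :Person nodes into a DataFrame (illustrative helper)."""
    return (
        spark.read.format("org.neo4j.spark.DataSource")
        # Read every node that carries the :Person label
        .option("labels", ":Person")
        .load()
    )


# Usage, assuming an initialized SparkSession named `spark`:
# read_person_nodes(spark).show()
```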

labels option

Write the :Person nodes.

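Example (Scala)
import org.apache.spark.sql.SaveMode
// Needed for toDF(); assumes the SparkSession is a val named spark
import spark.implicits._

case class Person(name: String, surname: String, age: Int)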

val peopleDF = List(
    Person("John", "Doe", 42),
    Person("Jane", "Doe", 40)
).toDF()

peopleDF.write
    .format("org.neo4j.spark.DataSource")
    .mode(SaveMode.Append)
    .option("labels", ":Person")
    .save()

Example (Python)
# Create example DataFrame
peopleDF = spark.createDataFrame(
    [
        {"name": "John", "surname": "Doe", "age": 42},
        {"name": "Jane", "surname": "Doe", "age": 40},
    ]
)

(
    peopleDF.write.format("org.neo4j.spark.DataSource")
    .mode("Append")
    .option("labels", ":Person")
    .save()
)

See Write nodes for more information and examples.

relationship option

Write the :BOUGHT relationship with its source and target nodes and its properties.

Example (Scala)
val relDF = Seq(
    ("John", "Doe", 1, "Product 1", 200, "ABC100"),
    ("Jane", "Doe", 2, "Product 2", 100, "ABC200")
).toDF("name", "surname", "customerID", "product", "quantity", "order")

relDF.write
    // Create new relationships
    .mode("Append")
    .format("org.neo4j.spark.DataSource")
    // Assign a type to the relationships
    .option("relationship", "BOUGHT")
    // Use `keys` strategy
    .option("relationship.save.strategy", "keys")
    // Create source nodes and assign them a label
    .option("relationship.source.save.mode", "Append")
    .option("relationship.source.labels", ":Customer")
    // Map the DataFrame columns to node properties
    .option("relationship.source.node.properties", "name,surname,customerID:id")
    // Create target nodes and assign them a label
    .option("relationship.target.save.mode", "Append")
    .option("relationship.target.labels", ":Product")
    // Map the DataFrame columns to node properties
    .option("relationship.target.node.properties", "product:name")
    // Map the DataFrame columns to relationship properties
    .option("relationship.properties", "quantity,order")
    .save()

Example (Python)
# Create example DataFrame
relDF = spark.createDataFrame(
    [
        {
            "name": "John",
            "surname": "Doe",
            "customerID": 1,
            "product": "Product 1",
            "quantity": 200,
            "order": "ABC100",
        },
        {
            "name": "Jane",
            "surname": "Doe",
            "customerID": 2,
            "product": "Product 2",
            "quantity": 100,
            "order": "ABC200",
        },
    ]
)

(
    relDF.write
    # Create new relationships
    .mode("Append")
    .format("org.neo4j.spark.DataSource")
    # Assign a type to the relationships
    .option("relationship", "BOUGHT")
    # Use `keys` strategy
    .option("relationship.save.strategy", "keys")
    # Create source nodes and assign them a label
    .option("relationship.source.save.mode", "Append")
    .option("relationship.source.labels", ":Customer")
    # Map the DataFrame columns to node properties
    .option("relationship.source.node.properties", "name,surname,customerID:id")
    # Create target nodes and assign them a label
    .option("relationship.target.save.mode", "Append")
    .option("relationship.target.labels", ":Product")
    # Map the DataFrame columns to node properties
    .option("relationship.target.node.properties", "product:name")
    # Map the DataFrame columns to relationship properties
    .option("relationship.properties", "quantity,order")
    .save()
)

See Write relationships for more information and examples.

query option

Use a Cypher query to write data.

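Example (Scala)
import org.apache.spark.sql.SaveMode
// Needed for toDF(); assumes the SparkSession is a val named spark
import spark.implicits._

case class Person(name: String, surname: String, age: Int)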

// Create an example DataFrame
val queryDF = List(
    Person("John", "Doe", 42),
    Person("Jane", "Doe", 40)
).toDF()

// Define the Cypher query to use in the write
val writeQuery =
    "CREATE (n:Person {fullName: event.name + ' ' + event.surname})"

queryDF.write
    .format("org.neo4j.spark.DataSource")
    .option("query", writeQuery)
    .mode(SaveMode.Overwrite)
    .save()

Example (Python)
# Create example DataFrame
queryDF = spark.createDataFrame(
    [
        {"name": "John", "surname": "Doe", "age": 42},
        {"name": "Jane", "surname": "Doe", "age": 40},
    ]
)

# Define the Cypher query to use in the write
write_query = "CREATE (n:Person {fullName: event.name + ' ' + event.surname})"

(
    queryDF.write.format("org.neo4j.spark.DataSource")
    .option("query", write_query)
    .mode("Overwrite")
    .save()
)

See Write with a Cypher query for more information and examples.
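
The examples above use a CREATE clause, so running them twice creates duplicate nodes. The option also accepts a MERGE clause; the sketch below is one possible variant, assuming that name and surname together identify a person and that event refers to the current DataFrame row, as in the queries above. The names MERGE_QUERY and write_people_with_merge are illustrative.

```python
# MERGE matches on name and surname, so rerunning the write updates
# the matching node instead of creating a duplicate.
MERGE_QUERY = (
    "MERGE (n:Person {name: event.name, surname: event.surname}) "
    "SET n.age = event.age"
)


def write_people_with_merge(df):
    """Write each row of df through MERGE_QUERY (illustrative helper)."""
    (
        df.write.format("org.neo4j.spark.DataSource")
        .option("query", MERGE_QUERY)
        .mode("Overwrite")
        .save()
    )


# Usage, with the queryDF from the Python example above:
# write_people_with_merge(queryDF)
```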

Save mode

Regardless of the write option, the connector supports two save modes, which you set with the mode() method of the DataFrame writer:

  • The Append mode creates new nodes or relationships by building a CREATE Cypher query.

  • The Overwrite mode creates new nodes or relationships, or updates existing ones, by building a MERGE Cypher query.

    • Requires the node.keys option when used with the labels option.

    • Requires the relationship.source.node.keys and relationship.target.node.keys options when used with the relationship option.
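
The node.keys requirement for the labels option can be checked before the write starts. The following is a minimal sketch of such a check, not connector code: node_write_options is a hypothetical helper that builds the options for a labels write and rejects an Overwrite without key columns. It assumes node.keys takes a comma-separated list of column names.

```python
def node_write_options(labels, mode, node_keys=None):
    """Build the options for a `labels` write (hypothetical helper).

    Raises ValueError when Overwrite is requested without node keys,
    mirroring the requirement listed above.
    """
    if mode not in ("Append", "Overwrite"):
        raise ValueError(f"Unsupported save mode: {mode}")
    options = {"labels": labels}
    if mode == "Overwrite":
        if not node_keys:
            raise ValueError("Overwrite with the labels option requires node.keys")
        # Assumption: node.keys is a comma-separated list of key columns
        options["node.keys"] = ",".join(node_keys)
    return options


opts = node_write_options(":Person", "Overwrite", ["name", "surname"])
print(opts)  # {'labels': ':Person', 'node.keys': 'name,surname'}
```

The resulting dictionary can then be passed to the writer, for example with peopleDF.write.format("org.neo4j.spark.DataSource").mode("Overwrite").options(**opts).save().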

Type mapping

See Data type mapping for the full type mapping between Spark DataFrames and Neo4j.

Performance considerations

Since writing is typically an expensive operation, make sure you write only the DataFrame columns you need.

For example, if the columns from the data source are name, surname, age, and livesIn, but you only need name and surname, you can do the following:

df.select(df("name"), df("surname"))
  .write
  .format("org.neo4j.spark.DataSource")
  .mode(SaveMode.Append)
  .option("labels", ":Person")
  .save()
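
A Python equivalent of the snippet above could look like the following sketch. It assumes a DataFrame with at least the name and surname columns; the helper name write_names_only is illustrative.

```python
def write_names_only(df):
    """Write only the name and surname columns as :Person nodes (illustrative helper)."""
    (
        # Drop every column except the two that are needed
        df.select("name", "surname")
        .write.format("org.neo4j.spark.DataSource")
        .mode("Append")
        .option("labels", ":Person")
        .save()
    )


# Usage, assuming a DataFrame `df` with name, surname, age, and livesIn columns:
# write_names_only(df)
```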