Installation

Choose the correct version

Make sure to match both the Spark version and the Scala version of your setup. Here is a compatibility table to help you choose the correct version of the connector.

Table 1. Compatibility table

Spark Version | Artifact (Scala 2.12)                                          | Artifact (Scala 2.13)
3.4+          | org.neo4j:neo4j-connector-apache-spark_2.12:5.3.2_for_spark_3 | org.neo4j:neo4j-connector-apache-spark_2.13:5.3.2_for_spark_3
3.3           | org.neo4j:neo4j-connector-apache-spark_2.12:5.1.0_for_spark_3 | org.neo4j:neo4j-connector-apache-spark_2.13:5.1.0_for_spark_3
3.2           | org.neo4j:neo4j-connector-apache-spark_2.12:5.0.3_for_spark_3 | org.neo4j:neo4j-connector-apache-spark_2.13:5.0.3_for_spark_3
3.0 and 3.1   | org.neo4j:neo4j-connector-apache-spark_2.12:4.1.5_for_spark_3 | org.neo4j:neo4j-connector-apache-spark_2.13:4.1.5_for_spark_3

Usage with the Spark shell

The connector is available via Spark Packages:

$SPARK_HOME/bin/spark-shell --packages org.neo4j:neo4j-connector-apache-spark_2.12:5.3.2_for_spark_3
$SPARK_HOME/bin/pyspark --packages org.neo4j:neo4j-connector-apache-spark_2.12:5.3.2_for_spark_3

Alternatively, you can download the connector JAR file from the Neo4j Connector Page or from the GitHub releases page and run one of the following commands to launch a Spark interactive shell with the connector included:

$SPARK_HOME/bin/spark-shell --jars neo4j-connector-apache-spark_2.12-5.3.2_for_spark_3.jar
$SPARK_HOME/bin/pyspark --jars neo4j-connector-apache-spark_2.12-5.3.2_for_spark_3.jar
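
With the shell running, a quick read is enough to verify that the connector is available. The following is a minimal sketch: the URL, credentials, and the Person label are assumptions made for the example, so adjust them to your own Neo4j setup.

// Read all nodes with the Person label into a DataFrame.
// The connection details below are placeholders.
val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  .option("authentication.basic.username", "neo4j")
  .option("authentication.basic.password", "password")
  .option("labels", "Person")
  .load()

df.show()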

Self-contained applications

For non-Python applications:

  1. Include the connector in your application using the application’s build tool.

  2. Package the application.

  3. Use spark-submit to run the application, as sketched below.
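
As an illustration of these steps, here is a minimal sketch of such an application in Scala. The connection details and the Person label are assumptions made for the example; replace them with your own.

import org.apache.spark.sql.SparkSession

object SparkApp {
  def main(args: Array[String]): Unit = {
    // Build a session; local[*] is used here only for a local test run.
    val spark = SparkSession.builder
      .appName("Spark App")
      .master("local[*]")
      .getOrCreate()

    // Read nodes with the (assumed) Person label from a local Neo4j instance.
    val df = spark.read
      .format("org.neo4j.spark.DataSource")
      .option("url", "bolt://localhost:7687")
      .option("authentication.basic.username", "neo4j")
      .option("authentication.basic.password", "password")
      .option("labels", "Person")
      .load()

    df.show()
    spark.stop()
  }
}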

For Python applications, run spark-submit directly.

As with the spark-shell, you can run spark-submit either via Spark Packages or with a local JAR file. See the Quickstart for more code examples.
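
For example, assuming the application above was packaged as spark-app_2.12-1.0.jar (the name is a placeholder), either of the following would work:

$SPARK_HOME/bin/spark-submit --packages org.neo4j:neo4j-connector-apache-spark_2.12:5.3.2_for_spark_3 --class SparkApp spark-app_2.12-1.0.jar
$SPARK_HOME/bin/spark-submit --jars neo4j-connector-apache-spark_2.12-5.3.2_for_spark_3.jar --class SparkApp spark-app_2.12-1.0.jar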

A minimal build.sbt
name := "Spark App"
version := "1.0"
scalaVersion := "2.12.18"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.5.1"
libraryDependencies += "org.neo4j" %% "neo4j-connector-apache-spark" % "5.3.1_for_spark_3"

If you use the sbt-spark-package plugin, add the following to your build.sbt instead:

spDependencies += "org.neo4j/neo4j-connector-apache-spark_2.12:5.3.2_for_spark_3"

A minimal pom.xml
<project>
  <groupId>org.neo4j</groupId>
  <artifactId>spark-app</artifactId>
  <modelVersion>4.0.0</modelVersion>
  <name>Spark App</name>
  <packaging>jar</packaging>
  <version>1.0</version>
  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.12</artifactId>
      <version>3.5.1</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.neo4j</groupId>
      <artifactId>neo4j-connector-apache-spark_2.12</artifactId>
      <version>5.3.2_for_spark_3</version>
    </dependency>
  </dependencies>
</project>

Other build tools

Gradle

dependencies {
    // list of dependencies
    compile "org.neo4j:neo4j-connector-apache-spark_2.12:5.3.2_for_spark_3"
}