FAQ and troubleshooting
General questions
Is this software connected to Morpheus or Cypher for Apache Spark (CAPS)?
No. There is no shared code between the two projects, and they take very different approaches. Cypher® for Apache Spark/Morpheus provided an interpreter that could execute Cypher queries within the Spark environment, along with a native graph representation for Spark. By contrast, this connector does not provide that functionality and focuses on reading from and writing to Neo4j. With this connector, all Cypher code is executed strictly within Neo4j. The Spark environment operates in terms of DataFrames as it always did, and this connector does not provide graph API primitives for Spark.
How can I speed up writes to Neo4j?
The Spark connector fundamentally writes data to Neo4j in batches. Neo4j is a transactional database, and all modifications are made within a transaction. Those transactions in turn have overhead.
The two simplest ways of increasing write performance are the following:
- Increase the batch size (option batch.size). The larger the batch, the fewer transactions are executed to write all of your data, and the less transactional overhead is incurred. A sketch follows this list.
- Ensure that your Neo4j instance has ample free heap and a properly sized page cache. A heap that is too small prevents you from committing large batches, which in turn slows the overall import.
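As an example, here is a minimal sketch of raising the batch size on a write. The DataFrame name (df), connection URL, credentials, label, and key options are placeholders; adjust them to your own environment and data.

import org.apache.spark.sql.SaveMode

df.write
  .format("org.neo4j.spark.DataSource")
  .mode(SaveMode.Overwrite)
  .option("url", "neo4j://localhost:7687")              // placeholder connection URL
  .option("authentication.basic.username", "neo4j")     // placeholder credentials
  .option("authentication.basic.password", "password")
  .option("labels", ":Person")                           // placeholder label
  .option("node.keys", "id")                             // key used to merge nodes in Overwrite mode
  .option("batch.size", "10000")                         // larger batches mean fewer transactions
  .save()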
For best performance, make sure you are familiar with the material in the Operations manual.
It is important to keep in mind that Neo4j scales writes vertically and reads horizontally. In the Causal Cluster Model, only the cluster leader (1 machine) may accept writes. For this reason, focus on getting the best hardware and performance on your cluster leader to maximize write throughput.
Why are the returned columns not in the same order as I specified in the query?
Unfortunately, this is a known issue that affects Neo4j 3.x and Neo4j 4.0. From Neo4j 4.1 onwards, the columns are returned in the same order as specified in the RETURN statement.
Where can I get help?
The Neo4j Community site is a great place to ask questions, talk with other users of the connector, and get help from Neo4j experts. You can also ask in the Neo4j Discord channel.
Errors
My writes are failing due to Deadlock Exceptions. What should I do?
In some cases, Neo4j rejects write transactions due to a deadlock exception that you may see in the stacktrace.
This Neo4j Knowledge Base article describes the issue.
Typically this is caused by too much parallelism in writing to Neo4j.
For example, writing a relationship (:A)-[:REL]->(:B) takes a lock in the database on both nodes. If another thread simultaneously attempts to write to those same nodes too often, deadlock exceptions can result and the transaction fails.
In general, the solution is to repartition the DataFrame before writing it to Neo4j, so that multiple partitioned writes do not lock the same nodes and relationships.
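One common mitigation, sketched below, is to coalesce the DataFrame to a single partition before a relationship write: this removes write parallelism entirely, trading throughput for freedom from lock contention. The DataFrame name, connection URL, and mapping options are placeholders.

import org.apache.spark.sql.SaveMode

relDf
  .coalesce(1)                                             // single writer: no concurrent transactions competing for node locks
  .write
  .format("org.neo4j.spark.DataSource")
  .mode(SaveMode.Append)
  .option("url", "neo4j://localhost:7687")                 // placeholder connection URL
  .option("relationship", "REL")
  .option("relationship.save.strategy", "keys")
  .option("relationship.source.labels", ":A")
  .option("relationship.source.node.keys", "source_id:id") // placeholder column-to-key mapping
  .option("relationship.target.labels", ":B")
  .option("relationship.target.node.keys", "target_id:id")
  .save()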
I am getting a cast error like UTF8String cannot be cast to Long. What is the solution?
You might get an error like the following, or a similar one with different types:
java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Long
This is typically due to a property having different types across nodes with the same label. You can solve it by installing APOC: this removes the error, but all the values for that property are cast to String. This happens because Spark is not schema-free and requires each column to always have a single type.
You can read more here.
I am getting an error TableProvider implementation org.neo4j.spark.DataSource cannot be written with ErrorIfExists mode, please use Append or Overwrite modes instead. What should I do?
If you are getting this error while trying to write to Neo4j, be aware that the current version of the connector doesn't support SaveMode.ErrorIfExists on Spark 3, even though that is the default save mode. Change the save mode to either SaveMode.Append or SaveMode.Overwrite. We are working to fully support all the save modes on Spark 3.
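For example, a minimal sketch of setting the save mode explicitly; the DataFrame name, connection URL, and label option are placeholders:

import org.apache.spark.sql.SaveMode

df.write
  .format("org.neo4j.spark.DataSource")
  .mode(SaveMode.Append)                   // or SaveMode.Overwrite; ErrorIfExists is not supported on Spark 3
  .option("url", "neo4j://localhost:7687") // placeholder connection URL
  .option("labels", ":Person")             // placeholder label
  .save()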
I am getting errors NoClassDefFoundError or ClassNotFoundException. What should I do?
You may get one of the following types of error:
NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport Caused by: ClassNotFoundException: org.apache.spark.sql.sources.v2.ReadSupport
java.lang.NoClassDefFoundError: scala/collection/IterableOnce Caused by: java.lang.ClassNotFoundException: scala.collection.IterableOnce
This means that your Spark version doesn't match the Spark version the connector was built for. Refer to this page to find out which version you need.
I am getting an error Failed to invoke procedure gds.graph.create.cypher: Caused by: java.lang.IllegalArgumentException: A graph with name [name] already exists. What should I do?
This might happen when creating a new graph with the GDS library. The issue is that the query is run a first time to extract the DataFrame schema and then run again to retrieve the data, so the graph creation procedure is invoked twice.
To avoid this issue, you can use the user-defined schema approach, as sketched below.
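Supplying a schema up front skips the schema-inference run, so the Cypher query (and therefore the graph creation) executes only once. Below is a minimal sketch assuming the query returns a single graphName column; the connection URL, graph name, and node/relationship queries are placeholders.

import org.apache.spark.sql.types.{StructField, StringType, StructType}

// User-defined schema matching the columns returned by the query below
val userSchema = StructType(Seq(StructField("graphName", StringType)))

val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .schema(userSchema)                      // user-defined schema: no extra query run just for inference
  .option("url", "neo4j://localhost:7687") // placeholder connection URL
  .option("query",
    "CALL gds.graph.create.cypher('myGraph', " +
    "'MATCH (n) RETURN id(n) AS id', " +
    "'MATCH (a)-->(b) RETURN id(a) AS source, id(b) AS target') " +
    "YIELD graphName RETURN graphName")
  .load()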
Databricks setup
I am getting an SSL handshake error while connecting to Aura.
Aura customers connecting from Databricks may encounter SSL handshake errors because Databricks' custom Java security settings remove support for some encryption ciphers.
See the Aura support article Connecting to Aura with Databricks for more information.