Usage with a Neo4j cluster

The Kafka Connect Neo4j Connector is the recommended method to integrate Kafka with Neo4j, as Neo4j Streams is no longer under active development and will not be supported after version 4.4 of Neo4j.

The most recent version of the Kafka Connect Neo4j Connector can be found here.

Overview

Neo4j Clustering is an Enterprise Edition feature that provides high availability for the database.

Neo4j Plugin

When the Neo4j Streams plugin is used with a Neo4j cluster, there are several things to keep in mind:

  • The plugin must be present in the plugins directory of all cluster members, and not just one.

  • The configuration settings must be present in:

    • all neo4j.conf files for versions prior to 4.0.7, and not just one;

    • all streams.conf files for version 4.0.7 and later, and not just one.
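The version rule above can be sketched as a small pre-deployment check. This is only an illustration: the function names `streams_config_file` and `members_missing_config` are hypothetical, and the sketch assumes that each cluster member's conf directory is reachable as a local path (for example through a mount or a copied snapshot).

```python
import re
from pathlib import Path


def streams_config_file(version: str) -> str:
    """Return the file that should hold the streams settings for a version."""
    # Read the leading numeric components, e.g. "4.0.7" -> (4, 0, 7).
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    # A missing patch component ("4.4") is treated as 0.
    parts = tuple(int(p) if p is not None else 0 for p in match.groups())
    # Versions prior to 4.0.7 use neo4j.conf; 4.0.7 and later use streams.conf.
    return "streams.conf" if parts >= (4, 0, 7) else "neo4j.conf"


def members_missing_config(conf_dirs, version):
    """List the conf directories that lack the expected file.

    conf_dirs is a hypothetical list of paths, one per cluster member.
    """
    expected = streams_config_file(version)
    return [str(d) for d in conf_dirs if not (Path(d) / expected).is_file()]
```

An empty list from `members_missing_config` means the expected file exists on every member that was checked. It does not verify that the settings inside the files are identical.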

Over the lifetime of a cluster the leader may change, so the plugin must be installed on every member, and not just on the current leader.

The plugin detects the leader and will not attempt a write (for example, when acting as the consumer) on a follower, where the write would fail. The plugin checks the cluster topology as needed.

For CDC there is one more consideration: as of Neo4j 3.5, committed transactions are published only on the leader. In practical terms, if the producer is configured, it is the leader that publishes newly committed data back out to Kafka.

The Neo4j Streams utility procedures, in particular CALL streams.publish, can run on any cluster member or read replica. CALL streams.consume can also run on any cluster member. However, because only the leader can process writes in a Neo4j cluster, combining streams.consume with write operations will not work on a follower or read replica.
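As a rough illustration of the first point, the sketch below calls streams.publish through a read-access session, which lets a routing driver pick any member rather than the leader. It assumes the official Neo4j Python driver and assumes the procedure is allowed in a read transaction, as the text above implies. The function name `publish_from_any_member`, the default database name and the topic used in the example are hypothetical.

```python
# Assumed to match the value of neo4j.READ_ACCESS in the official Python driver.
READ_ACCESS = "READ"


def publish_from_any_member(driver, topic, payload, database="neo4j"):
    """Send a payload to a Kafka topic through CALL streams.publish.

    driver is expected to be a neo4j.Driver created with a neo4j:// URI.
    """
    # A read-access session does not need to be routed to the leader.
    with driver.session(database=database,
                        default_access_mode=READ_ACCESS) as session:
        result = session.run(
            "CALL streams.publish($topic, $payload)",
            topic=topic,
            payload=payload,
        )
        # Exhaust the result so that any server-side error surfaces here.
        result.consume()
```

For example, `publish_from_any_member(driver, "my-topic", "Hello from Neo4j")` would publish a string payload to the (hypothetical) topic `my-topic`.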

Remote Clients

Remote applications that talk to Neo4j through the official drivers sometimes need to use streams functionality. Best practices in these cases are:

  • Always use a neo4j:// driver URI when communicating with the cluster in the client application.

  • Use explicit write transactions in your client application when using procedure calls such as CALL streams.consume, so that the routing driver sends the query to the leader.
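Both practices can be sketched with the official Neo4j Python driver. This is a minimal sketch under several assumptions: `session.execute_write` is the 5.x driver name for a managed write transaction (4.x drivers call it `write_transaction`), and the helper names, the `Person` label, the `event.data.name` field and the default timeout are all hypothetical.

```python
# Routing URI schemes; bolt:// connects to a single member without routing.
ROUTING_SCHEMES = ("neo4j://", "neo4j+s://", "neo4j+ssc://")

# The Person label and the event.data.name field are illustrative only.
CONSUME_AND_CREATE = (
    "CALL streams.consume($topic, {timeout: $timeout_ms}) YIELD event "
    "CREATE (p:Person {name: event.data.name}) "
    "RETURN count(p) AS created"
)


def open_routing_driver(uri, user, password):
    """Create a driver, refusing URIs that would bypass cluster routing."""
    if not uri.startswith(ROUTING_SCHEMES):
        raise ValueError(f"expected a neo4j:// routing URI, got {uri!r}")
    from neo4j import GraphDatabase  # official Neo4j Python driver
    return GraphDatabase.driver(uri, auth=(user, password))


def _consume_and_create(tx, topic, timeout_ms):
    # Runs inside a write transaction, so the write happens on the leader.
    record = tx.run(CONSUME_AND_CREATE, topic=topic,
                    timeout_ms=timeout_ms).single()
    return record["created"]


def consume_on_leader(driver, topic, timeout_ms=5000, database="neo4j"):
    """Consume events from a topic and write them in one write transaction."""
    with driver.session(database=database) as session:
        # The explicit write transaction tells the driver to use the leader.
        return session.execute_write(_consume_and_create, topic, timeout_ms)
```

Because the write intent is declared through the transaction function rather than inferred from the query text, the same client code keeps working after a leader change.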