Cluster server discovery

In order to join a running cluster, any new member must know the addresses of at least some of the other servers in the cluster. This information is necessary to connect to the servers, run the discovery protocol, and obtain all the information about the cluster.

Neo4j provides several mechanisms for cluster members to discover each other and form a cluster. Which mechanism to use depends on the configuration, the environment in which the cluster runs, and the version of Neo4j being used.

Neo4j 5.23 introduces discovery service v2, which is recommended for new deployments.

The current version, v1, is deprecated and will be removed in a future release. Therefore, it is highly recommended to move from v1 to v2. For details, see Moving from discovery service v1 to v2.

The new version v2 introduces the following configuration settings to control the discovery service:

  • dbms.cluster.discovery.version — controls which discovery service versions run and in which mode (V1_ONLY, V1_OVER_V2, V2_OVER_V1, or V2_ONLY).

  • dbms.cluster.discovery.v2.endpoints — the endpoints used by v2 for initial discovery, resolved against the cluster advertised port.

  • dbms.kubernetes.discovery.v2.service_port_name — the name of the service port used in the Kubernetes service definition for the cluster port.

The following sections describe the different methods for server discovery, configuration settings, and examples of how to move from discovery service v1 to v2.

Methods for server discovery

Depending on the value of dbms.cluster.discovery.resolver_type, the discovery service can use a list of server addresses, DNS records, or Kubernetes services to discover the other servers in the cluster. The discovery configuration is used for initial discovery and to continuously exchange information about changes to the topology of the cluster.

Discovery using a list of server addresses

If the addresses of the other cluster members are known upfront, they can be listed explicitly. However, this method has limitations, such as:

  • If servers are replaced and the new members have different addresses, the list becomes outdated. An outdated list can be avoided by ensuring that the new members can be reached via the same address as the old members, but this is not always practical.

  • Under some circumstances the addresses are unknown when configuring the cluster. This can be the case, for example, when using container orchestration to deploy a cluster.

To use this method, set dbms.cluster.discovery.resolver_type=LIST and hard code the addresses in the configuration of each server. For example:

Discovery service v2:

dbms.cluster.discovery.resolver_type=LIST
server.cluster.advertised_address=server01.example.com:6000
dbms.cluster.discovery.v2.endpoints=server01.example.com:6000,server02.example.com:6000,server03.example.com:6000
dbms.cluster.discovery.version=V2_ONLY

Discovery service v1:

dbms.cluster.discovery.resolver_type=LIST
server.discovery.advertised_address=server01.example.com:5000
dbms.cluster.discovery.endpoints=server01.example.com:5000,server02.example.com:5000,server03.example.com:5000
dbms.cluster.discovery.version=V1_ONLY

An example of using this method with discovery service v1 is illustrated by Configure a cluster with three servers.

Discovery using DNS with multiple records

Where it is not practical or possible to explicitly list the addresses of cluster members to discover, you can use DNS-based mechanisms. In such cases, a DNS record lookup is performed when a server starts up based on configuration settings. Once a server has joined a cluster, further topology changes are communicated amongst the servers in the cluster as part of the discovery service.

The following DNS-based mechanisms can be used to get the addresses of other servers in the cluster for discovery:

dbms.cluster.discovery.resolver_type=DNS

With this configuration, the initial discovery members are resolved from DNS A records to find the IP addresses to contact. In version 1, the value of dbms.cluster.discovery.endpoints should be set to a single domain name and the port of the discovery service, while in version 2, the value of dbms.cluster.discovery.v2.endpoints should be set to a single domain name and the cluster advertised port. For example:

Discovery service v2:

dbms.cluster.discovery.resolver_type=DNS
server.cluster.advertised_address=server01.example.com:6000
dbms.cluster.discovery.v2.endpoints=cluster01.example.com:6000
dbms.cluster.discovery.version=V2_ONLY

Discovery service v1:

dbms.cluster.discovery.resolver_type=DNS
server.discovery.advertised_address=server01.example.com:5000
dbms.cluster.discovery.endpoints=cluster01.example.com:5000
dbms.cluster.discovery.version=V1_ONLY

When a DNS lookup is performed, the domain name returns an A record for every server in the cluster, where each A record contains the IP address of the server. The configured server uses all the IP addresses from the A records to join or form a cluster.
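For illustration, a lookup of the configured domain name might return a record set like the following. These are hypothetical zone entries, and the IP addresses are assumptions:

cluster01.example.com.    300    IN    A    10.0.0.1
cluster01.example.com.    300    IN    A    10.0.0.2
cluster01.example.com.    300    IN    A    10.0.0.3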

The discovery port must be the same on all servers when using this configuration. If this is not possible, consider using the discovery type SRV.

dbms.cluster.discovery.resolver_type=SRV

With this configuration, the initial discovery members are resolved from DNS SRV records to find the IP addresses/hostnames and discovery service ports (v1) or cluster advertised ports (v2) to contact.

The value of dbms.cluster.discovery.endpoints (v1) or dbms.cluster.discovery.v2.endpoints (v2) must be set to a single domain name with the port set to 0. The domain name returns a single SRV record when a DNS lookup is performed. For example:

Discovery service v2:

dbms.cluster.discovery.resolver_type=SRV
server.cluster.advertised_address=server01.example.com:6000
dbms.cluster.discovery.v2.endpoints=cluster01.example.com:0
dbms.cluster.discovery.version=V2_ONLY

The SRV record returned by DNS should contain the IP address or hostname, and the cluster port for the servers to be discovered. The configured server uses all the addresses from the SRV record to join or form a cluster.

Discovery service v1:

dbms.cluster.discovery.resolver_type=SRV
server.discovery.advertised_address=server01.example.com:5000
dbms.cluster.discovery.endpoints=cluster01.example.com:0
dbms.cluster.discovery.version=V1_ONLY

The SRV record returned by DNS should contain the IP address or hostname, and the discovery port for the servers to be discovered. The configured server uses all the addresses from the SRV record to join or form a cluster.
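For illustration, an SRV lookup of the configured domain name might return records like the following. These are hypothetical entries; the priority (0), weight (5), and hostnames are assumptions, and the port field carries the cluster port for v2 (6000 here) or the discovery port for v1 (5000):

cluster01.example.com.    300    IN    SRV    0 5 6000 server01.example.com.
cluster01.example.com.    300    IN    SRV    0 5 6000 server02.example.com.
cluster01.example.com.    300    IN    SRV    0 5 6000 server03.example.com.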

Discovery in Kubernetes

A special case is when a cluster is running in Kubernetes and each server is running as a Kubernetes service. Then, the addresses of the other servers can be obtained using the List Service API, as described in the Kubernetes API documentation.

The following settings are used to configure discovery for this scenario:

  • dbms.cluster.discovery.resolver_type=K8S

  • dbms.kubernetes.label_selector — the label selector that identifies the Kubernetes services of the cluster members.

  • dbms.kubernetes.service_port_name (v1) or dbms.kubernetes.discovery.v2.service_port_name (v2) — the name of the service port used in the Kubernetes service definition for the discovery port (v1) or the cluster port (v2).

With this configuration, dbms.cluster.discovery.endpoints is not used and any value assigned to it is ignored.

  • The pod running Neo4j must use a service account that has permission to list services. For further information, see the Kubernetes documentation on RBAC authorization or ABAC authorization.

  • The configured server.discovery.advertised_address must exactly match the Kubernetes-internal DNS name, which is of the form <service-name>.<namespace>.svc.cluster.local.
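For illustration, a Kubernetes-based discovery configuration might look like the following sketch. The label selector and service port names are assumptions and must match your Kubernetes service definitions:

dbms.cluster.discovery.resolver_type=K8S
# Label selector identifying the cluster's Kubernetes services (assumed value)
dbms.kubernetes.label_selector=app=neo4j
# Name of the discovery port in the service definition (v1, assumed value)
dbms.kubernetes.service_port_name=discovery
# Name of the cluster port in the service definition (v2, assumed value)
dbms.kubernetes.discovery.v2.service_port_name=cluster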


Moving from discovery service v1 to v2

From Neo4j 5.23, you can move from the current discovery service v1 to the new version v2. The v1 and v2 discovery services are designed to run in parallel and are completely independent of each other, which allows you to keep the cluster functioning while switching over from v1 to v2.

There are three ways to move from the current discovery service v1 to the new version v2, depending on the environment and the requirements of the cluster: in-place rolling, new server rolling, and using procedures.

Preparation

The following examples assume that you have a running cluster with three servers and you want to move from the current discovery service v1 to the new version v2.


Before moving from the current discovery service v1 to the new version v2, ensure that the new settings are added to the configuration depending on the type of dbms.cluster.discovery.resolver_type in use:

  • If dbms.cluster.discovery.resolver_type=LIST, set dbms.cluster.discovery.v2.endpoints to a comma-separated list of the servers' server.cluster.advertised_address values. It is important that both dbms.cluster.discovery.endpoints and dbms.cluster.discovery.v2.endpoints are set during the operation. For more information, see Discovery using a list of server addresses.

  • If dbms.cluster.discovery.resolver_type=DNS, set dbms.cluster.discovery.v2.endpoints to a single domain name and the cluster port. Alternatively, if dbms.cluster.discovery.resolver_type=SRV, set dbms.cluster.discovery.v2.endpoints to a single domain name and the port set to 0. It is important that both dbms.cluster.discovery.endpoints and dbms.cluster.discovery.v2.endpoints are set during the operation. For more information, see Discovery using DNS with multiple records.

  • If dbms.cluster.discovery.resolver_type=K8S, set dbms.kubernetes.discovery.v2.service_port_name to the name of the service port used in the Kubernetes service definition for the cluster port. It is important that both dbms.kubernetes.service_port_name and dbms.kubernetes.discovery.v2.service_port_name are set during the operation. For more information, see Discovery in Kubernetes.
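As an example, for the DNS resolver type, the combined v1 and v2 settings might look like this, reusing the hypothetical domain name from Discovery using DNS with multiple records:

dbms.cluster.discovery.resolver_type=DNS

dbms.cluster.discovery.endpoints=cluster01.example.com:5000
dbms.cluster.discovery.v2.endpoints=cluster01.example.com:6000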

In-place rolling

In-place rolling temporarily reduces fault tolerance because you are restarting running servers. To keep fault tolerance, you can introduce a fourth server temporarily.

  1. For all the servers, ensure that the new settings are added to the configuration as detailed in the Preparation section.

    As an example, for those using the list resolver, the settings for all the servers should include:

    dbms.cluster.discovery.resolver_type=LIST
    
    dbms.cluster.discovery.endpoints=server01.example.com:5000,server02.example.com:5000,server03.example.com:5000
    dbms.cluster.discovery.v2.endpoints=server01.example.com:6000,server02.example.com:6000,server03.example.com:6000
  2. Restart server01 with the new setting dbms.cluster.discovery.version=V1_OVER_V2.
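    A minimal sketch of this step, assuming a systemd-managed installation with the configuration in /etc/neo4j/neo4j.conf (both assumptions; adapt to your deployment):

    # Append the new setting (assuming it is not already present), then restart.
    echo "dbms.cluster.discovery.version=V1_OVER_V2" | sudo tee -a /etc/neo4j/neo4j.conf
    sudo systemctl restart neo4j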


  3. Run SHOW SERVERS and ensure that all three members are alive.
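    A non-interactive way to check this, assuming cypher-shell is on the path and the Bolt port is 7681 (an assumption, matching the later steps):

    ./cypher-shell -a bolt://localhost:7681 -d system "SHOW SERVERS;"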

  4. Then repeat steps 2 and 3 for server02 and server03, ensuring that they are set to dbms.cluster.discovery.version=V1_OVER_V2. Restart them sequentially, not in parallel.


  5. Using bolt://, connect to the system database of servers 1, 2, 3, and run the following procedure. This can be done using ./cypher-shell -a bolt://localhost:7681 -d system, for example.

    CALL dbms.cluster.showParallelDiscoveryState();

    Each server should display "Matching" in the stateComparison column.

    Expected result
    +----------------------------------------------------------------+
    | mode         | stateComparison | v1ServerCount | v2ServerCount |
    +----------------------------------------------------------------+
    | "V1_OVER_V2" | "Matching"      | "3"           | "3"           |
    +----------------------------------------------------------------+

    If the states are not yet matching, wait and try again.

  6. Restart server01 again with the new setting dbms.cluster.discovery.version=V2_OVER_V1.


  7. Run SHOW SERVERS and ensure that all three members are alive.

  8. Then repeat steps 6 and 7 for servers 2 and 3. Ensure that they are set to dbms.cluster.discovery.version=V2_OVER_V1. Restart them sequentially, not in parallel.


  9. Similar to step 5, verify that stateComparison shows Matching.

  10. Repeat steps 6, 7, 8, and 9, restarting servers 1, 2, and 3 sequentially, with the new setting dbms.cluster.discovery.version=V2_ONLY.

  11. Verify that CALL dbms.cluster.showParallelDiscoveryState() now shows V2_ONLY running. Note that stateComparison is N/A because there is no longer a v1 state to compare against.

New server rolling

New server rolling avoids restarting the existing servers; instead, at each stage, three new servers are started alongside the three running ones, and the old servers are then retired.

  1. Start up the three new servers with the setting dbms.cluster.discovery.version=V1_OVER_V2.


    The new servers should have their settings updated as detailed in the Preparation section. The discovery addresses should include the addresses of both the new and the previous members.

    As an example, for those using the list resolver, the settings for the new servers should include:

    dbms.cluster.discovery.resolver_type=LIST
    dbms.cluster.discovery.version=V1_OVER_V2
    
    dbms.cluster.discovery.endpoints=server01.example.com:5000,server02.example.com:5000,server03.example.com:5000,server04.example.com:5000,server05.example.com:5000,server06.example.com:5000
    dbms.cluster.discovery.v2.endpoints=server01.example.com:6000,server02.example.com:6000,server03.example.com:6000,server04.example.com:6000,server05.example.com:6000,server06.example.com:6000
  2. Using bolt://, connect to the system database of servers 4, 5, 6, and run the following procedure. This can be done using ./cypher-shell -a bolt://localhost:7685 -d system, for example.

    CALL dbms.cluster.showParallelDiscoveryState();

    The expected result should display v2ServerCount as 3. stateComparison is not expected to match at this stage because the original servers are not visible to the v2 discovery service.

    Expected result
     +---------------------------------------------------------------------------------------------------------+
     | mode         | stateComparison                                          | v1ServerCount | v2ServerCount |
     +---------------------------------------------------------------------------------------------------------+
     | "V1_OVER_V2" | "States are not matching after PT55M36.693S: (score:29)" | "6"           | "3"           |
     +---------------------------------------------------------------------------------------------------------+
  3. Deallocate, drop, and shut down servers 1, 2, 3.

  4. Start up servers 7, 8, 9, this time with the setting dbms.cluster.discovery.version=V2_OVER_V1.


    The discovery addresses in the settings should include the addresses of both the new and the previous members.

    As an example, for those using the list resolver, the settings for the new servers should include:

    dbms.cluster.discovery.resolver_type=LIST
    dbms.cluster.discovery.version=V2_OVER_V1
    
    dbms.cluster.discovery.endpoints=server04.example.com:5000,server05.example.com:5000,server06.example.com:5000,server07.example.com:5000,server08.example.com:5000,server09.example.com:5000
    dbms.cluster.discovery.v2.endpoints=server04.example.com:6000,server05.example.com:6000,server06.example.com:6000,server07.example.com:6000,server08.example.com:6000,server09.example.com:6000
  5. Using bolt://, connect to the system database of servers 7, 8, 9 and run the following procedure:

    CALL dbms.cluster.showParallelDiscoveryState();

    The output should display Matching in the stateComparison column. If it does not, wait and try again until the states match.

    Expected result
    +----------------------------------------------------------------+
    | mode         | stateComparison | v1ServerCount | v2ServerCount |
    +----------------------------------------------------------------+
    | "V2_OVER_V1" | "Matching"      | "6"           | "6"           |
    +----------------------------------------------------------------+
  6. Deallocate, drop, and shut down servers 4, 5, and 6.

  7. Start up servers 10, 11, 12, this time with the setting dbms.cluster.discovery.version=V2_ONLY.


    The discovery addresses in the settings should include the addresses of both the new and the previous members. Note that only the v2 settings are required.

    As an example, for those using the list resolver, the settings for the new servers should include:

    dbms.cluster.discovery.resolver_type=LIST
    dbms.cluster.discovery.version=V2_ONLY
    
    dbms.cluster.discovery.v2.endpoints=server07.example.com:6000,server08.example.com:6000,server09.example.com:6000,server10.example.com:6000,server11.example.com:6000,server12.example.com:6000
  8. Deallocate, drop, and shut down servers 7, 8, 9.

  9. Finally, using bolt://, connect to the system database of servers 10, 11, 12, and run the following procedure to check the version of the discovery service:

    CALL dbms.cluster.showParallelDiscoveryState();
    Expected result
    +-------------------------------------------------------------+
    | mode      | stateComparison | v1ServerCount | v2ServerCount |
    +-------------------------------------------------------------+
    | "V2_ONLY" | "N/A"           | "N/A"         | "3"           |
    +-------------------------------------------------------------+

Using procedures

  1. For all the servers, ensure that the new settings are added to the configuration as detailed in the Preparation section.

    As an example, for those using the list resolver, the settings for all the servers should include:

    dbms.cluster.discovery.resolver_type=LIST
    
    dbms.cluster.discovery.endpoints=server01.example.com:5000,server02.example.com:5000,server03.example.com:5000
    dbms.cluster.discovery.v2.endpoints=server01.example.com:6000,server02.example.com:6000,server03.example.com:6000
  2. In Cypher Shell, connect to the system database of server01 using bolt://. It is important to connect via bolt:// because otherwise the procedure might be routed to, and executed on, a different server than intended.

    ./cypher-shell -a bolt://localhost:7681 -d system
  3. Run the procedure:

    CALL dbms.cluster.showParallelDiscoveryState();

    The output indicates mode V1_ONLY, i.e., only v1 is running on this server.

    Expected result
    +-------------------------------------------------------------+
    | mode      | stateComparison | v1ServerCount | v2ServerCount |
    +-------------------------------------------------------------+
    | "V1_ONLY" | "N/A"           | "3"           | "N/A"         |
    +-------------------------------------------------------------+
  4. Run the following procedure to turn on v2 in the background, but keep v1 running in the foreground:

    CALL dbms.cluster.switchDiscoveryServiceVersion("V1_OVER_V2");
  5. Check the state again:

    CALL dbms.cluster.showParallelDiscoveryState();

    Now the returned mode for this server must be V1_OVER_V2 and the stateComparison must show that the states are not matching yet.

    Expected result
    +-------------------------------------------------------------------------------------------------------+
    | mode         | stateComparison                                        | v1ServerCount | v2ServerCount |
    +-------------------------------------------------------------------------------------------------------+
    | "V1_OVER_V2" | "States are not matching after PT1M9.518S: (score:18)" | "3"           | "1"           |
    +-------------------------------------------------------------------------------------------------------+

    The score is a measure of how different the states are; it is 0 when the states match. The v1ServerCount and v2ServerCount columns display how many servers can be found by v1 and v2 of the discovery service, respectively. When some members are running just one of the discovery services (v1 or v2) and other members run both, the score stays permanently high. This is no reason for worry.

  6. To achieve convergence, in different terminals, connect to server02 and server03 via bolt:// and repeat steps 3 and 4 on both of them.

  7. Check the state on all servers again. It should show that the states are Matching, and both server counts should be 3.

    Expected result
    +----------------------------------------------------------------+
    | mode         | stateComparison | v1ServerCount | v2ServerCount |
    +----------------------------------------------------------------+
    | "V1_OVER_V2" | "Matching"      | "3"           | "3"           |
    +----------------------------------------------------------------+
  8. On all three servers, run:

    CALL dbms.cluster.switchDiscoveryServiceVersion("V2_OVER_V1");

    At this point, v2 is the service that is running the cluster, with v1 running in the background.

    Expected result
    +----------------------------------------------------------------+
    | mode         | stateComparison | v1ServerCount | v2ServerCount |
    +----------------------------------------------------------------+
    | "V2_OVER_V1" | "Matching"      | "3"           | "3"           |
    +----------------------------------------------------------------+
  9. Finally, turn off v1 by running the following procedure on all three servers:

    CALL dbms.cluster.switchDiscoveryServiceVersion("V2_ONLY");
  10. Verify that CALL dbms.cluster.showParallelDiscoveryState(); now shows V2_ONLY running. Note that stateComparison is N/A because there is no longer a v1 state to compare against.

    Expected result
    +-------------------------------------------------------------+
    | mode      | stateComparison | v1ServerCount | v2ServerCount |
    +-------------------------------------------------------------+
    | "V2_ONLY" | "N/A"           | "N/A"         | "3"           |
    +-------------------------------------------------------------+
    Important

    Remember to update the neo4j.conf files for all the servers. Switching using procedures does not persist anything to disk, so when a server restarts, it comes back up with only v1 running. Therefore, ensure that dbms.cluster.discovery.version=V2_ONLY is set, and that dbms.cluster.discovery.v2.endpoints or dbms.kubernetes.discovery.v2.service_port_name is set as required, so that the servers start with v2 running on the next restart.
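    As an example, with the list resolver, the persisted configuration on each server should then include:

    dbms.cluster.discovery.version=V2_ONLY
    dbms.cluster.discovery.resolver_type=LIST
    dbms.cluster.discovery.v2.endpoints=server01.example.com:6000,server02.example.com:6000,server03.example.com:6000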

Monitoring the progress and metrics

When moving from the current discovery service v1 to the new version v2, you can monitor the progress using the procedure CALL dbms.cluster.showParallelDiscoveryState(). This procedure shows the current state of the discovery service on the server and a difference score between the states of the v1 and v2 discovery services, where the score measures how different the states are. The reported score does not always stay at 0. Here are some scenarios to consider:

  • In the case of a cluster, when some members are running just one of the discovery services (v1 or v2) and other members run both, the score will stay permanently high. This is no reason for worry.

  • When changes happen in the cluster (such as the start or stop of a database or server, or a leader switch), the difference score is temporarily greater than 0. It should reach 0 again relatively quickly.

  • If the difference score is greater than 0 for a longer period, the actual difference is printed in the debug.log.
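In that case, a quick way to inspect what was reported is to search the log for discovery-related messages. This is a sketch that assumes a default logs directory; adjust the path to your installation:

grep -i "discovery" logs/debug.log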

You can also use the following metrics to monitor the discovery service: