A method to replicate a Causal Cluster to new hardware with minimum downtime
If you need to replicate your existing Causal Cluster to a new hardware setup, the following procedure allows you to do so with minimal downtime.
Let us start with an existing 3-instance cluster with the following characteristics:
neo4j> call dbms.cluster.overview();
+---------------------------------------------------------------------------------------------------------------------------------------------+
| id | addresses | role | groups |
+---------------------------------------------------------------------------------------------------------------------------------------------+
| "ffc16977-4ab8-41b5-a4e2-e0e32e8abd6f" | ["bolt://10.1.1.1:7617", "http://10.1.1.1:7474", "https://10.1.1.1:7473"] | "LEADER" | [] |
| "f0a78cd1-7ba3-45f6-aba3-0abb60d785ef" | ["bolt://10.1.1.2:7627", "http://10.1.1.2:7474", "https://10.1.1.2:7473"] | "FOLLOWER" | [] |
| "2fe26571-6fcc-4d1e-9f42-b81d08579057" | ["bolt://10.1.1.3:7637", "http://10.1.1.3:7474", "https://10.1.1.3:7473"] | "FOLLOWER" | [] |
+---------------------------------------------------------------------------------------------------------------------------------------------+
Each instance has causal_clustering.expected_core_cluster_size (https://neo4j.com/docs/operations-manual/current/reference/configuration-settings/#config_causal_clustering.expected_core_cluster_size) and causal_clustering.initial_discovery_members defined in its conf/neo4j.conf as:
causal_clustering.expected_core_cluster_size=3
causal_clustering.initial_discovery_members=10.1.1.1:5001,10.1.1.2:5001,10.1.1.3:5001
All other ports referenced are using the default values.
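For reference, a minimal conf/neo4j.conf sketch for the existing instance at 10.1.1.1 could look like the following. This is an illustrative sketch only: the dbms.mode, advertised-address and discovery-listen settings are assumptions consistent with the addresses shown above, so adjust them to match your actual deployment.
# conf/neo4j.conf on 10.1.1.1 (illustrative sketch, not a full configuration)
dbms.mode=CORE
dbms.connectors.default_advertised_address=10.1.1.1
causal_clustering.discovery_listen_address=:5001
causal_clustering.expected_core_cluster_size=3
causal_clustering.initial_discovery_members=10.1.1.1:5001,10.1.1.2:5001,10.1.1.3:5001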
To add 3 new instances, for example at IP addresses 10.2.2.1, 10.2.2.2 and 10.2.2.3, perform the following steps:
- Install and create the 3 new instances at IP addresses 10.2.2.1, 10.2.2.2 and 10.2.2.3.
- In the conf/neo4j.conf of each of these 3 new instances, define causal_clustering.initial_discovery_members to point at the existing cluster members (a sketch of the full configuration for a new instance is shown after this list):
causal_clustering.initial_discovery_members=10.1.1.1:5001,10.1.1.2:5001,10.1.1.3:5001
- Start up each instance at 10.2.2.1, 10.2.2.2, and 10.2.2.3. These 3 new instances will join the initial cluster at 10.1.1.1, 10.1.1.2 and 10.1.1.3 and copy down the databases/graph.db store. Running dbms.cluster.overview(); will then return output similar to:
+---------------------------------------------------------------------------------------------------------------------------------------------+
| id | addresses | role | groups |
+---------------------------------------------------------------------------------------------------------------------------------------------+
| "ffc16977-4ab8-41b5-a4e2-e0e32e8abd6f" | ["bolt://10.1.1.1:7687", "http://10.1.1.1:7474", "https://10.1.1.1:7473"] | "LEADER" | [] |
| "f0a78cd1-7ba3-45f6-aba3-0abb60d785ef" | ["bolt://10.1.1.2:7687", "http://10.1.1.2:7474", "https://10.1.1.2:7473"] | "FOLLOWER" | [] |
| "2fe26571-6fcc-4d1e-9f42-b81d08579057" | ["bolt://10.1.1.3:7687", "http://10.1.1.3:7474", "https://10.1.1.3:7473"] | "FOLLOWER" | [] |
| "847b74c2-34a9-4458-b0e2-ea36cf25fdbf" | ["bolt://10.2.2.1:7687", "http://10.2.2.1:7474", "https://10.2.2.1:7473"] | "FOLLOWER" | [] |
| "39f92686-f581-4454-b288-a2254d38ea5c" | ["bolt://10.2.2.2:7687", "http://10.2.2.2:7474", "https://10.2.2.2:7473"] | "FOLLOWER" | [] |
| "e4114ad2-dcd1-4d22-8f56-a085524c9ed0" | ["bolt://10.2.2.3:7687", "http://10.2.2.3:7474", "https://10.2.2.3:7473"] | "FOLLOWER" | [] |
+---------------------------------------------------------------------------------------------------------------------------------------------+
- Once the 3 new instances have completed the copy of graph.db from the leader, you can cleanly stop the 3 initial instances at 10.1.1.1, 10.1.1.2, and 10.1.1.3 via bin/neo4j stop. The 3 remaining instances will continue to run:
+---------------------------------------------------------------------------------------------------------------------------------------------+
| id | addresses | role | groups |
+---------------------------------------------------------------------------------------------------------------------------------------------+
| "847b74c2-34a9-4458-b0e2-ea36cf25fdbf" | ["bolt://10.2.2.1:7687", "http://10.2.2.1:7474", "https://10.2.2.1:7473"] | "LEADER" | [] |
| "39f92686-f581-4454-b288-a2254d38ea5c" | ["bolt://10.2.2.2:7687", "http://10.2.2.2:7474", "https://10.2.2.2:7473"] | "FOLLOWER" | [] |
| "e4114ad2-dcd1-4d22-8f56-a085524c9ed0" | ["bolt://10.2.2.3:7687", "http://10.2.2.3:7474", "https://10.2.2.3:7473"] | "FOLLOWER" | [] |
+---------------------------------------------------------------------------------------------------------------------------------------------+
- If a load balancer was in front of the initial 3-instance cluster at 10.1.1.1, 10.1.1.2, and 10.1.1.3, it should be updated to point to 10.2.2.1, 10.2.2.2, and 10.2.2.3.
- Since the initial 3 instances have been shut down, and so that the 3 new instances can successfully restart at some later time, update causal_clustering.initial_discovery_members on each of the 3 new instances, changing:
causal_clustering.initial_discovery_members=10.1.1.1:5001,10.1.1.2:5001,10.1.1.3:5001
to:
causal_clustering.initial_discovery_members=10.2.2.1:5001,10.2.2.2:5001,10.2.2.3:5001
- If you are using the Bolt driver to connect to the cluster, you will also need to update the connection string to reference one of the new addresses, for example changing bolt+routing://10.1.1.1:7687 to bolt+routing://10.2.2.1:7687 (a quick connectivity check using cypher-shell is shown after this list).
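For the configuration step above, a minimal conf/neo4j.conf sketch for one of the new instances (10.2.2.1 in this example) could look like the following. As with the earlier sketch, the dbms.mode, advertised-address and discovery-listen settings are assumptions and should be adjusted to your environment.
# conf/neo4j.conf on 10.2.2.1 (illustrative sketch, not a full configuration)
dbms.mode=CORE
dbms.connectors.default_advertised_address=10.2.2.1
causal_clustering.discovery_listen_address=:5001
causal_clustering.expected_core_cluster_size=3
# Initially points at the existing cluster; change to the 10.2.2.x addresses once the old instances are retired
causal_clustering.initial_discovery_members=10.1.1.1:5001,10.1.1.2:5001,10.1.1.3:5001
Each new instance is then started with bin/neo4j start, after which it discovers the existing members and copies down the store as described in the steps above.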
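Once the connection string has been updated, one quick way to confirm that the new addresses are reachable is to run cypher-shell against one of the new instances; the username and password below are placeholders, and your application would continue to use the bolt+routing:// URI shown above:
bin/cypher-shell -a bolt://10.2.2.1:7687 -u neo4j -p <password> "CALL dbms.cluster.overview();"
The output should list only the three 10.2.2.x members, matching the overview shown earlier.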