Multi-data center routing
This section describes the following:
- Introduction
- Prerequisite configuration
- The load balancing framework
- Policy definitions
- Load balancing examples
- Catch-up strategy plugins
- Favoring data centers
Introduction
When deploying a multi-data center cluster it is often desirable to take advantage of locality to reduce latency and improve performance. For example, it is preferable that graph-intensive workloads are executed in the local data center at LAN latencies rather than in a faraway data center at WAN latencies. Neo4j’s load balancing and catch-up strategy plugins for multi-data center scenarios facilitate precisely this.
Neo4j’s load balancing is a cooperative system in which the driver asks the cluster on a recurring basis where it should direct the different classes of its workload (e.g., writes and reads). This allows the driver to work independently for long stretches of time, yet check back from time to time to adapt to changes such as a new server having been added for increased capacity. There are also failure situations in which the driver asks again immediately, for example when it cannot use any of its allocated servers.
This is mostly transparent from the perspective of a client. On the server side, the load balancing behaviors are configured using a simple Domain Specific Language (DSL) and exposed under a named load balancing policy which the driver can bind to. All server-side configuration is performed on the Primary servers.
Catch-up strategy plugins are sets of rules that define how secondary servers contact upstream servers in the cluster in order to synchronize transaction logs. Neo4j comes with a set of pre-defined strategies, and user-defined strategies can also be created using the same DSL. Finally, Neo4j supports an API which advanced users may use to enhance upstream recommendations.
Once a catch-up strategy plugin resolves a satisfactory upstream server, it is used for pulling transactions to update the local secondary for a single synchronization. For subsequent updates, the procedure is repeated so that the most preferred available upstream server is always resolved.
Prerequisite configuration
Server tags
Both load balancing across multiple data centers and user-defined catch-up strategies are predicated on the Server Tag concept.
In order to optimize the use of the cluster’s servers according to specific requirements, they are grouped using server tags.
Server tags can map to data centers, availability zones, or any other significant topological elements from the operator’s domain, e.g., us, us-east.
Applying the same tag to multiple servers logically groups them together.
Note that servers can have multiple tags.
Server tags are defined as a key that maps onto a set of servers in a cluster.
Server tags are defined on each server using the initial.server.tags parameter in neo4j.conf.
Each server in a cluster can be tagged with zero or more server tags.
Server tags can be altered at runtime via the ALTER SERVER command; see Altering server options for more details.
Grouping servers using server tags is achieved in neo4j.conf as in the following examples:
A server tagged with us and us-east:
initial.server.tags=us,us-east
A server tagged with london:
initial.server.tags=london
A server tagged with eu:
initial.server.tags=eu
Note that membership of a group implied by a server tag is explicit.
For example, a server tagged with gb-london is not automatically part of the same tag group as a server tagged with gb or eu, unless that server is also explicitly tagged with those tags.
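For example, a server that should be considered part of the gb-london, gb, and eu groupings mentioned above must list all three tags explicitly:
initial.server.tags=gb-london,gb,eu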
Primaries for reading
Depending on the deployment and the available number of servers in the cluster, it may or may not make sense to route read workload to the primary servers.
The following configuration controls whether read workload is allowed to be routed to primary servers. Valid values are true and false.
dbms.routing.reads_on_primaries_enabled=true
The load balancing framework
There are different topology-aware load balancing options available for client applications in a multi-data center Neo4j deployment, and different ways to configure the cluster’s load balancing so that client applications can direct their workload at the most appropriate cluster members, such as those nearby.
The load balancing system is based on a plugin architecture for future extensibility and for allowing user customizations. The current version ships with exactly one such canned plugin called the server policies plugin.
The server policies plugin is selected by setting the following property:
dbms.routing.load_balancing.plugin=server_policies
Under the server policies plugin, a number of load balancing policies can be configured server-side and be exposed to drivers under unique names. The drivers, in turn, must on instantiation select an appropriate policy by specifying its name. Common patterns for naming policies are after geographical regions or intended application groups.
It is crucial to define exactly the same policies on all servers, since this is to be regarded as cluster-wide configuration and failure to do so leads to unpredictable behavior. Similarly, policies in active use should not be removed or renamed, since doing so breaks applications trying to use these policies. It is, however, perfectly acceptable and expected that policies are modified under the same name.
If a driver asks for a policy name that is not available, then the driver is not able to use the cluster. A driver that does not specify any name at all gets the behavior of the default policy as configured. The default policy, if left unchanged, distributes the load across all servers. It is possible to change the default policy to any behavior that a named policy can have.
A misconfigured driver or load balancing policy results in suboptimal routing choices and can even prevent successful interactions with the cluster entirely.
The details of how to write a custom plugin are not documented here. Please contact Neo4j Professional Services if you think that you need a custom plugin.
Use load balancing from Neo4j drivers
Once enabled and configured, the custom load balancing feature is used by drivers to route traffic as intended. See the Neo4j Drivers manuals for instructions on how to configure drivers to use custom load balancing.
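As an illustration only, and as an assumption to verify against the manual for your specific driver and version, the policy name is typically supplied as part of the routing context in the connection URI:
neo4j://graph.example.com:7687?policy=mypolicy
Here graph.example.com is a placeholder host and mypolicy refers to a policy defined as described below.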
Policy definitions
The configuration of load balancing policies is transparent to client applications and expressed via a simple DSL. The syntax consists of a set of rules which are considered in order. The first rule to produce a non-empty result is the final result.
rule1; rule2; rule3
Each rule in turn consists of a set of filters which limit the considered servers, starting with the complete set. Note that the evaluation of each rule starts fresh with the complete set of available servers.
There is a fixed set of filters which composes a rule and they are chained together using arrows.
filter1 -> filter2 -> filter3
If there are any servers still left after the last filter, then the rule evaluation has produced a result and it is returned to the driver. However, if there are no servers left, then the next rule is considered. If no rule is able to produce a usable result, then a failure is signalled to the driver.
Policy names
The policies are configured under the namespace of the server_policies plugin and named as desired.
Policy names can contain alphanumeric characters and underscores, and they are case sensitive.
Below is the property key for a policy with the name mypolicy.
dbms.routing.load_balancing.config.server_policies.mypolicy=
The actual policy is defined in the value part using the DSL.
The policy name default is reserved for the default policy.
It is possible to configure this policy like any other and it is used by driver clients that do not specify a policy.
Additionally, any number of policies can be created using unique policy names. The policy name can suggest a particular region or an application for which it is intended to be used.
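For example, the default policy and a region-specific policy might be configured side by side as follows (a sketch reusing the us and eu tags from the earlier tagging examples; eu_app is a hypothetical policy name, and the filters used here are described in the next section):
# eu_app is an illustrative policy name
dbms.routing.load_balancing.config.server_policies.default=tags(us)->min(2); all();
dbms.routing.load_balancing.config.server_policies.eu_app=tags(eu)->min(2); all();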
Filters
There are four filters available for specifying rules, detailed below. The syntax is similar to a method call with parameters.
tags(name1, name2, …)
- Only servers that are tagged with any of the specified tags pass the filter.
- The defined names must match those of the server tags.
- Prior to 5.4, tags() was referred to as groups(), which continues to work but is now deprecated.
min(count)
- Passes the servers under consideration only if at least count of them are available, otherwise passes none.
- Allows overload conditions to be managed.
all()
- No need to specify since it is implicit at the beginning of each rule.
- Implicitly the last rule (override this behavior using halt()).
halt()
- Only makes sense as the last filter in the last rule.
- Stops the processing of any more rules.
The tags filter is essentially an OR-filter: e.g. tags(A,B) passes any server with either tag A, tag B, or both (the union of the servers carrying those tags).
An AND-filter can also be created by chaining two filters, as in tags(A) -> tags(B), which only passes servers with both tags (the intersection of the servers carrying those tags).
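As an illustrative sketch (the policy names north_any and north_both are hypothetical, while the north2 and north tags are reused from the examples later in this section), the two forms could appear in complete policies as follows:
# north_any and north_both are illustrative policy names
dbms.routing.load_balancing.config.server_policies.north_any=tags(north2,north)->min(2); all();
dbms.routing.load_balancing.config.server_policies.north_both=tags(north2)->tags(north)->min(2); all();
The first policy prefers servers carrying either tag; the second prefers only servers carrying both.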
Load balancing examples
The discussion on multi-data center clusters introduced a four-region, multi-data center setup. The cardinal compass points for regions and numbered data centers within those regions were used there, and the same hypothetical setup is used here as well.
The behavior of the load balancer is configured in the property dbms.routing.load_balancing.config.server_policies.<policy-name>.
The specified rules allow for fine-tuning how the cluster routes requests under load.
The examples make use of the line continuation character \ for better readability.
It is valid syntax in neo4j.conf as well, and it is recommended to break up complicated rule definitions using line continuations and a new rule on every line.
The most restrictive strategy is to insist on a particular data center to the exclusion of all others:
dbms.routing.load_balancing.config.server_policies.north1_only=\
tags(north1)->min(2); halt();
This case states that the intention is to send queries to servers tagged with north1, which maps onto a specific physical data center, provided there are at least two of them available.
If at least two servers tagged with north1 cannot be provided, then the operation should halt(), i.e. not try any other data center.
While the previous example demonstrates the basic form of load balancing rules, it is possible to be a little more expansive:
dbms.routing.load_balancing.config.server_policies.north1=\
tags(north1)->min(2);
In this case, if at least two servers are tagged with north1, then the load is balanced across them.
Otherwise, any server in the whole cluster is used, falling back to the implicit, final all() rule.
The previous example considered only a single data center before resorting to the whole cluster. If there is a hierarchy or region concept exposed through the server tags, the fallback can be more graceful:
dbms.routing.load_balancing.config.server_policies.north_app1=\
tags(north1,north2)->min(2);\
tags(north);\
all();
This example says that the cluster should load balance across servers with the north1 and north2 tags, provided there are at least two machines available across them.
Failing that, any server in the north region can be used, and if the whole of the north is offline, any server in the cluster can be used.
Catch-up strategy plugins
Catch-up strategy plugins are sets of rules that define how secondaries contact upstream servers in the cluster in order to synchronize transaction logs. Neo4j comes with a set of pre-defined strategies, and also leverages the DSL to flexibly create user-defined strategies. Finally, Neo4j supports an API which advanced users may use to enhance upstream server recommendations.
Once a catch-up strategy plugin resolves a satisfactory upstream server, it is used for pulling transactions to update the local secondary for a single synchronization. For subsequent updates, the procedure is repeated so that the most preferred available upstream server is always resolved.
Configuring upstream selection strategy using pre-defined catch-up strategies
Neo4j ships with the following pre-defined catch-up strategy plugins. These provide coarse-grained algorithms for selecting an upstream server:
Plugin name | Resulting behavior |
---|---|
connect-to-random-primary-server | Connect to any primary server selecting at random from those currently available. |
typically-connect-to-random-secondary | Connect to any available secondary server, but around 10% of the time connect to any random primary server. |
connect-randomly-to-server-tags | Connect at random to any available secondary server tagged with any of the server tags specified in the comma-separated list server.cluster.catchup.connect_randomly_to_server_tags. |
leader-only | Connect only to the current Raft leader of the primary servers. |
connect-randomly-to-server-group | Connect at random to any available secondary server in the server groups specified in a comma-separated list. |
connect-randomly-within-server-group | Connect at random to any available secondary server in any of the server groups to which this server belongs. Deprecated, please use connect-randomly-to-server-tags instead. |
Pre-defined strategies are used by configuring the server.cluster.catchup.upstream_strategy option.
Doing so allows for specification of an ordered preference of strategies to resolve an upstream provider of transaction data.
A comma-separated list of strategy plugin names is provided, with preferred strategies earlier in the list.
The catch-up strategy is selected by asking each of the strategies in list order whether it can provide an upstream server from which transactions can be pulled.
Consider the following configuration example:
server.cluster.catchup.upstream_strategy=connect-randomly-to-server-tags,typically-connect-to-random-secondary
With this configuration, the secondary server first tries to connect to any other server with the tag(s) specified in server.cluster.catchup.connect_randomly_to_server_tags.
Should it fail to find any live servers with those tags, it connects to a random secondary server.
To ensure that downstream servers can still access live data in the event of upstream failures, the last resort of any server is always to contact a random primary server.
This is equivalent to ending the server.cluster.catchup.upstream_strategy configuration with connect-to-random-primary-server.
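For example, this final fallback can be made explicit by appending it to the list (a sketch using only the strategy names already mentioned above):
server.cluster.catchup.upstream_strategy=connect-randomly-to-server-tags,typically-connect-to-random-secondary,connect-to-random-primary-server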
Configuring user-defined catch-up strategies
Neo4j clusters support a small DSL for the configuration of client-cluster load balancing. This is described in detail in Domain Specific Language and Filters. The same DSL is used to describe preferences for how a server binds to another server to request transaction updates.
The DSL is made available by selecting the user-defined catch-up strategy as follows:
server.cluster.catchup.upstream_strategy=user-defined
Once the user-defined strategy has been specified, we can add configuration to the server.cluster.catchup.user_defined_upstream_strategy setting based on the server tags that have been set for the cluster.
This functionality is described with two examples:
For illustrative purposes four regions are proposed: north, south, east, and west, and within each region there are a number of data centers such as north1 or west2.
The server tags are configured so that each data center maps to its own server tag.
Additionally, it is assumed that each data center fails independently from the others and that a region can act as a supergroup of its constituent data centers.
So a server in the north region might have a configuration like initial.server.tags=north2,north, which puts it in two tag groups that map onto the physical topology.
Once the servers are tagged, the next task is to define some upstream selection rules based on them.
For design purposes, assume that any server in one of the north region data centers prefers to catch up within its own data center if it can, but resorts to any northern server otherwise.
To configure that behavior, add:
server.cluster.catchup.user_defined_upstream_strategy=tags(north2); tags(north); halt()
The configuration is in precedence order from left to right.
The tags() operator yields the set of servers carrying the given tag from which to catch up.
In this case, only if there are no servers tagged with north2 does the operation proceed to the tags(north) rule, which yields any server tagged with north.
Finally, if no servers can be resolved with any of the previous tags, then the rule chain is stopped via halt().
Note that the use of halt() ends the rule chain explicitly.
If a halt() is not used at the end of the rule chain, then the all() rule is implicitly added.
all() is expansive: it offers up all servers and so increases the likelihood of finding an available upstream server.
However, all() is indiscriminate and the servers it offers are not guaranteed to be topologically or geographically local, potentially increasing the latency of synchronization.
The example above shows a simple hierarchy of preferences expressed through the use of server tags. But the hierarchy can be more sophisticated. For example, conditions can be placed on the tagged catch-up servers.
In this example it is desired to roughly qualify cluster health before selecting from where to catch up.
For this, the min() filter is used as follows:
server.cluster.catchup.user_defined_upstream_strategy=tags(north2)->min(3); tags(north)->min(3); all();
tags(north2)->min(3) states that catch-up from servers tagged with north2 should be performed only if there are at least three of them available, which here is interpreted as an indicator of good health.
If north2 cannot meet that requirement, then catch-up should be attempted from any server tagged with north, provided there are at least three of them available, as per tags(north)->min(3).
Finally, if catch-up cannot be performed from a sufficiently healthy north region, then the operation (explicitly) falls back to the whole cluster with all().
The min() filter is a simple but reasonable health indicator for a set of servers with the same tag.
Favoring data centers
In a multi-data center scenario, although it is rarely necessary, it is possible to bias where writes for a specified database are directed.
The db.cluster.raft.leader_transfer.priority_tag setting can be applied to specify a set of servers with a given tag which should have priority when the leader for a given database is selected.
The priority tag can be set on one or multiple databases and it means that the cluster attempts to keep the leadership for the configured database on a server tagged with the configured server tag.
A database for which db.cluster.raft.leader_transfer.priority_tag has been configured is excluded from the automatic balancing of leaderships across the cluster.
It is therefore recommended not to use this configuration unless it is necessary.
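As a minimal sketch, assuming the north1 tag from the earlier examples, the following biases leadership for the databases it applies to towards servers tagged with north1 (how the setting is scoped to individual databases depends on how the deployment applies db.* settings):
# Sketch: prefer leaders on servers tagged north1 for the databases this setting applies to
db.cluster.raft.leader_transfer.priority_tag=north1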