Essential metrics
To ensure your applications are running smoothly, you should monitor:
- The server load: the strain on the machine hosting Neo4j.
- The Neo4j load: the strain on Neo4j.
- The cluster health: to ensure the cluster is working as expected.
- The workload of a Neo4j instance.
Reading the Performance section is recommended to better understand the metrics.
Server load metrics
Monitoring the hardware resources shows the strain on the server running Neo4j.
You can use utilities, such as the collectd daemon or systemd
on Linux, to gather information about the system.
These metrics can help with capacity planning as your workload grows.
Metric name | Description |
---|---|
CPU usage | If this is reaching 100%, you may need additional CPU capacity. |
Used memory | This metric tells you if you are close to using all available memory on the server. Make sure your peaks are at 95% or below to reduce the risk of running out of memory. For more information, see Memory configuration. |
Free disk space | Observe the rate of your data growth so you can plan for additional storage before you run out. This applies to all disks that Neo4j is writing to. You might also choose to write the log files to a different disk. |
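The capacity-planning arithmetic behind the memory and disk rows can be sketched as follows. This is a minimal illustration, not part of Neo4j: the function names are hypothetical, the disk estimate assumes linear growth between two samples, and the 95% limit is the peak value suggested in the table.

```python
from typing import Optional


def days_until_disk_full(free_then: int, free_now: int, days_between: float) -> Optional[float]:
    """Estimate the days left until free space reaches zero.

    Assumes growth stays linear between the two samples (a simplification).
    Returns None when free space is not shrinking.
    """
    if days_between <= 0:
        raise ValueError("days_between must be positive")
    consumed_per_day = (free_then - free_now) / days_between
    if consumed_per_day <= 0:
        return None
    return free_now / consumed_per_day


def memory_peak_within_limit(peak_used: int, total: int, limit: float = 0.95) -> bool:
    """Return True when the observed memory peak is at or below the limit."""
    if total <= 0:
        raise ValueError("total must be positive")
    return peak_used / total <= limit
```

The free-space samples could come from any system monitoring tool, or from `shutil.disk_usage(path).free` in the Python standard library, taken for each disk that Neo4j writes to.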
Neo4j load metrics
The Neo4j load metrics monitor the strain that Neo4j is being put under. They can help with capacity planning.
Metric name | Metric | Description |
---|---|---|
Heap usage | | If Neo4j consistently uses 100% of the heap, increase the initial and max heap size. For more information, see Memory configuration. |
Page cache | | When a request misses the page cache, the data must be fetched from much slower disk storage. Ideally, the hit_ratio should be above 98% most of the time. Page cache usage shows how much of the memory allocated to the page cache is in use. If usage is at 100%, consider increasing the page cache size. |
JVM garbage collection | | The proportion of time the JVM spends reclaiming the heap instead of doing other work. This metric can spike when the database is running low on memory. If this happens, it can halt processing and cause query execution errors. Consider increasing the memory available to the database if this appears to be the case. |
Checkpoint time | | Monitor the checkpoint duration to ensure it does not start to approach the interval between checkpoints. If it does, take steps to improve checkpointing performance. |
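The two numeric checks in the table above, the page cache hit ratio and the checkpoint duration relative to the checkpoint interval, can be sketched as below. The function names, the hit and fault counter inputs, and the 0.8 warning fraction are assumptions chosen for illustration. Only the 98% target comes from the table.

```python
def page_cache_hit_ratio(hits: int, faults: int) -> float:
    """Fraction of page requests served from the page cache.

    `hits` and `faults` are assumed to be counter samples covering the same period.
    """
    total = hits + faults
    if total == 0:
        return 1.0  # no requests yet, so nothing has missed the cache
    return hits / total


def hit_ratio_is_healthy(hits: int, faults: int, target: float = 0.98) -> bool:
    """Return True when the hit ratio is above the 98% target."""
    return page_cache_hit_ratio(hits, faults) > target


def checkpoint_approaching_interval(duration_s: float, interval_s: float,
                                    warn_fraction: float = 0.8) -> bool:
    """Return True when a checkpoint takes a large share of the checkpoint interval.

    warn_fraction is an arbitrary illustrative threshold, not a Neo4j setting.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return duration_s >= warn_fraction * interval_s
```

For example, a 15-minute (900 s) checkpoint interval with checkpoints that take 800 s would trigger the warning, while 60 s checkpoints would not.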
Neo4j cluster health metrics
The cluster health metrics indicate the health of a cluster member at a glance. It is essential to know which instance is the leader, because the leader's load pattern differs from that of the followers. The followers should exhibit load patterns similar to one another.
Metric name | Metric | Description |
---|---|---|
Leader | | Track this for each database primary. It reports |
Transaction workload | | The ID of the last committed transaction. Track this for each Neo4j instance. You can plot the instances on separate charts. Each instance should show a single, ever-increasing line. If one of the lines levels off or falls behind, that instance is no longer replicating data, and action is needed to rectify the situation. |
For more information, see Monitor cluster endpoints for status information.
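The "levels off or falls behind" check on the last committed transaction ID can be sketched as a comparison against the most advanced instance. The function name, the instance names, and the `max_lag` threshold are hypothetical. A suitable threshold depends on your write rate.

```python
from typing import Dict, List


def lagging_members(last_tx_ids: Dict[str, int], max_lag: int) -> List[str]:
    """Return the instances whose last committed transaction ID trails the
    most advanced instance by more than max_lag transactions."""
    if not last_tx_ids:
        return []
    newest = max(last_tx_ids.values())
    return sorted(name for name, tx_id in last_tx_ids.items()
                  if newest - tx_id > max_lag)


# Example with made-up sample values: core-3 has stopped keeping up.
sample = {"core-1": 1000, "core-2": 998, "core-3": 400}
behind = lagging_members(sample, max_lag=100)
```

A single sample only shows the current gap. Comparing successive samples for the same instance shows whether its line has levelled off.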
Workload metrics
These metrics help monitor the workload of a Neo4j instance. The absolute values of these depend on the sort of workload you expect.
Metric name | Metric | Description |
---|---|---|
Bolt connections | | The number of connections that are currently executing Cypher and returning results. |
Total nodes/relationships | | (Not enabled by default) The total number of nodes, relationships, distinct relationship types, and distinct property names. |
Throughput | | This metric produces a histogram of 99th and 95th percentile transaction latencies. Useful for identifying spikes or increases in the data load. |
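To make the 95th and 99th percentile latencies concrete, the sketch below computes a percentile from raw latency samples using the nearest-rank method. It illustrates the statistic only. It is not how Neo4j builds its histogram, and the function name is hypothetical.

```python
from typing import Sequence


def latency_percentile(latencies_ms: Sequence[float], percent: int) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    `percent` percent of all samples are less than or equal to it."""
    if not latencies_ms:
        raise ValueError("at least one sample is required")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range 1..100")
    ordered = sorted(latencies_ms)
    # Integer ceiling of percent * n / 100, which avoids floating-point rounding.
    rank = (percent * len(ordered) + 99) // 100
    return ordered[rank - 1]
```

A rising 99th percentile while the 95th stays flat points to occasional slow transactions, whereas both rising together suggests a general increase in load.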
For the complete list of all available metrics in Neo4j, see Metrics reference.