Introduction
The Neo4j Graph Data Science (GDS) library provides efficiently implemented, parallel versions of common graph algorithms, exposed as Cypher procedures. Additionally, GDS includes machine learning pipelines to train predictive supervised models to solve graph problems, such as predicting missing relationships.
API tiers
The GDS API comprises Cypher procedures and functions. Each of these exists in one of three tiers of maturity:
- Production-quality: indicates that the feature has been tested with regard to stability and scalability.
- Beta: indicates that the feature is a candidate for the production-quality tier.
- Alpha: indicates that the feature is experimental and might be changed or removed at any time.
The Operations Reference lists all operations in GDS according to their tier.
Algorithms
Graph algorithms are used to compute metrics for graphs, nodes, or relationships.
They can provide insights on relevant entities in the graph (centralities, ranking), or inherent structures like communities (community-detection, graph-partitioning, clustering).
Many graph algorithms are iterative and traverse the graph repeatedly during the computation, using random walks, breadth-first or depth-first searches, or pattern matching.
Due to the exponential growth of possible paths with increasing distance, many of the approaches also have high algorithmic complexity.
Fortunately, optimized algorithms exist that utilize certain structures of the graph, memoize already explored parts, and parallelize operations. Whenever possible, we’ve applied these optimizations.
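To make the point about path growth and memoization concrete, here is a minimal sketch in plain Python. It is not GDS code: the toy graph and the function names `count_walks` and `bfs_distances` are made up for illustration. The first function enumerates every walk of a given length, so its work grows quickly with distance. The second is a breadth-first search that remembers visited nodes and expands each node only once.

```python
from collections import deque

# Hypothetical toy graph as an adjacency list (not a GDS data structure).
GRAPH = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "d"],
    "d": ["b", "c"],
}


def count_walks(graph, start, length):
    """Count all walks of exactly `length` hops from `start`, with no memoization."""
    if length == 0:
        return 1
    return sum(count_walks(graph, nxt, length - 1) for nxt in graph[start])


def bfs_distances(graph, start):
    """Breadth-first search that records visited nodes, so each node is expanded once."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:  # skip parts of the graph already explored
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist
```

On this four-node graph the number of walks from `a` already more than doubles with each extra hop, while the breadth-first search touches each node exactly once regardless of how many routes lead to it.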
The Neo4j Graph Data Science library contains a large number of algorithms, which are detailed in the Algorithms chapter.
Algorithm traits
Algorithms in GDS have specific ways to make use of various aspects of their input graph(s). We call these algorithm traits.
An algorithm trait can be:
- supported: the algorithm leverages the trait and produces a well-defined result;
- allowed: the algorithm does not leverage the trait but still produces results;
- unsupported: the algorithm does not leverage the trait and, given a graph with the trait, will return an error.
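The three levels above can be read as a small decision rule. The sketch below models it in plain Python under stated assumptions: `TraitSupport`, `check_trait`, the algorithm name `toyAlgorithm`, and the trait table are all hypothetical and do not describe any real GDS algorithm or API.

```python
from enum import Enum


class TraitSupport(Enum):
    SUPPORTED = "supported"      # trait is leveraged, result is well-defined
    ALLOWED = "allowed"          # trait is ignored, results are still produced
    UNSUPPORTED = "unsupported"  # a graph with the trait causes an error


# Hypothetical trait table for a made-up algorithm.
TRAITS = {
    "toyAlgorithm": {
        "directed": TraitSupport.SUPPORTED,
        "weighted": TraitSupport.ALLOWED,
        "heterogeneous": TraitSupport.UNSUPPORTED,
    }
}


def check_trait(algorithm, trait, table=TRAITS):
    """Return True if the trait is leveraged, False if it is ignored.

    Raises ValueError when the algorithm cannot run on a graph with the trait.
    """
    level = table[algorithm][trait]
    if level is TraitSupport.UNSUPPORTED:
        raise ValueError(f"{algorithm} cannot run on a graph with trait '{trait}'")
    return level is TraitSupport.SUPPORTED
```

The practical difference between "allowed" and "unsupported" is visible in the return path: an allowed trait is silently ignored, whereas an unsupported one stops the run.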
The following algorithm traits exist:
- Directed: the algorithm is well-defined on a directed graph.
- Undirected: the algorithm is well-defined on an undirected graph.
- Heterogeneous: the algorithm is able to distinguish between nodes and/or relationships of different types.
- Heterogeneous nodes: the algorithm is able to distinguish between nodes of different types.
- Heterogeneous relationships: the algorithm is able to distinguish between relationships of different types.
- Weighted relationships: the algorithm can be configured to use a relationship property as weights, specified via the relationshipWeightProperty configuration parameter. These values can represent cost, time, capacity, or some other domain-specific property. By default, the algorithm considers each relationship equally important.
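As an illustration of the weighted-relationships trait, the following plain-Python sketch computes a simple out-degree score with and without weights. It is not how GDS implements any algorithm. The function `weighted_degree`, its `weight_property` argument (a stand-in for the relationshipWeightProperty parameter), and the sample data are assumptions made for this example.

```python
def weighted_degree(relationships, weight_property=None):
    """Sum outgoing relationship weights per source node.

    relationships: iterable of (source, target, properties) tuples.
    weight_property: name of the property to use as weight; when None,
    every relationship counts as 1.0, i.e. all are equally important.
    """
    scores = {}
    for source, _target, props in relationships:
        weight = 1.0 if weight_property is None else float(props[weight_property])
        scores[source] = scores.get(source, 0.0) + weight
    return scores


# Hypothetical sample data: "cost" is an illustrative property name.
RELS = [
    ("a", "b", {"cost": 2.0}),
    ("a", "c", {"cost": 0.5}),
    ("b", "c", {"cost": 4.0}),
]

unweighted = weighted_degree(RELS)          # each relationship counts once
weighted = weighted_degree(RELS, "cost")    # relationships weighted by "cost"
```

Without a weight property, `a` outranks `b` because it has more relationships. With `cost` as the weight, the ranking flips, which is the kind of change a weight configuration is meant to produce.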
Graph Catalog
To run the algorithms as efficiently as possible, GDS uses a specialized graph format to represent the graph data. It is therefore necessary to load the graph data from the Neo4j database into an in-memory graph catalog. The amount of data loaded can be controlled by so-called graph projections, which also allow, for example, filtering on node labels and relationship types, among other options.
For more information, see Graph Management.
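Conceptually, a projection that filters on node labels and relationship types can be pictured as in the sketch below. This is a plain-Python model of the idea only: the function `project`, its parameters, and the sample labels and types are hypothetical. Actual projections are created through the GDS procedures described in Graph Management.

```python
def project(nodes, relationships, node_labels=None, relationship_types=None):
    """Return the subset of a graph selected by label and type filters.

    nodes: dict mapping node id -> set of labels.
    relationships: list of (source, type, target) tuples.
    A filter of None keeps everything.
    """
    kept_nodes = {
        node_id
        for node_id, labels in nodes.items()
        if node_labels is None or labels & set(node_labels)
    }
    kept_rels = [
        (source, rel_type, target)
        for source, rel_type, target in relationships
        if (relationship_types is None or rel_type in relationship_types)
        and source in kept_nodes
        and target in kept_nodes  # drop relationships with a filtered-out endpoint
    ]
    return kept_nodes, kept_rels


# Hypothetical sample data.
NODES = {1: {"Person"}, 2: {"Person"}, 3: {"City"}}
RELS = [(1, "KNOWS", 2), (1, "LIVES_IN", 3), (2, "LIVES_IN", 3)]

people_nodes, people_rels = project(NODES, RELS, ["Person"], ["KNOWS"])
```

Here only the two `Person` nodes and the single `KNOWS` relationship between them end up in the projected graph, so an algorithm run on it never sees the `City` node.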
Editions
The Neo4j Graph Data Science library is available in two editions. By default, GDS will operate as the Community Edition. To unlock Enterprise Edition features, a valid Neo4j Graph Data Science Enterprise license file is required. See GDS Enterprise Edition for how to configure the license.
- The open source Community Edition:
  - Includes all algorithms.
  - Limits the catalog operations available for managing graphs and models. Unavailable operations are listed under the Enterprise Edition below.
  - Limits the concurrency to a maximum of 4 CPU cores.
  - Limits the capacity of the model catalog to 3 models.
- The Neo4j Graph Data Science library Enterprise Edition:
  - Supports running on any number of CPU cores.
  - Supports running GDS write workloads as part of a Neo4j cluster deployment.
  - Supports capacity and load monitoring.
  - Supports extended graph catalog features, including:
    - Graph backup and restore.
    - Data import and export via Apache Arrow.
  - Supports extended model catalog features, including:
    - Storing any number of models in the model catalog.
    - Sharing of models between users, through publishing.
    - Model persistence to disk.
  - Supports an optimized graph implementation, enabled by default.
  - Supports the configuration of defaults and limits.