Filtering
This feature is in the beta tier. For more information on feature tiers, see API Tiers.
Subgraph projection is featured in the end-to-end example Jupyter notebooks: |
In GDS, algorithms can be executed on a named graph that has been filtered based on its node labels and relationship types. However, that filtered graph only exists during the execution of the algorithm, and it is not possible to filter on property values. If a filtered graph needs to be used multiple times, one can use the subgraph catalog procedure to project a new graph in the graph catalog.
The filter predicates in the subgraph procedure can take labels, relationship types as well as node and relationship properties into account. The new graph can be used in the same way as any other in-memory graph in the catalog. Projecting subgraphs of subgraphs is also possible.
Syntax
gds.graph.filter()
procedure:CALL gds.graph.filter(
graphName: String,
fromGraphName: String,
nodeFilter: String,
relationshipFilter: String,
configuration: Map
) YIELD
graphName: String,
fromGraphName: String,
nodeFilter: String,
relationshipFilter: String,
nodeCount: Integer,
relationshipCount: Integer,
projectMillis: Integer
Name | Type | Description |
---|---|---|
graphName |
String |
The name of the new graph that is stored in the graph catalog. |
fromGraphName |
String |
The name of the original graph in the graph catalog. |
nodeFilter |
String |
A Cypher predicate for filtering nodes in the input graph. |
relationshipFilter |
String |
A Cypher predicate for filtering relationships in the input graph. |
configuration |
Map |
Additional parameters to configure subgraph creation. |
Name | Type | Default | Optional | Description |
---|---|---|---|---|
concurrency |
Integer |
|
yes |
The number of concurrent threads used for filtering the graph. |
jobId |
String |
|
yes |
An ID that can be provided to more easily track the projection’s progress. |
parameters |
Map |
|
yes |
A map of user-defined query parameters that are passed into the node and relationship filters. |
Name | Type | Description |
---|---|---|
graphName |
String |
The name of the new graph that is stored in the graph catalog. |
fromGraphName |
String |
The name of the original graph in the graph catalog. |
nodeFilter |
String |
Filter predicate for nodes. |
relationshipFilter |
String |
Filter predicate for relationships. |
nodeCount |
Integer |
Number of nodes in the subgraph. |
relationshipCount |
Integer |
Number of relationships in the subgraph. |
projectMillis |
Integer |
Milliseconds for projecting the subgraph. |
The nodeFilter
and relationshipFilter
configuration keys can be used to express filter predicates.
Filter predicates are Cypher predicates bound to a single entity.
An entity is either a node or a relationship.
The filter predicate always needs to evaluate to true
or false
.
A node is contained in the subgraph if the node filter evaluates to true
.
A relationship is contained in the subgraph if the relationship filter evaluates to true
and its source and target nodes are contained in the subgraph.
A predicate is a combination of expressions. The simplest form of expression is a literal. GDS currently supports the following literals:
-
float literals, e.g.,
13.37
-
integer literals, e.g.,
42
-
boolean literals, i.e.,
TRUE
andFALSE
Property, label and relationship type expressions are bound to an entity.
The node entity is always identified by the variable n
, the relationship entity is identified by r
.
Using the variable, we can refer to:
-
node label expression, e.g.,
n:Person
-
relationship type expression, e.g.,
r:KNOWS
-
node property expression, e.g.,
n.age
-
relationship property expression, e.g.,
r.since
Boolean predicates combine two expressions and return either true
or false
.
GDS supports the following boolean predicates:
-
greater/lower than, such as
n.age > 42
orr.since < 1984
-
greater/lower than or equal, such as
n.age >= 42
orr.since ⇐ 1984
-
equality, such as
n.age = 23
orr.since = 2020
-
logical operators, such as
-
n.age > 23 AND n.age < 42
-
n.age = 23 OR n.age = 42
-
n.age = 23 XOR n.age = 42
-
n.age IS NOT 23
Variable names that can be used within predicates are not arbitrary.
A node predicate must refer to variable n
.
A relationship predicate must refer to variable r
.
An exception is the degree function, which simply returns the node’s degree.
The function takes relationship types as arguments.
Multiple types are considered as disjunctive.
For example, degree() > 42
filters nodes with a degree greater than 42 across all relationship types.
The expression degree('Foo', 'Bar') > 42
filters nodes where the sum of Foo
and Bar
relationships is greater than 42.
Examples
All the examples below should be run in an empty database. The examples use Cypher projections as the norm. Native projections will be deprecated in a future release. |
In order to demonstrate the GDS project subgraph capabilities we are going to create a small social graph in Neo4j.
CREATE
(p0:Person { age: 16 }),
(p1:Person { age: 18 }),
(p2:Person { age: 20 }),
(b0:Book { isbn: 1234 }),
(b1:Book { isbn: 4242 }),
(p0)-[:KNOWS { since: 2010 }]->(p1),
(p0)-[:KNOWS { since: 2018 }]->(p2),
(p0)-[:READS]->(b0),
(p1)-[:READS]->(b0),
(p2)-[:READS]->(b1)
MATCH (n:Person)-[r:KNOWS|READS]->(m:Person|Book)
RETURN gds.graph.project('social-graph', n, m,
{
sourceNodeLabels: labels(n),
targetNodeLabels: labels(m),
sourceNodeProperties: n { .age },
targetNodeProperties: CASE WHEN m:Person THEN m { .age } ELSE {} END,
relationshipType: type(r),
relationshipProperties: CASE WHEN r:KNOWS THEN r { .since } ELSE {} END
}
)
Node filtering
CALL gds.graph.filter(
'teenagers',
'social-graph',
'n.age > 13 AND n.age <= 18',
'*'
)
YIELD graphName, fromGraphName, nodeCount, relationshipCount
graphName | fromGraphName | nodeCount | relationshipCount |
---|---|---|---|
"teenagers" |
"social-graph" |
2 |
1 |
Node degree Filtering
CALL gds.graph.filter(
'degree-graph',
'social-graph',
'degree() > 2',
'*'
)
YIELD graphName, fromGraphName, nodeCount, relationshipCount
graphName | fromGraphName | nodeCount | relationshipCount |
---|---|---|---|
"degree-graph" |
"social-graph" |
1 |
0 |
Node and relationship filtering
CALL gds.graph.filter(
'teenagers',
'social-graph',
'n.age > 13 AND n.age <= 18',
'r.since >= 2012.0'
)
YIELD graphName, fromGraphName, nodeCount, relationshipCount
graphName | fromGraphName | nodeCount | relationshipCount |
---|---|---|---|
"teenagers" |
"social-graph" |
2 |
0 |
Bipartite subgraph
READS
relationship type:CALL gds.graph.filter(
'teenagers-books',
'social-graph',
'n:Book OR n:Person',
'r:READS'
)
YIELD graphName, fromGraphName, nodeCount, relationshipCount
graphName | fromGraphName | nodeCount | relationshipCount |
---|---|---|---|
"teenagers-books" |
"social-graph" |
5 |
3 |
Bipartite graph node filtering
CALL gds.graph.filter(
'teenagers-books',
'social-graph',
'n:Book OR (n:Person AND n.age > 18)',
'r:READS'
)
YIELD graphName, fromGraphName, nodeCount, relationshipCount
graphName | fromGraphName | nodeCount | relationshipCount |
---|---|---|---|
"teenagers-books" |
"social-graph" |
3 |
1 |
Using query parameters
Similar to Cypher, it is also possible to set query parameters. As an example we can rewrite the node filter example from above using parameters instead of integer literals:
CALL gds.graph.filter(
'teenagers-parameterized',
'social-graph',
'n.age > $lower AND n.age <= $upper',
'*',
{ parameters: { lower: 13, upper: 18 } }
)
YIELD graphName, fromGraphName, nodeCount, relationshipCount
graphName | fromGraphName | nodeCount | relationshipCount |
---|---|---|---|
"teenagers-parameterized" |
"social-graph" |
2 |
1 |