Filtering

This feature is in the beta tier. For more information on feature tiers, see API Tiers.

Subgraph projection is featured in the end-to-end example Jupyter notebooks:

Node Regression with Subgraph and Graph Sample projections

In GDS, algorithms can be executed on a named graph that has been filtered based on its node labels and relationship types. However, that filtered graph only exists during the execution of the algorithm, and it is not possible to filter on property values. If a filtered graph needs to be used multiple times, one can use the subgraph catalog procedure to project a new graph in the graph catalog.

The filter predicates in the subgraph procedure can take labels, relationship types as well as node and relationship properties into account. The new graph can be used in the same way as any other in-memory graph in the catalog. Projecting subgraphs of subgraphs is also possible.

Syntax

A new graph can be projected by using the gds.graph.filter() procedure:

CALL gds.graph.filter(
  graphName: String,
  fromGraphName: String,
  nodeFilter: String,
  relationshipFilter: String,
  configuration: Map
) YIELD
  graphName: String,
  fromGraphName: String,
  nodeFilter: String,
  relationshipFilter: String,
  nodeCount: Integer,
  relationshipCount: Integer,
  projectMillis: Integer

Table 1. Parameters
Name	Type	Description
graphName	String	The name of the new graph that is stored in the graph catalog.
fromGraphName	String	The name of the original graph in the graph catalog.
nodeFilter	String	A Cypher predicate for filtering nodes in the input graph. `*` can be used to allow all nodes.
relationshipFilter	String	A Cypher predicate for filtering relationships in the input graph. `*` can be used to allow all relationships.
configuration	Map	Additional parameters to configure subgraph creation.

Table 2. Subgraph specific configuration
Name	Type	Default	Optional	Description
concurrency	Integer	`4` [1]	yes	The number of concurrent threads used for filtering the graph.
jobId	String	`Generated internally`	yes	An ID that can be provided to more easily track the procedure's progress.
logProgress	Boolean	`true`	yes	If disabled, the progress percentage will not be logged.
parameters	Map	`{}`	yes	A map of user-defined query parameters that are passed into the node and relationship filters.
[1] In a GDS Session, the default is the number of available processors.

Table 3. Results
Name	Type	Description
graphName	String	The name of the new graph that is stored in the graph catalog.
fromGraphName	String	The name of the original graph in the graph catalog.
nodeFilter	String	Filter predicate for nodes.
relationshipFilter	String	Filter predicate for relationships.
nodeCount	Integer	Number of nodes in the subgraph.
relationshipCount	Integer	Number of relationships in the subgraph.
projectMillis	Integer	Milliseconds for projecting the subgraph.
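
As a rough illustration of how the parameters, configuration map and result columns fit together, the call below is a minimal sketch. It assumes a graph named `social-graph` already exists in the catalog (such as the one projected in the Examples section). The new graph name `recent-friends`, the job ID and the `$year` parameter are hypothetical names chosen for this sketch. The parameter is passed as a float to mirror the float literal compared against `r.since` in the examples.

```cypher
// Sketch only: 'recent-friends', the jobId and $year are illustrative names.
CALL gds.graph.filter(
  'recent-friends',
  'social-graph',
  'n:Person',
  'r.since >= $year',
  {
    concurrency: 2,
    jobId: 'filter-recent-friends',
    parameters: { year: 2012.0 }
  }
)
YIELD graphName, nodeCount, relationshipCount, projectMillis
```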

The nodeFilter and relationshipFilter parameters are used to express filter predicates. Filter predicates are Cypher predicates bound to a single entity. An entity is either a node or a relationship. A filter predicate must always evaluate to true or false. A node is contained in the subgraph if the node filter evaluates to true. A relationship is contained in the subgraph if the relationship filter evaluates to true and both its source and target nodes are contained in the subgraph.

A predicate is a combination of expressions. The simplest form of expression is a literal. GDS currently supports the following literals:

float literals, e.g., 13.37
integer literals, e.g., 42
boolean literals, i.e., TRUE and FALSE

Property, label and relationship type expressions are bound to an entity. The node entity is always identified by the variable n, the relationship entity is identified by r. Using the variable, we can refer to:

node label expression, e.g., n:Person
relationship type expression, e.g., r:KNOWS
node property expression, e.g., n.age
relationship property expression, e.g., r.since

Boolean predicates combine two expressions and return either true or false. GDS supports the following boolean predicates:

greater/lower than, such as n.age > 42 or r.since < 1984
greater/lower than or equal, such as n.age >= 42 or r.since <= 1984
equality, such as n.age = 23 or r.since = 2020
logical operators, such as
n.age > 23 AND n.age < 42
n.age = 23 OR n.age = 42
n.age = 23 XOR n.age = 42
n.age IS NOT 23

Variable names that can be used within predicates are not arbitrary. A node predicate must refer to variable n. A relationship predicate must refer to variable r.

The degree function is an exception to this rule: it is called without a variable and returns the degree of the node being evaluated. The function takes relationship types as optional arguments. Multiple types are treated as a disjunction, so their degrees are summed. For example, degree() > 42 filters nodes with a degree greater than 42 across all relationship types. The expression degree('Foo', 'Bar') > 42 filters nodes where the sum of Foo and Bar relationships is greater than 42.

Examples

All the examples below should be run in an empty database.

The examples use Cypher projections as the norm. Native projections will be deprecated in a future release.

To demonstrate subgraph projection in GDS, we create a small social graph in Neo4j.

The following Cypher statement will create the example graph in the Neo4j database:

CREATE
  (p0:Person { age: 16 }),
  (p1:Person { age: 18 }),
  (p2:Person { age: 20 }),
  (b0:Book   { isbn: 1234 }),
  (b1:Book   { isbn: 4242 }),
  (p0)-[:KNOWS { since: 2010 }]->(p1),
  (p0)-[:KNOWS { since: 2018 }]->(p2),
  (p0)-[:READS]->(b0),
  (p1)-[:READS]->(b0),
  (p2)-[:READS]->(b1)

Project the social network graph:

MATCH (n:Person)-[r:KNOWS|READS]->(m:Person|Book)
RETURN gds.graph.project('social-graph', n, m,
  {
    sourceNodeLabels: labels(n),
    targetNodeLabels: labels(m),
    sourceNodeProperties: n { .age },
    targetNodeProperties: CASE WHEN m:Person THEN m { .age } ELSE {} END,
    relationshipType: type(r),
    relationshipProperties: CASE WHEN r:KNOWS THEN r { .since } ELSE {} END
  }
)
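
Before filtering, it can help to confirm that the projection landed in the graph catalog. The query below is a small sketch using `gds.graph.list`. Given the CREATE statement above, it should report five nodes and five relationships for `social-graph`.

```cypher
// List the projected graph and its size in the catalog.
CALL gds.graph.list('social-graph')
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount
```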

Node filtering

Create a new graph containing only users of a certain age group:

CALL gds.graph.filter(
  'teenagers',
  'social-graph',
  'n.age > 13 AND n.age <= 18',
  '*'
)
YIELD graphName, fromGraphName, nodeCount, relationshipCount

Table 4. Results
graphName	fromGraphName	nodeCount	relationshipCount
"teenagers"	"social-graph"	2	1

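Because a filtered graph is a regular catalog graph, it can itself serve as the input of another filter. The following is a minimal sketch of a subgraph of a subgraph, assuming the `teenagers` graph created above still exists. The name `minors` is a hypothetical choice for this sketch.

```cypher
// Filter the already-filtered 'teenagers' graph once more.
// 'minors' is an illustrative name, not part of the original examples.
CALL gds.graph.filter(
  'minors',
  'teenagers',
  'n.age < 18',
  '*'
)
YIELD graphName, fromGraphName, nodeCount, relationshipCount
```
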
Node degree filtering

Create a new graph containing only nodes with more than two relationships:

CALL gds.graph.filter(
  'degree-graph',
  'social-graph',
  'degree() > 2',
  '*'
)
YIELD graphName, fromGraphName, nodeCount, relationshipCount

Table 5. Results
graphName	fromGraphName	nodeCount	relationshipCount
"degree-graph"	"social-graph"	1	0

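The degree function can also be restricted to specific relationship types, as described in the Syntax section. The sketch below keeps only nodes with more than one KNOWS relationship. The graph name `knows-hubs` is hypothetical. The predicate is wrapped in double quotes so that the relationship type can be written as a single-quoted string.

```cypher
// Keep nodes whose KNOWS degree is greater than 1.
// 'knows-hubs' is an illustrative name for the new graph.
CALL gds.graph.filter(
  'knows-hubs',
  'social-graph',
  "degree('KNOWS') > 1",
  '*'
)
YIELD graphName, fromGraphName, nodeCount, relationshipCount
```
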
Node and relationship filtering

Create a new graph containing only users of a certain age group who have known each other since a given point in time:

CALL gds.graph.filter(
  'teenagers-since-2012',
  'social-graph',
  'n.age > 13 AND n.age <= 18',
  'r.since >= 2012.0'
)
YIELD graphName, fromGraphName, nodeCount, relationshipCount

Table 6. Results
graphName	fromGraphName	nodeCount	relationshipCount
"teenagers-since-2012"	"social-graph"	2	0

Bipartite subgraph

Create a new bipartite graph between books and users connected by the READS relationship type:

CALL gds.graph.filter(
  'teenagers-books',
  'social-graph',
  'n:Book OR n:Person',
  'r:READS'
)
YIELD graphName, fromGraphName, nodeCount, relationshipCount

Table 7. Results
graphName	fromGraphName	nodeCount	relationshipCount
"teenagers-books"	"social-graph"	5	3

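The node filter matters for relationships as well: a relationship is only kept if both of its endpoints pass the node filter. As a sketch of that rule, the call below keeps the READS relationship filter but restricts nodes to persons. Since every READS relationship targets a Book node, the resulting graph should contain the three Person nodes and no relationships. The graph name `persons-reads` is hypothetical.

```cypher
// Books are filtered out, so READS relationships lose their target nodes
// and are dropped even though they match the relationship filter.
CALL gds.graph.filter(
  'persons-reads',
  'social-graph',
  'n:Person',
  'r:READS'
)
YIELD graphName, fromGraphName, nodeCount, relationshipCount
```
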
Bipartite graph node filtering

The previous example can be extended with an additional filter applied only to persons:

CALL gds.graph.filter(
  'adults-books',
  'social-graph',
  'n:Book OR (n:Person AND n.age > 18)',
  'r:READS'
)
YIELD graphName, fromGraphName, nodeCount, relationshipCount

Table 8. Results
graphName	fromGraphName	nodeCount	relationshipCount
"adults-books"	"social-graph"	3	1

Using query parameters

As in Cypher, filter predicates can use query parameters, which are supplied through the parameters configuration key. As an example, we can rewrite the node filter example from above using parameters instead of integer literals:

Create a new graph containing only users of a certain age group:

CALL gds.graph.filter(
  'teenagers-parameterized',
  'social-graph',
  'n.age > $lower AND n.age <= $upper',
  '*',
  { parameters: { lower: 13, upper: 18 } }
)
YIELD graphName, fromGraphName, nodeCount, relationshipCount

Table 9. Results
graphName	fromGraphName	nodeCount	relationshipCount
"teenagers-parameterized"	"social-graph"	2	1