Cypher projection (deprecated)
This page describes the Legacy Cypher projection, which is deprecated. Its replacement is the new Cypher projection, described in Projecting graphs using Cypher. A migration guide is available in Appendix C, Migration from Legacy to new Cypher projection.
Legacy Cypher projections are a more flexible and expressive approach compared to native projections. A Legacy Cypher projection uses Cypher to create (project) an in-memory graph from the Neo4j database.
Considerations
Lifecycle
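The projected graphs will reside in the catalog until either:
- the graph is dropped using gds.graph.drop,
- the Neo4j database from which the graph was projected is stopped or dropped, or
- the Neo4j database management system is stopped.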
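You can trigger the first case explicitly. The following is a minimal sketch. It assumes the persons graph from the Simple graph example below has already been projected. Passing false as the second argument (failIfMissing) means the call returns no rows, rather than raising an error, when the graph is absent:

```cypher
// Remove the 'persons' graph from the graph catalog to free its memory.
// failIfMissing = false: return no rows instead of failing if the graph does not exist.
CALL gds.graph.drop('persons', false)
YIELD graphName
RETURN graphName
```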
Node property support
Legacy Cypher projections can only project a limited set of node property types from a Cypher query. The Node Properties page details which node property types are supported. Other types of node properties have to be transformed or encoded into one of the supported types in order to be projected using a Legacy Cypher projection.
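One way to do such an encoding is to map a string property to an integer code with a CASE expression. The sketch below illustrates the idea. It assumes a hypothetical String property genre on Book nodes, which the example graph in the Examples section does not have, so every node falls through to the ELSE branch. The graph name booksWithGenreCode and the codes are also made up for illustration:

```cypher
// Hypothetical: `genre` is an assumed String property that the example graph lacks.
// The CASE expression encodes it as an Integer, a type the projection can load;
// nodes without a (recognized) genre get the code 0.
CALL gds.graph.project.cypher(
  'booksWithGenreCode',
  'MATCH (n) WHERE n:Person OR n:Book
   RETURN id(n) AS id, labels(n) AS labels,
          CASE n.genre WHEN "fantasy" THEN 1 WHEN "horror" THEN 2 ELSE 0 END AS genreCode',
  'MATCH (n)-[r:READ]->(m) RETURN id(n) AS source, id(m) AS target, type(r) AS type'
)
YIELD graphName, nodeCount, relationshipCount
```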
Syntax
A Legacy Cypher projection takes three mandatory arguments: graphName, nodeQuery and relationshipQuery.
In addition, the optional configuration parameter allows us to further configure graph creation.
CALL gds.graph.project.cypher(
graphName: String,
nodeQuery: String,
relationshipQuery: String,
configuration: Map
) YIELD
graphName: String,
nodeQuery: String,
nodeCount: Integer,
relationshipQuery: String,
relationshipCount: Integer,
projectMillis: Integer
Name | Optional | Description |
---|---|---|
graphName | no | The name under which the graph is stored in the catalog. |
nodeQuery | no | Cypher query to project nodes. The query result must contain an id column. |
relationshipQuery | no | Cypher query to project relationships. The query result must contain source and target columns. |
configuration | yes | Additional parameters to configure the Legacy Cypher projection. |
Name | Type | Default | Description |
---|---|---|---|
readConcurrency | Integer | 4 | The number of concurrent threads used for creating the graph. |
validateRelationships | Boolean | true | Whether to throw an error if the relationshipQuery returns relationships between nodes not returned by the nodeQuery. |
parameters | Map | {} | A map of user-defined query parameters that are passed into the node and relationship queries. |
jobId | String | Generated internally | An ID that can be provided to more easily track the projection’s progress. |
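Here is a sketch of passing the optional configuration map. It reuses the Person/KNOWS queries from the Simple graph example below. The graph name personsConfigured, the thread count, and the jobId value are arbitrary choices for illustration:

```cypher
// Project the Person/KNOWS graph with an explicit configuration map.
CALL gds.graph.project.cypher(
  'personsConfigured',
  'MATCH (n:Person) RETURN id(n) AS id',
  'MATCH (n:Person)-[r:KNOWS]->(m:Person) RETURN id(n) AS source, id(m) AS target',
  {
    readConcurrency: 2,           // use 2 threads instead of the default 4
    validateRelationships: true,  // the default: fail on relationships to unprojected nodes
    jobId: 'persons-projection'   // arbitrary, user-chosen ID for tracking progress
  }
)
YIELD graphName, nodeCount, relationshipCount, projectMillis
```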
Name | Type | Description |
---|---|---|
graphName | String | The name under which the graph is stored in the catalog. |
nodeQuery | String | The Cypher query used to project the nodes in the graph. |
nodeCount | Integer | The number of nodes stored in the projected graph. |
relationshipQuery | String | The Cypher query used to project the relationships in the graph. |
relationshipCount | Integer | The number of relationships stored in the projected graph. |
projectMillis | Integer | Milliseconds for projecting the graph. |
To get information about a stored graph, such as its schema, one can use gds.graph.list.
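For example, the following sketch inspects the persons graph. It assumes that graph has been projected as in the Simple graph example below, and it yields only a few of the columns gds.graph.list provides:

```cypher
// Look up a single graph by name and return its size and schema.
CALL gds.graph.list('persons')
YIELD graphName, nodeCount, relationshipCount, schema
RETURN graphName, nodeCount, relationshipCount, schema
```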
Examples
All the examples below should be run in an empty database.
In order to demonstrate the GDS Graph Project capabilities, we create a small social network graph in Neo4j using the following Cypher statement:
CREATE
(florentin:Person { name: 'Florentin', age: 16 }),
(adam:Person { name: 'Adam', age: 18 }),
(veselin:Person { name: 'Veselin', age: 20, ratings: [5.0] }),
(hobbit:Book { name: 'The Hobbit', isbn: 1234, numberOfPages: 310, ratings: [1.0, 2.0, 3.0, 4.5] }),
(frankenstein:Book { name: 'Frankenstein', isbn: 4242, price: 19.99 }),
(florentin)-[:KNOWS { since: 2010 }]->(adam),
(florentin)-[:KNOWS { since: 2018 }]->(veselin),
(florentin)-[:READ { numberOfPages: 4 }]->(hobbit),
(florentin)-[:READ { numberOfPages: 42 }]->(hobbit),
(adam)-[:READ { numberOfPages: 30 }]->(hobbit),
(veselin)-[:READ]->(frankenstein)
Simple graph
A simple graph is a graph with only one node label and relationship type, i.e., a monopartite graph.
We start by demonstrating how to load a simple graph by projecting only the Person node label and the KNOWS relationship type.
Project Person nodes and KNOWS relationships:
CALL gds.graph.project.cypher(
'persons',
'MATCH (n:Person) RETURN id(n) AS id',
'MATCH (n:Person)-[r:KNOWS]->(m:Person) RETURN id(n) AS source, id(m) AS target')
YIELD
graphName AS graph, nodeQuery, nodeCount AS nodes, relationshipQuery, relationshipCount AS rels
graph | nodeQuery | nodes | relationshipQuery | rels |
---|---|---|---|---|
"persons" | "MATCH (n:Person) RETURN id(n) AS id" | 3 | "MATCH (n:Person)-[r:KNOWS]->(m:Person) RETURN id(n) AS source, id(m) AS target" | 2 |
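After projection, algorithms can use the graph by its catalog name. As a sketch, the following query uses Degree Centrality (gds.degree.stream) to count outgoing KNOWS relationships per person. It assumes this algorithm is available in your GDS version. The scores are out-degrees because every relationship is projected in NATURAL orientation:

```cypher
// Degree Centrality on the projected 'persons' graph.
// With NATURAL orientation, each score is the number of outgoing KNOWS relationships.
CALL gds.degree.stream('persons')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC, name
```

With the example data, Florentin should be the only person with a non-zero score, because he is the only one with outgoing KNOWS relationships.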
Multi-graph
A multi-graph is a graph with multiple node labels and relationship types.
To retain the label and type information when we load multiple node labels and relationship types, we can add a labels column to the node query and a type column to the relationship query.
Project Person and Book nodes and KNOWS and READ relationships:
CALL gds.graph.project.cypher(
'personsAndBooks',
'MATCH (n) WHERE n:Person OR n:Book RETURN id(n) AS id, labels(n) AS labels',
'MATCH (n)-[r:KNOWS|READ]->(m) RETURN id(n) AS source, id(m) AS target, type(r) AS type')
YIELD
graphName AS graph, nodeQuery, nodeCount AS nodes, relationshipCount AS rels
graph | nodeQuery | nodes | rels |
---|---|---|---|
"personsAndBooks" | "MATCH (n) WHERE n:Person OR n:Book RETURN id(n) AS id, labels(n) AS labels" | 5 | 6 |
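The retained types can be used to filter the graph when running an algorithm. The sketch below restricts Degree Centrality to READ relationships. It assumes the relationshipTypes filter behaves as for other projected graphs. The filter can name READ only because the relationship query returned a type column:

```cypher
// Count only outgoing READ relationships in the multi-graph.
CALL gds.degree.stream('personsAndBooks', { relationshipTypes: ['READ'] })
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC, name
```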
Relationship orientation
The native projection supports specifying an orientation per relationship type.
The Legacy Cypher projection treats every relationship returned by the relationship query as if it were in NATURAL orientation and creates a directed relationship from the first provided id (source) to the second (target).
Projecting in REVERSE orientation can be achieved by switching the order of ids in the RETURN clause, for example MATCH (n)-[r:KNOWS]->(m) RETURN id(m) AS source, id(n) AS target, type(r) AS type.
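It is not possible to project graphs in UNDIRECTED orientation when Legacy Cypher projections are used.
Some algorithms require that the graph was loaded with UNDIRECTED orientation. These algorithms cannot be used with a graph projected by a Legacy Cypher projection.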
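Here is a complete call using the swapped-id pattern. It is a sketch that reuses the Person/KNOWS data. The graph name personsReversed is chosen for illustration:

```cypher
// Swap source and target to get REVERSE orientation: each projected
// relationship points from the known person to the person who knows them.
CALL gds.graph.project.cypher(
  'personsReversed',
  'MATCH (n:Person) RETURN id(n) AS id',
  'MATCH (n:Person)-[r:KNOWS]->(m:Person) RETURN id(m) AS source, id(n) AS target, type(r) AS type'
)
YIELD graphName, nodeCount, relationshipCount
```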
Node properties
To load node properties, we add a column to the result of the node query for each property. We use the Cypher function coalesce() to specify a default value for nodes that do not have the property.
Project Person and Book nodes and KNOWS and READ relationships:
CALL gds.graph.project.cypher(
'graphWithProperties',
'MATCH (n)
WHERE n:Book OR n:Person
RETURN
id(n) AS id,
labels(n) AS labels,
coalesce(n.age, 18) AS age,
coalesce(n.price, 5.0) AS price,
n.ratings AS ratings',
'MATCH (n)-[r:KNOWS|READ]->(m) RETURN id(n) AS source, id(m) AS target, type(r) AS type'
)
YIELD
graphName, nodeCount AS nodes, relationshipCount AS rels
RETURN graphName, nodes, rels
graphName | nodes | rels |
---|---|---|
"graphWithProperties" | 5 | 6 |
The projected graphWithProperties graph contains five nodes and six relationships.
In a Legacy Cypher projection, every node from the nodeQuery gets the same node properties, so label-specific properties are not possible.
For instance, in the example above the Person nodes will also get ratings and price properties, while Book nodes get the age property.
Further, the price property has a default value of 5.0, because not every book has a price specified in the example graph.
In the following, we check whether the price was correctly projected:
MATCH (n:Book)
RETURN n.name AS name, gds.util.nodeProperty('graphWithProperties', id(n), 'price') AS price
ORDER BY price
name | price |
---|---|
"The Hobbit" | 5.0 |
"Frankenstein" | 19.99 |
We can see that the price was projected, with The Hobbit having the default price of 5.0.
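The same check works for properties that only exist on the other label. In the sketch below, gds.graph.nodeProperty.stream reads the projected age of Book nodes only. Books have no age in Neo4j, so they should carry the coalesce() default of 18:

```cypher
// Stream the projected `age` property, restricted to nodes labeled Book.
CALL gds.graph.nodeProperty.stream('graphWithProperties', 'age', ['Book'])
YIELD nodeId, propertyValue AS age
RETURN gds.util.asNode(nodeId).name AS name, age
ORDER BY name
```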
Relationship properties
Analogous to node properties, we can project relationship properties using the relationshipQuery.
Project Person and Book nodes and READ relationships with the numberOfPages property:
CALL gds.graph.project.cypher(
'readWithProperties',
'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
'MATCH (n)-[r:READ]->(m)
RETURN id(n) AS source, id(m) AS target, type(r) AS type, r.numberOfPages AS numberOfPages'
)
YIELD
graphName AS graph, nodeCount AS nodes, relationshipCount AS rels
graph | nodes | rels |
---|---|---|
"readWithProperties" | 5 | 4 |
Next, we verify that the relationship property numberOfPages was correctly loaded.
Stream the relationship property numberOfPages from the projected graph:
CALL gds.graph.relationshipProperty.stream('readWithProperties', 'numberOfPages')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfPages
RETURN
gds.util.asNode(sourceNodeId).name AS person,
gds.util.asNode(targetNodeId).name AS book,
numberOfPages
ORDER BY person ASC, numberOfPages DESC
person | book | numberOfPages |
---|---|---|
"Adam" | "The Hobbit" | 30.0 |
"Florentin" | "The Hobbit" | 42.0 |
"Florentin" | "The Hobbit" | 4.0 |
"Veselin" | "Frankenstein" | NaN |
We can see that the numberOfPages values are loaded. The default property value is Double.NaN. As in the Node properties example, it can be changed by using the Cypher function coalesce().
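Here is a sketch of that change for relationships. It uses the same queries as above, with 0 as an illustrative default and readWithDefaultPages as an illustrative graph name:

```cypher
// Replace the NaN default with 0 for READ relationships that lack numberOfPages.
CALL gds.graph.project.cypher(
  'readWithDefaultPages',
  'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
  'MATCH (n)-[r:READ]->(m)
   RETURN id(n) AS source, id(m) AS target, type(r) AS type,
          coalesce(r.numberOfPages, 0) AS numberOfPages'
)
YIELD graphName, nodeCount, relationshipCount
```

Streaming numberOfPages from this graph should then show 0.0 instead of NaN for the relationship from Veselin to Frankenstein.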
Parallel relationships
The Property Graph Model in Neo4j supports parallel relationships, i.e., multiple relationships between two nodes. By default, GDS preserves the parallel relationships. For some algorithms, we want the projected graph to contain at most one relationship between two nodes.
The simplest way to achieve relationship deduplication is to use the DISTINCT operator in the relationship query.
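Here is a sketch of the DISTINCT approach on the READ relationships. The graph name readDistinct is illustrative. Because source, target and type are the only returned columns, the two READ relationships from Florentin to The Hobbit collapse into one row:

```cypher
// DISTINCT removes duplicate (source, target, type) rows, i.e. parallel relationships.
CALL gds.graph.project.cypher(
  'readDistinct',
  'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
  'MATCH (n)-[r:READ]->(m)
   RETURN DISTINCT id(n) AS source, id(m) AS target, type(r) AS type'
)
YIELD graphName, nodeCount, relationshipCount
```

With the example graph, this should report three relationships instead of four.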
Alternatively, we can aggregate the parallel relationships by using the count() function and store the count as a relationship property.
Project Person and Book nodes and COUNT-aggregated READ relationships:
CALL gds.graph.project.cypher(
'readCount',
'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
'MATCH (n)-[r:READ]->(m)
RETURN id(n) AS source, id(m) AS target, type(r) AS type, count(r) AS numberOfReads'
)
YIELD
graphName AS graph, nodeCount AS nodes, relationshipCount AS rels
graph | nodes | rels |
---|---|---|
"readCount" | 5 | 3 |
Next, we verify that the READ relationships were correctly aggregated.
Stream the relationship property numberOfReads of the projected graph:
CALL gds.graph.relationshipProperty.stream('readCount', 'numberOfReads')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfReads
RETURN
gds.util.asNode(sourceNodeId).name AS person,
gds.util.asNode(targetNodeId).name AS book,
numberOfReads
ORDER BY numberOfReads DESC, person
person | book | numberOfReads |
---|---|---|
"Florentin" | "The Hobbit" | 2.0 |
"Adam" | "The Hobbit" | 1.0 |
"Veselin" | "Frankenstein" | 1.0 |
We can see that the two READ relationships between Florentin and The Hobbit result in a numberOfReads value of 2.
Parallel relationships with properties
For graphs with relationship properties, we can also use the other aggregating functions documented in the Cypher Manual.
Project Person and Book nodes and READ relationships aggregated by summing the numberOfPages:
CALL gds.graph.project.cypher(
'readSums',
'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
'MATCH (n)-[r:READ]->(m)
RETURN id(n) AS source, id(m) AS target, type(r) AS type, sum(r.numberOfPages) AS numberOfPages'
)
YIELD
graphName AS graph, nodeCount AS nodes, relationshipCount AS rels
graph | nodes | rels |
---|---|---|
"readSums" | 5 | 3 |
Next, we verify that the relationship property numberOfPages was correctly aggregated.
Stream the relationship property numberOfPages of the projected graph:
CALL gds.graph.relationshipProperty.stream('readSums', 'numberOfPages')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfPages
RETURN
gds.util.asNode(sourceNodeId).name AS person,
gds.util.asNode(targetNodeId).name AS book,
numberOfPages
ORDER BY numberOfPages DESC, person
person | book | numberOfPages |
---|---|---|
"Florentin" | "The Hobbit" | 46.0 |
"Adam" | "The Hobbit" | 30.0 |
"Veselin" | "Frankenstein" | 0.0 |
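We can see that the two READ relationships between Florentin and The Hobbit sum up to 46 numberOfPages.
The relationship from Veselin to Frankenstein has no numberOfPages property, and sum() ignores null values, so it is projected as 0.0.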
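You can combine several aggregations in one relationship query. Each aggregated column becomes its own relationship property. Here is a sketch. The graph name readAggregates and the property names maxPages and avgPages are illustrative:

```cypher
// One relationship per (person, book) pair, carrying three aggregated properties.
CALL gds.graph.project.cypher(
  'readAggregates',
  'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
  'MATCH (n)-[r:READ]->(m)
   RETURN id(n) AS source, id(m) AS target, type(r) AS type,
          count(r) AS numberOfReads,
          max(r.numberOfPages) AS maxPages,
          avg(r.numberOfPages) AS avgPages'
)
YIELD graphName, nodeCount, relationshipCount
```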
Projecting filtered Neo4j graphs
Legacy Cypher projections allow us to specify the graph to project in a more fine-grained way.
The following example demonstrates how to filter out READ relationships that do not have a numberOfPages property.
Project Person and Book nodes and READ relationships where numberOfPages is present:
CALL gds.graph.project.cypher(
'existingNumberOfPages',
'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
'MATCH (n)-[r:READ]->(m)
WHERE r.numberOfPages IS NOT NULL
RETURN id(n) AS source, id(m) AS target, type(r) AS type, r.numberOfPages AS numberOfPages'
)
YIELD
graphName AS graph, nodeCount AS nodes, relationshipCount AS rels
graph | nodes | rels |
---|---|---|
"existingNumberOfPages" | 5 | 3 |
Next, we verify that the relationship property numberOfPages was correctly loaded.
Stream the relationship property numberOfPages from the projected graph:
CALL gds.graph.relationshipProperty.stream('existingNumberOfPages', 'numberOfPages')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfPages
RETURN
gds.util.asNode(sourceNodeId).name AS person,
gds.util.asNode(targetNodeId).name AS book,
numberOfPages
ORDER BY person ASC, numberOfPages DESC
person | book | numberOfPages |
---|---|---|
"Adam" | "The Hobbit" | 30.0 |
"Florentin" | "The Hobbit" | 42.0 |
"Florentin" | "The Hobbit" | 4.0 |
If we compare the results to the ones from Relationship properties, we can see that the IS NOT NULL predicate filters out the relationship from Veselin to the book Frankenstein.
With native projections, this functionality is only expressible by projecting a subgraph.
Using query parameters
Similar to Cypher, it is also possible to set query parameters. In the following example, we supply a minimum number of pages as a parameter to limit the READ relationships we want to project.
Project Person and Book nodes and READ relationships where numberOfPages is greater than 9:
CALL gds.graph.project.cypher(
'readWithMinPages',
'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
'MATCH (n)-[r:READ]->(m)
WHERE r.numberOfPages > $minNumberOfPages
RETURN id(n) AS source, id(m) AS target, type(r) AS type, r.numberOfPages AS numberOfPages',
{ parameters: { minNumberOfPages: 9 } }
)
YIELD
graphName AS graph, nodeCount AS nodes, relationshipCount AS rels
graph | nodes | rels |
---|---|---|
"readWithMinPages" | 5 | 2 |
Further usage of parameters
The parameters can also be used to directly pass in a list of nodes or a list of relationships. For example, pre-computing the list of nodes can be useful if the node filter is expensive.
Project Person nodes younger than 20 whose name does not begin with V, and KNOWS relationships:
CALL gds.graph.project.cypher(
'personSubset',
'MATCH (n)
WHERE n.age < 20 AND NOT n.name STARTS WITH "V"
RETURN id(n) AS id, labels(n) AS labels',
'MATCH (n)-[r:KNOWS]->(m)
WHERE (n.age < 20 AND NOT n.name STARTS WITH "V") AND
(m.age < 20 AND NOT m.name STARTS WITH "V")
RETURN id(n) AS source, id(m) AS target, type(r) AS type, r.since AS since'
)
YIELD
graphName, nodeCount AS nodes, relationshipCount AS rels
graphName | nodes | rels |
---|---|---|
"personSubset" | 2 | 1 |
By passing the relevant Persons as a parameter, the above query can be transformed into the following:
Project Person nodes younger than 20 whose name does not begin with V, and KNOWS relationships, by using parameters:
MATCH (n)
WHERE n.age < 20 AND NOT n.name STARTS WITH "V"
WITH collect(n) AS selectedPersons
CALL gds.graph.project.cypher(
'personSubsetViaParameters',
'UNWIND $nodes AS n RETURN id(n) AS id, labels(n) AS labels',
'MATCH (n)-[r:KNOWS]->(m)
WHERE (n IN $nodes) AND (m IN $nodes)
RETURN id(n) AS source, id(m) AS target, type(r) AS type, r.since AS since',
{ parameters: { nodes: selectedPersons } }
)
YIELD
graphName, nodeCount AS nodes, relationshipCount AS rels
RETURN graphName, nodes, rels
graphName | nodes | rels |
---|---|---|
"personSubsetViaParameters" | 2 | 1 |
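Relationships can be passed in the same way. The sketch below pre-selects KNOWS relationships with since of 2015 or later and passes them as a list. The relationship query then reads the endpoints with startNode() and endNode(). The cutoff year, the graph name recentKnowsViaParameters and the parameter name knowsRels are illustrative assumptions:

```cypher
// Pre-compute the relationship filter once, outside the projection.
MATCH (:Person)-[r:KNOWS]->(:Person)
WHERE r.since >= 2015
WITH collect(r) AS recentKnows
CALL gds.graph.project.cypher(
  'recentKnowsViaParameters',
  'MATCH (n:Person) RETURN id(n) AS id, labels(n) AS labels',
  'UNWIND $knowsRels AS r
   RETURN id(startNode(r)) AS source, id(endNode(r)) AS target, type(r) AS type, r.since AS since',
  { parameters: { knowsRels: recentKnows } }
)
YIELD graphName, nodeCount AS nodes, relationshipCount AS rels
RETURN graphName, nodes, rels
```

With the example graph, only the KNOWS relationship from Florentin to Veselin (since 2018) passes the filter.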