Cypher projection (deprecated)

This page describes the Legacy Cypher projection, which is deprecated. The replacement is to use the new Cypher projection, which is described in Projecting graphs using Cypher. A migration guide is available at Appendix C, Migration from Legacy to new Cypher projection.

Legacy Cypher projections are a more flexible and expressive approach compared to native projections. A Legacy Cypher projection uses Cypher to create (project) an in-memory graph from the Neo4j database.

Considerations

Lifecycle

The projected graphs will reside in the catalog until one of the following happens:

the graph is dropped using gds.graph.drop
the Neo4j database from which the graph was projected is stopped or dropped
the Neo4j database management system is stopped
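
As a minimal sketch of the first case, the following call drops a graph from the catalog. The graph name 'persons' is an assumption for illustration (a graph with that name is projected in the Examples section); the second argument is passed as false so that the call does not fail if no such graph exists.

```cypher
// Drop the projected graph named 'persons' from the graph catalog.
// Passing false as the second argument (failIfMissing) avoids an error
// when no graph with that name is present.
CALL gds.graph.drop('persons', false)
YIELD graphName
RETURN graphName
```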

Node property support

Legacy Cypher projections can only project a limited set of node property types from a Cypher query. The Node Properties page details which node property types are supported. Other types of node properties have to be transformed or encoded into one of the supported types in order to be projected using a Legacy Cypher projection.
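
One possible way to encode an unsupported value is sketched below. It assumes a hypothetical String property status on Person nodes (not part of the example graph on this page) and maps each value to an integer code inside the node query. The graph name personsWithStatus and the chosen codes are likewise illustrative.

```cypher
// Encode a hypothetical String property as an integer so it can be projected.
// 'status' and its values are assumptions; adapt the CASE branches to your data.
CALL gds.graph.project.cypher(
  'personsWithStatus',
  'MATCH (n:Person)
   RETURN
     id(n) AS id,
     CASE n.status WHEN "active" THEN 1 WHEN "inactive" THEN 0 ELSE -1 END AS status',
  'MATCH (n:Person)-[r:KNOWS]->(m:Person) RETURN id(n) AS source, id(m) AS target'
)
YIELD graphName, nodeCount, relationshipCount
```

Nodes without the property fall into the ELSE branch and receive -1, so every projected node has a numeric status value.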

Syntax

A Legacy Cypher projection takes three mandatory arguments: graphName, nodeQuery and relationshipQuery. In addition, the optional configuration parameter allows us to further configure graph creation.

CALL gds.graph.project.cypher(
    graphName: String,
    nodeQuery: String,
    relationshipQuery: String,
    configuration: Map
) YIELD
    graphName: String,
    nodeQuery: String,
    nodeCount: Integer,
    relationshipQuery: String,
    relationshipCount: Integer,
    projectMillis: Integer

Table 1. Parameters
Name	Optional	Description
graphName	no	The name under which the graph is stored in the catalog.
nodeQuery	no	Cypher query to project nodes. The query result must contain an `id` column. Optionally, a `labels` column can be specified to represent node labels. Additional columns are interpreted as properties.
relationshipQuery	no	Cypher query to project relationships. The query result must contain `source` and `target` columns. Optionally, a `type` column can be specified to represent relationship type. Additional columns are interpreted as properties.
configuration	yes	Additional parameters to configure the Legacy Cypher projection.

Table 2. Configuration
Name	Type	Default	Description
readConcurrency	Integer	4	The number of concurrent threads used for creating the graph.
validateRelationships	Boolean	true	Whether to throw an error if the `relationshipQuery` returns relationships between nodes not returned by the `nodeQuery`.
parameters	Map	{}	A map of user-defined query parameters that are passed into the node and relationship queries.
jobId	String	Generated internally	An ID that can be provided to more easily track the projection’s progress.
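
The following sketch shows how the configuration map might be passed. It assumes the example graph created in the Examples section and uses an illustrative graph name, personsLenient. The node query returns only Person nodes, while the relationship query also returns READ relationships that point to Book nodes. With validateRelationships set to false, the projection is expected to skip those relationships instead of failing.

```cypher
// Project only Person nodes, but let the relationship query return
// relationships whose target may not be part of the node query result.
// validateRelationships: false asks the projection not to throw an error
// for such relationships.
CALL gds.graph.project.cypher(
  'personsLenient',
  'MATCH (n:Person) RETURN id(n) AS id',
  'MATCH (n)-[r:KNOWS|READ]->(m) RETURN id(n) AS source, id(m) AS target, type(r) AS type',
  { validateRelationships: false, readConcurrency: 2 }
)
YIELD graphName, nodeCount, relationshipCount
```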

Table 3. Results
Name	Type	Description
graphName	String	The name under which the graph is stored in the catalog.
nodeQuery	String	The Cypher query used to project the nodes in the graph.
nodeCount	Integer	The number of nodes stored in the projected graph.
relationshipQuery	String	The Cypher query used to project the relationships in the graph.
relationshipCount	Integer	The number of relationships stored in the projected graph.
projectMillis	Integer	Milliseconds for projecting the graph.

To get information about a stored graph, such as its schema, one can use gds.graph.list.
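
A minimal sketch of such a lookup, assuming a graph named 'persons' has already been projected (as in the Simple graph example):

```cypher
// List catalog information for the graph 'persons', including its schema.
CALL gds.graph.list('persons')
YIELD graphName, nodeCount, relationshipCount, schema
RETURN graphName, nodeCount, relationshipCount, schema
```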

Examples

All the examples below should be run in an empty database.

In order to demonstrate the GDS Graph Project capabilities we are going to create a small social network graph in Neo4j. The example graph looks like this:

The following Cypher statement will create the example graph in the Neo4j database:

CREATE
  (florentin:Person { name: 'Florentin', age: 16 }),
  (adam:Person { name: 'Adam', age: 18 }),
  (veselin:Person { name: 'Veselin', age: 20, ratings: [5.0] }),
  (hobbit:Book { name: 'The Hobbit', isbn: 1234, numberOfPages: 310, ratings: [1.0, 2.0, 3.0, 4.5] }),
  (frankenstein:Book { name: 'Frankenstein', isbn: 4242, price: 19.99 }),

  (florentin)-[:KNOWS { since: 2010 }]->(adam),
  (florentin)-[:KNOWS { since: 2018 }]->(veselin),
  (florentin)-[:READ { numberOfPages: 4 }]->(hobbit),
  (florentin)-[:READ { numberOfPages: 42 }]->(hobbit),
  (adam)-[:READ { numberOfPages: 30 }]->(hobbit),
  (veselin)-[:READ]->(frankenstein)

Simple graph

A simple graph is a graph with only one node label and relationship type, i.e., a monopartite graph. We are going to start with demonstrating how to load a simple graph by projecting only the Person node label and KNOWS relationship type.

Project Person nodes and KNOWS relationships:

CALL gds.graph.project.cypher(
  'persons',
  'MATCH (n:Person) RETURN id(n) AS id',
  'MATCH (n:Person)-[r:KNOWS]->(m:Person) RETURN id(n) AS source, id(m) AS target')
YIELD
  graphName AS graph, nodeQuery, nodeCount AS nodes, relationshipQuery, relationshipCount AS rels

Table 4. Results
graph	nodeQuery	nodes	relationshipQuery	rels
"persons"	"MATCH (n:Person) RETURN id(n) AS id"	3	"MATCH (n:Person)-[r:KNOWS]->(m:Person) RETURN id(n) AS source, id(m) AS target"	2

Multi-graph

A multi-graph is a graph with multiple node labels and relationship types.

To retain the label and type information when we load multiple node labels and relationship types, we can add a labels column to the node query and a type column to the relationship query.

Project Person and Book nodes and KNOWS and READ relationships:

CALL gds.graph.project.cypher(
  'personsAndBooks',
  'MATCH (n) WHERE n:Person OR n:Book RETURN id(n) AS id, labels(n) AS labels',
  'MATCH (n)-[r:KNOWS|READ]->(m) RETURN id(n) AS source, id(m) AS target, type(r) AS type')
YIELD
  graphName AS graph, nodeQuery, nodeCount AS nodes, relationshipCount AS rels

Table 5. Results
graph	nodeQuery	nodes	rels
"personsAndBooks"	"MATCH (n) WHERE n:Person OR n:Book RETURN id(n) AS id, labels(n) AS labels"	5	6

Relationship orientation

The native projection supports specifying an orientation per relationship type. The Legacy Cypher projection treats every relationship returned by the relationship query as if it were in NATURAL orientation and creates a directed relationship from the first provided id (source) to the second (target). Projecting in REVERSE orientation can be achieved by switching the order of the ids in the RETURN clause, such as MATCH (n)-[r:KNOWS]->(m) RETURN id(m) AS source, id(n) AS target, type(r) AS type.

It is not possible to project graphs in UNDIRECTED orientation when Legacy Cypher projections are used.

Some algorithms require that the graph was loaded with UNDIRECTED orientation. These algorithms cannot be used with a graph projected by a Legacy Cypher projection.
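
The REVERSE case described above can be sketched as a complete call. The graph name personsReversed is an assumption for illustration; the queries reuse the Person nodes and KNOWS relationships of the example graph.

```cypher
// Project KNOWS relationships in reverse: the end node of the stored
// relationship is returned as source, the start node as target.
CALL gds.graph.project.cypher(
  'personsReversed',
  'MATCH (n:Person) RETURN id(n) AS id',
  'MATCH (n:Person)-[r:KNOWS]->(m:Person) RETURN id(m) AS source, id(n) AS target, type(r) AS type'
)
YIELD graphName AS graph, nodeCount AS nodes, relationshipCount AS rels
```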

Node properties

To load node properties, we add a column to the result of the node query for each property. We use the Cypher function coalesce() to specify a default value for nodes that do not have the property.

Project Person and Book nodes with node properties, and KNOWS and READ relationships:

CALL gds.graph.project.cypher(
  'graphWithProperties',
  'MATCH (n)
   WHERE n:Book OR n:Person
   RETURN
    id(n) AS id,
    labels(n) AS labels,
    coalesce(n.age, 18) AS age,
    coalesce(n.price, 5.0) AS price,
    n.ratings AS ratings',
  'MATCH (n)-[r:KNOWS|READ]->(m) RETURN id(n) AS source, id(m) AS target, type(r) AS type'
)
YIELD
  graphName, nodeCount AS nodes, relationshipCount AS rels
RETURN graphName, nodes, rels

Table 6. Results
graphName	nodes	rels
"graphWithProperties"	5	6

The projected graphWithProperties graph contains five nodes and six relationships. In a Legacy Cypher projection, every node from the nodeQuery gets the same node properties, which means you can't have label-specific properties. For instance, in the example above, the Person nodes also get the ratings and price properties, while the Book nodes get the age property.

Further, the price property has a default value of 5.0. Not every book has a price specified in the example graph. In the following, we check whether the price was correctly projected:

Verify the price property of the Book nodes in the projected graph:

MATCH (n:Book)
RETURN n.name AS name, gds.util.nodeProperty('graphWithProperties', id(n), 'price') AS price
ORDER BY price

Table 7. Results
name	price
"The Hobbit"	5.0
"Frankenstein"	19.99

We can see that the price was projected, with The Hobbit receiving the default price of 5.0.
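
The claim that Book nodes also receive the age property can be checked in a similar way. The sketch below assumes the graphWithProperties graph from above is still in the catalog and streams the age property for all projected nodes. The Book nodes, which have no age in the database, should report the default of 18 set via coalesce().

```cypher
// Stream the projected 'age' property for every node in the graph
// and resolve each node id back to the name stored in the database.
CALL gds.graph.nodeProperty.stream('graphWithProperties', 'age')
YIELD nodeId, propertyValue AS age
RETURN gds.util.asNode(nodeId).name AS name, age
ORDER BY name
```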

Relationship properties

Analogous to node properties, we can project relationship properties using the relationshipQuery.

Project Person and Book nodes and READ relationships with numberOfPages property:

CALL gds.graph.project.cypher(
  'readWithProperties',
  'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
  'MATCH (n)-[r:READ]->(m)
    RETURN id(n) AS source, id(m) AS target, type(r) AS type, r.numberOfPages AS numberOfPages'
)
YIELD
  graphName AS graph, nodeCount AS nodes, relationshipCount AS rels

Table 8. Results
graph	nodes	rels
"readWithProperties"	5	4

Next, we will verify that the relationship property numberOfPages was correctly loaded.

Stream the relationship property numberOfPages from the projected graph:

CALL gds.graph.relationshipProperty.stream('readWithProperties', 'numberOfPages')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfPages
RETURN
  gds.util.asNode(sourceNodeId).name AS person,
  gds.util.asNode(targetNodeId).name AS book,
  numberOfPages
ORDER BY person ASC, numberOfPages DESC

Table 9. Results
person	book	numberOfPages
"Adam"	"The Hobbit"	30.0
"Florentin"	"The Hobbit"	42.0
"Florentin"	"The Hobbit"	4.0
"Veselin"	"Frankenstein"	NaN

We can see that the numberOfPages property was loaded. The default property value is Double.NaN. As in the Node properties example, it can be changed by using the Cypher function coalesce().
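
A minimal sketch of such a default for a relationship property, assuming an illustrative graph name readWithDefault and a default of 0 for READ relationships without numberOfPages:

```cypher
// Same projection as above, but relationships without a numberOfPages
// property receive 0 instead of the NaN default.
CALL gds.graph.project.cypher(
  'readWithDefault',
  'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
  'MATCH (n)-[r:READ]->(m)
    RETURN id(n) AS source, id(m) AS target, type(r) AS type, coalesce(r.numberOfPages, 0) AS numberOfPages'
)
YIELD graphName AS graph, nodeCount AS nodes, relationshipCount AS rels
```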

Parallel relationships

The Property Graph Model in Neo4j supports parallel relationships, i.e., multiple relationships between two nodes. By default, GDS preserves the parallel relationships. For some algorithms, we want the projected graph to contain at most one relationship between two nodes.

The simplest way to achieve relationship deduplication is to use the DISTINCT operator in the relationship query. Alternatively, we can aggregate the parallel relationships by using the count() function and store the count as a relationship property.

Project Person and Book nodes and COUNT aggregated READ relationships:

CALL gds.graph.project.cypher(
  'readCount',
  'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
  'MATCH (n)-[r:READ]->(m)
    RETURN id(n) AS source, id(m) AS target, type(r) AS type, count(r) AS numberOfReads'
)
YIELD
  graphName AS graph, nodeCount AS nodes, relationshipCount AS rels

Table 10. Results
graph	nodes	rels
"readCount"	5	3

Next, we will verify that the READ relationships were correctly aggregated.

Stream the relationship property numberOfReads of the projected graph:

CALL gds.graph.relationshipProperty.stream('readCount', 'numberOfReads')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfReads
RETURN
  gds.util.asNode(sourceNodeId).name AS person,
  gds.util.asNode(targetNodeId).name AS book,
  numberOfReads
ORDER BY numberOfReads DESC, person

Table 11. Results
person	book	numberOfReads
"Florentin"	"The Hobbit"	2.0
"Adam"	"The Hobbit"	1.0
"Veselin"	"Frankenstein"	1.0

We can see that the two READ relationships between Florentin and The Hobbit result in a numberOfReads value of 2.
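
The DISTINCT alternative mentioned at the start of this section can be sketched as follows. The graph name readDistinct is an assumption for illustration. Because no relationship property is returned, the two parallel READ relationships between Florentin and The Hobbit produce identical rows and should collapse into one.

```cypher
// Deduplicate parallel relationships by returning DISTINCT
// (source, target, type) rows from the relationship query.
CALL gds.graph.project.cypher(
  'readDistinct',
  'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
  'MATCH (n)-[r:READ]->(m)
    RETURN DISTINCT id(n) AS source, id(m) AS target, type(r) AS type'
)
YIELD graphName AS graph, nodeCount AS nodes, relationshipCount AS rels
```

Note that DISTINCT only removes rows that are equal in every column, so returning a property that differs between parallel relationships would keep them apart.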

Parallel relationships with properties

For graphs with relationship properties we can also use other aggregations documented in the Cypher Manual.

Project Person and Book nodes and aggregated READ relationships by summing the numberOfPages:

CALL gds.graph.project.cypher(
  'readSums',
  'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
  'MATCH (n)-[r:READ]->(m)
    RETURN id(n) AS source, id(m) AS target, type(r) AS type, sum(r.numberOfPages) AS numberOfPages'
)
YIELD
  graphName AS graph, nodeCount AS nodes, relationshipCount AS rels

Table 12. Results
graph	nodes	rels
"readSums"	5	3

Next, we will verify that the relationship property numberOfPages was correctly aggregated.

Stream the relationship property numberOfPages of the projected graph:

CALL gds.graph.relationshipProperty.stream('readSums', 'numberOfPages')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfPages
RETURN
  gds.util.asNode(sourceNodeId).name AS person,
  gds.util.asNode(targetNodeId).name AS book,
  numberOfPages
ORDER BY numberOfPages DESC, person

Table 13. Results
person	book	numberOfPages
"Florentin"	"The Hobbit"	46.0
"Adam"	"The Hobbit"	30.0
"Veselin"	"Frankenstein"	0.0

We can see that the two READ relationships between Florentin and The Hobbit sum up to a numberOfPages value of 46.

Projecting filtered Neo4j graphs

Legacy Cypher projections allow us to specify the graph to project in a more fine-grained way. The following example demonstrates how to filter out READ relationships that do not have a numberOfPages property.

Project Person and Book nodes and READ relationships where numberOfPages is present:

CALL gds.graph.project.cypher(
  'existingNumberOfPages',
  'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
  'MATCH (n)-[r:READ]->(m)
    WHERE r.numberOfPages IS NOT NULL
    RETURN id(n) AS source, id(m) AS target, type(r) AS type, r.numberOfPages AS numberOfPages'
)
YIELD
  graphName AS graph, nodeCount AS nodes, relationshipCount AS rels

Table 14. Results
graph	nodes	rels
"existingNumberOfPages"	5	3

Next, we will verify that the relationship property numberOfPages was correctly loaded.

Stream the relationship property numberOfPages from the projected graph:

CALL gds.graph.relationshipProperty.stream('existingNumberOfPages', 'numberOfPages')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfPages
RETURN
  gds.util.asNode(sourceNodeId).name AS person,
  gds.util.asNode(targetNodeId).name AS book,
  numberOfPages
ORDER BY person ASC, numberOfPages DESC

Table 15. Results
person	book	numberOfPages
"Adam"	"The Hobbit"	30.0
"Florentin"	"The Hobbit"	42.0
"Florentin"	"The Hobbit"	4.0

If we compare the results to the ones from Relationship properties, we can see that using IS NOT NULL filters out the relationship from Veselin to the book Frankenstein. With native projections, this can only be expressed by projecting a subgraph.

Using query parameters

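As in regular Cypher, the node and relationship queries can use query parameters, which are supplied through the parameters configuration key. In the following example, we supply a lower bound for numberOfPages to limit the READ relationships we want to project.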

Project Person and Book nodes and READ relationships where numberOfPages is greater than 9:

CALL gds.graph.project.cypher(
  'existingNumberOfPages',
  'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
  'MATCH (n)-[r:READ]->(m)
    WHERE r.numberOfPages > $minNumberOfPages
    RETURN id(n) AS source, id(m) AS target, type(r) AS type, r.numberOfPages AS numberOfPages',
  { parameters: { minNumberOfPages: 9} }
)
YIELD
  graphName AS graph, nodeCount AS nodes, relationshipCount AS rels

Table 16. Results
graph	nodes	rels
"existingNumberOfPages"	5	2

Further usage of parameters

The parameters can also be used to directly pass in a list of nodes or a list of relationships. For example, pre-computing the list of nodes can be useful if the node filter is expensive.

Project Person nodes younger than 20 whose name does not begin with V, and KNOWS relationships:

CALL gds.graph.project.cypher(
  'personSubset',
  'MATCH (n)
    WHERE n.age < 20 AND NOT n.name STARTS WITH "V"
    RETURN id(n) AS id, labels(n) AS labels',
  'MATCH (n)-[r:KNOWS]->(m)
    WHERE (n.age < 20 AND NOT n.name STARTS WITH "V") AND
          (m.age < 20 AND NOT m.name STARTS WITH "V")
    RETURN id(n) AS source, id(m) AS target, type(r) AS type, r.numberOfPages AS numberOfPages'
)
YIELD
  graphName, nodeCount AS nodes, relationshipCount AS rels

Table 17. Results
graphName	nodes	rels
"personSubset"	2	1

By passing the relevant Persons as a parameter, the above query can be transformed into the following:

Project Person nodes younger than 20 whose name does not begin with V, and KNOWS relationships, by using parameters:

// Evaluate the node filter once and collect the matching nodes.
MATCH (n)
WHERE n.age < 20 AND NOT n.name STARTS WITH "V"
WITH collect(n) AS selectedPersons
// Pass the collected nodes to both queries as the $nodes parameter.
CALL gds.graph.project.cypher(
  'personSubsetViaParameters',
  'UNWIND $nodes AS n RETURN id(n) AS id, labels(n) AS labels',
  'MATCH (n)-[r:KNOWS]->(m)
    WHERE (n IN $nodes) AND (m IN $nodes)
    RETURN id(n) AS source, id(m) AS target, type(r) AS type, r.numberOfPages AS numberOfPages',
  { parameters: { nodes: selectedPersons } }
)
YIELD
  graphName, nodeCount AS nodes, relationshipCount AS rels
RETURN graphName, nodes, rels

Table 18. Results
graphName	nodes	rels
"personSubsetViaParameters"	2	1