Basic queries

This page contains information about how to create, query, and delete a graph database using Cypher®. For more advanced queries, see the section on Subqueries.

The examples below use the publicly available Neo4j movie database.

Creating a data model

Before creating a property graph database, it is important to develop an appropriate data model. A data model gives the data structure and allows users of the graph to retrieve the information they are looking for efficiently.

The Neo4j movie database uses the following data model:

[Figure: data model (schema) of the movie database]

It includes two types of node labels:

  • Person nodes, which have the following properties: name and born.

  • Movie nodes, which have the following properties: title, released, and tagline.

The data model also contains five different relationship types between the Person and Movie nodes: ACTED_IN, DIRECTED, PRODUCED, WROTE, and REVIEWED. Two of the relationship types have properties:

  • The ACTED_IN relationship type, which has the roles property.

  • The REVIEWED relationship type, which has a summary property and a rating property.

To learn more about data modelling for graph databases, enroll in the free Graph Data Modelling Fundamentals course offered by GraphAcademy.

Creating a property graph database

The complete Cypher query to create the Neo4j movie database can be found here. To create the full graph, run the full query against an empty Neo4j database.
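The full script is not reproduced here. As a minimal sketch of its shape, assuming an empty database, the statements below create one Movie node, one Person node, and an ACTED_IN relationship with a roles property, following the data model above. The variable names are illustrative, and the statements are not an excerpt of the actual script.

```cypher
// Create one node for each label in the data model
CREATE (matrix:Movie {title: 'The Matrix', released: 1999, tagline: 'Welcome to the Real World'})
CREATE (keanu:Person {name: 'Keanu Reeves', born: 1964})
// Connect them; ACTED_IN carries the roles property (a list of strings)
CREATE (keanu)-[:ACTED_IN {roles: ['Neo']}]->(matrix)
```

Because CREATE does not check for existing data, running these statements against a database that already holds the movie graph adds duplicate nodes; MERGE is the clause to use when duplicates must be avoided.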

Finding nodes

The MATCH clause is used to find a specific pattern in the graph, such as a specific node. The RETURN clause specifies which parts of the matched pattern to return.

For example, this query finds the nodes with the Person label whose name property is Keanu Reeves, and returns the name and born properties of those nodes:

Query
MATCH (keanu:Person {name:'Keanu Reeves'})
RETURN keanu.name AS name, keanu.born AS born
Table 1. Result
name            born
"Keanu Reeves"  1964
Rows: 1
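The inline property map {name:'Keanu Reeves'} is shorthand for an equality predicate. As a sketch, the same lookup can be written with an explicit WHERE clause, which is the form to use when the condition is not a simple equality:

```cypher
// Equivalent to MATCH (keanu:Person {name:'Keanu Reeves'})
MATCH (keanu:Person)
WHERE keanu.name = 'Keanu Reeves'
RETURN keanu.name AS name, keanu.born AS born
```

Both forms express the same condition, so they should return the same row.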

It is also possible to query a graph for several nodes. This query matches all nodes with the Person label and limits the result to five rows.

Query
MATCH (people:Person)
RETURN people
LIMIT 5
Table 2. Result
people
{"born":1964,"name":"Keanu Reeves"}
{"born":1967,"name":"Carrie-Anne Moss"}
{"born":1961,"name":"Laurence Fishburne"}
{"born":1960,"name":"Hugo Weaving"}
{"born":1967,"name":"Lilly Wachowski"}
Rows: 5
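Without ORDER BY, Cypher does not guarantee the order of the rows, so it does not guarantee which five rows LIMIT keeps either. A sketch that makes the selection predictable by sorting first, and that returns individual properties rather than whole nodes:

```cypher
MATCH (people:Person)
RETURN people.name AS name, people.born AS born
// Sort before limiting so the kept rows do not depend on storage order
ORDER BY name
LIMIT 5
```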

Note on clause composition

Similar to SQL, Cypher queries are built from clauses that are chained together, each passing its intermediate results on to the next. Each clause takes as input the state of the graph and a table of intermediate results consisting of the referenced variables. The first clause takes as input the state of the graph before the query and an empty table of intermediate results. The output of a clause is a new state of the graph and a new table of intermediate results, which serves as input to the next clause. The output of the last clause is the result of the query.
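To make the intermediate table visible, the sketch below uses WITH, which projects a new table of variables for the next clause. The threshold of five movies is arbitrary and only for illustration.

```cypher
// Clause 1: one row per (actor, movie) pair
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
// Clause 2: aggregate to one row per actor; only p and movieCount are passed on
WITH p, count(m) AS movieCount
// Filter the intermediate table produced by WITH
WHERE movieCount > 5
// Clause 3: project the final result
RETURN p.name AS actor, movieCount
ORDER BY movieCount DESC
```

Any variable not listed in WITH, such as m here, is no longer available to the clauses that follow it.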

Note that if a clause produces an empty table of intermediate results, there is nothing to pass on to subsequent clauses, which effectively ends the query. (There are ways to circumvent this behaviour, for example by replacing a MATCH clause with OPTIONAL MATCH.)
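A minimal sketch of the OPTIONAL MATCH workaround: the first clause finds Keanu Reeves, and the second looks for movies he directed. If no DIRECTED relationship exists, a plain MATCH in the second clause would produce an empty table and the query would return nothing; OPTIONAL MATCH instead keeps the row and sets m to null.

```cypher
MATCH (p:Person {name: 'Keanu Reeves'})
// Keeps p's row even when no DIRECTED relationship is found; m is then null
OPTIONAL MATCH (p)-[:DIRECTED]->(m:Movie)
RETURN p.name AS name, m.title AS directed
```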

In the example below, the MATCH clause finds all nodes with the Person label. The WHERE clause then filters those nodes down to the Person nodes born in the 1980s. Finally, the RETURN clause returns the result in descending chronological order.

Query
MATCH (bornInEighties:Person)
WHERE bornInEighties.born >= 1980 AND bornInEighties.born < 1990
RETURN bornInEighties.name AS name, bornInEighties.born AS born
ORDER BY born DESC
Table 3. Result
name               born
"Emile Hirsch"     1985
"Rain"             1982
"Natalie Portman"  1981
"Christina Ricci"  1980
Rows: 4

For more details, see the section on Clause composition.

Finding connected nodes

To discover how nodes are connected to one another, relationships must be added to queries. Queries can specify relationship types, properties, and direction, as well as the start and end nodes of the pattern.

For example, the following query finds the directors of the movie The Matrix and returns their name property.

Query
MATCH (m:Movie {title: 'The Matrix'})<-[d:DIRECTED]-(p:Person)
RETURN p.name AS director
Table 4. Result
director
"Lilly Wachowski"
"Lana Wachowski"
Rows: 2
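Relationship direction is part of the pattern, not of the order in which the nodes are written. As a sketch, the same directors can be found by writing the pattern from the Person side; the arrow still points from the director to the movie.

```cypher
// Same pattern as above, written left to right
MATCH (p:Person)-[:DIRECTED]->(m:Movie {title: 'The Matrix'})
RETURN p.name AS director
```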

It is also possible to find out which types of relationships connect nodes to one another. The query below searches the graph for outgoing relationships from the Tom Hanks node to any Movie nodes, and returns the type of each relationship and the title of the movie it connects to.

Query
MATCH (tom:Person {name:'Tom Hanks'})-[r]->(m:Movie)
RETURN type(r) AS type, m.title AS movie

The result shows that he has 13 outgoing relationships connected to 12 different Movie nodes (12 have the ACTED_IN type and one has the DIRECTED type).

[Figure: graph visualization of the query result]
Table 5. Result
type        movie
"ACTED_IN"  "Apollo 13"
"ACTED_IN"  "You’ve Got Mail"
"ACTED_IN"  "A League of Their Own"
"ACTED_IN"  "That Thing You Do"
"ACTED_IN"  "The Da Vinci Code"
"ACTED_IN"  "Cloud Atlas"
"ACTED_IN"  "Joe versus the Volcano"
"ACTED_IN"  "Cast Away"
"ACTED_IN"  "The Green Mile"
"ACTED_IN"  "Sleepless in Seattle"
"ACTED_IN"  "The Polar Express"
"ACTED_IN"  "Charlie Wilson’s War"
"DIRECTED"  "That Thing You Do"
Rows: 13
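The per-type counts quoted above can also be computed by Cypher itself. A sketch that aggregates the same pattern, where the non-aggregated column type(r) acts as the grouping key:

```cypher
MATCH (:Person {name: 'Tom Hanks'})-[r]->(m:Movie)
// type is the grouping key; the counts are computed per relationship type
RETURN type(r) AS type, count(*) AS relationships, count(DISTINCT m) AS movies
ORDER BY relationships DESC
```

Against the data in Table 5, this should give one row for ACTED_IN (12 relationships) and one for DIRECTED (1 relationship).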

Queries can be refined further by adding label expressions to their patterns. For example, the query below uses the NOT label expression (!) to return all outgoing relationships from Tom Hanks to Movie nodes that are not of type ACTED_IN.

Query
MATCH (:Person {name:'Tom Hanks'})-[r:!ACTED_IN]->(m:Movie)
RETURN type(r) AS type, m.title AS movie
Table 6. Result
type        movie
"DIRECTED"  "That Thing You Do"
Rows: 1

For more information about the different label expressions supported by Cypher, see the section on label expressions.
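Label expressions can also combine types. As a sketch, the | (OR) operator below matches outgoing relationships of type DIRECTED or PRODUCED; the choice of types is only illustrative.

```cypher
// Matches relationships whose type is DIRECTED or PRODUCED
MATCH (:Person {name: 'Tom Hanks'})-[r:DIRECTED|PRODUCED]->(m:Movie)
RETURN type(r) AS type, m.title AS movie
```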

Finding paths

There are several ways in which Cypher can be used to search a graph for paths between nodes.

To search for patterns of a fixed length, specify the distance (number of hops) between the nodes in the pattern with a quantifier ({n}). For example, the following query matches all Person nodes exactly 2 hops away from Tom Hanks and returns the first five rows, ordered by year of birth. The DISTINCT keyword ensures that the result contains no duplicate rows.

Query
MATCH (tom:Person {name:'Tom Hanks'})--{2}(colleagues:Person)
RETURN DISTINCT colleagues.name AS name, colleagues.born AS bornIn
ORDER BY bornIn
LIMIT 5
Table 7. Result
name              bornIn
"Mike Nichols"    1931
"Ian McKellen"    1939
"James Cromwell"  1940
"Nora Ephron"     1941
"Penny Marshall"  1943
Rows: 5
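The quantifier {2} repeats the relationship pattern -- twice. As a sketch, the query above can be spelled out with an unnamed intermediate node, which in this data model is always a Movie, since every relationship type connects a Person to a Movie:

```cypher
// Same as --{2}: two relationships in any direction, via an unnamed middle node
MATCH (tom:Person {name:'Tom Hanks'})--()--(colleagues:Person)
RETURN DISTINCT colleagues.name AS name, colleagues.born AS bornIn
ORDER BY bornIn
LIMIT 5
```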

It is also possible to match patterns of variable length. The query below matches all Person nodes between 1 and 4 hops away from Tom Hanks and returns the first five rows, ordered by year of birth and then by name.

Query
MATCH (p:Person {name:'Tom Hanks'})--{1,4}(colleagues:Person)
RETURN DISTINCT colleagues.name AS name, colleagues.born AS bornIn
ORDER BY bornIn, name
LIMIT 5
Table 8. Result
name              bornIn
"Max von Sydow"   1929
"Clint Eastwood"  1930
"Gene Hackman"    1930
"Richard Harris"  1930
"Mike Nichols"    1931
Rows: 5

The quantifier used in the above two examples was introduced with the release of quantified path patterns in Neo4j 5.9. Before that, the only way in Cypher to match paths of a variable length was with a variable-length relationship. This syntax is still available in Cypher, but it is not GQL conformant. For more information, see Patterns → Syntax and semantics → Variable-length relationships.
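For comparison, a sketch of the 1 to 4 hop query written with the older variable-length relationship syntax, which is the form available before Neo4j 5.9. As noted above, it is not GQL conformant.

```cypher
// Variable-length relationship: between 1 and 4 relationships, any direction
MATCH (p:Person {name:'Tom Hanks'})-[*1..4]-(colleagues:Person)
RETURN DISTINCT colleagues.name AS name, colleagues.born AS bornIn
ORDER BY bornIn, name
LIMIT 5
```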

The SHORTEST keyword finds shortest paths between nodes and comes in several variants. In this example, ALL SHORTEST finds every shortest path between the Keanu Reeves and Tom Cruise nodes, and the --+ quantifier matches one or more relationships in either direction. The count() function calculates the number of these shortest paths, while the length() function calculates the length of each path in terms of traversed relationships.

Query
MATCH p = ALL SHORTEST (:Person {name:"Keanu Reeves"})--+(:Person {name:"Tom Cruise"})
RETURN count(p) AS pathCount, length(p) AS pathLength

The results show that 2 different paths are tied for the shortest length.

Table 9. Result
pathCount  pathLength
2          4
Rows: 1

The SHORTEST keyword was introduced in Neo4j 5.21, and functionally replaces and extends the shortestPath() and allShortestPaths() functions. Both functions can still be used, but they are not GQL conformant. For more information, see Patterns → Syntax and semantics → The shortestPath() and allShortestPaths() functions.
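A sketch of the same shortest-path query written with allShortestPaths(), for comparison with the ALL SHORTEST form above, for example on versions earlier than 5.21:

```cypher
// Pre-5.21 equivalent: [*] is an unbounded variable-length relationship in either direction
MATCH p = allShortestPaths((:Person {name:"Keanu Reeves"})-[*]-(:Person {name:"Tom Cruise"}))
RETURN count(p) AS pathCount, length(p) AS pathLength
```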

For more information about graph pattern matching, see Patterns.

Finding recommendations

Cypher also supports more complex queries. The following query recommends actors whom Keanu Reeves has not yet worked with, but who have worked with his co-actors. It then orders the results by how often each of these co-co-actors has collaborated with one of Keanu Reeves' co-actors.

Query
MATCH (keanu:Person {name:'Keanu Reeves'})-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(coActors:Person),
  (coActors:Person)-[:ACTED_IN]->(m2:Movie)<-[:ACTED_IN]-(cocoActors:Person)
WHERE NOT (keanu)-[:ACTED_IN]->()<-[:ACTED_IN]-(cocoActors) AND keanu <> cocoActors
RETURN cocoActors.name AS recommended, count(cocoActors) AS strength
ORDER BY strength DESC
LIMIT 7
Table 10. Result
recommended        strength
"Tom Hanks"        4
"John Hurt"        3
"Jim Broadbent"    3
"Halle Berry"      3
"Stephen Rea"      3
"Natalie Portman"  3
"Ben Miles"        3
Rows: 7
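To see why a person is recommended, the query can also collect the co-actors who connect them to Keanu Reeves. A sketch extending the query above; the via column name is illustrative.

```cypher
MATCH (keanu:Person {name:'Keanu Reeves'})-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(coActors:Person),
  (coActors)-[:ACTED_IN]->(m2:Movie)<-[:ACTED_IN]-(cocoActors:Person)
WHERE NOT (keanu)-[:ACTED_IN]->()<-[:ACTED_IN]-(cocoActors) AND keanu <> cocoActors
// count(*) counts matched rows, as count(cocoActors) does in the original query
RETURN cocoActors.name AS recommended, count(*) AS strength,
  collect(DISTINCT coActors.name) AS via
ORDER BY strength DESC
LIMIT 7
```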

There are several connections between the Keanu Reeves and Tom Hanks nodes in the movie database, but the two have never worked together in a film. The following query finds co-actors who could introduce the two, by looking for people who have acted with each of them in separate movies:

Query
MATCH (:Person {name: 'Keanu Reeves'})-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coActor:Person),
  (coActor)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(:Person {name:'Tom Hanks'})
RETURN DISTINCT coActor.name AS coActor
Table 11. Result
coActor
"Charlize Theron"
"Hugo Weaving"
Rows: 2

Deleting a graph

To delete all nodes and relationships in a graph, run the following query:

Query
MATCH (n)
DETACH DELETE n

DETACH DELETE is not suitable for deleting large amounts of data, nor does it delete indexes and constraints. For more information, and alternatives to DETACH DELETE, see DELETE → Delete all nodes and relationships.
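A commonly used alternative for large graphs is to delete in batches. A minimal sketch, assuming Neo4j 5 and an arbitrary batch size of 10000 rows; CALL { ... } IN TRANSACTIONS must run in an implicit (auto-commit) transaction, which in Neo4j Browser means prefixing the query with :auto.

```cypher
// Delete nodes, and their relationships, in separate transactions of 10000 rows each
MATCH (n)
CALL {
  WITH n
  DETACH DELETE n
} IN TRANSACTIONS OF 10000 ROWS
```

Newer Cypher versions prefer the variable scope clause form CALL (n) { ... } over the importing WITH. Indexes and constraints still have to be dropped separately.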