Patterns in practice
Creating and returning data
Let’s start by looking into the clauses that allow you to create data.
To add data, you just use the patterns you already know. By providing patterns, you can specify what graph structures, labels, and properties you would like to make part of your graph.
Obviously the simplest clause is called CREATE
.
It creates the patterns that you specify.
For the patterns you have looked at so far this could look like the following:
CREATE (:Movie {title: 'The Matrix', released: 1997})
If you run this statement, Cypher® returns the number of changes: in this case adding one node, one label, and two properties.
Created Nodes: 1 Added Labels: 1 Set Properties: 2 Rows: 0
As you started out with an empty database, you now have a database with a single node in it:
If you also want to return the created data, you can add a RETURN
clause, which refers to the variable you have assigned to your pattern elements.
The RETURN
keyword in Cypher specifies what values or results you might want to return from a Cypher query.
You can tell Cypher to return nodes, relationships, node and relationship properties, or patterns in your query results.
RETURN
is not required when doing write procedures, but is needed for reads.
The node and relationship variables, which are discussed earlier, become important when using RETURN
.
CREATE (p:Person {name: 'Keanu Reeves', born: 1964})
RETURN p
This is what gets returned:
Created Nodes: 1 Added Labels: 1 Set Properties: 2 Rows: 1 +----------------------------------------------+ | p | +----------------------------------------------+ | (:Person {name: 'Keanu Reeves', born: 1964}) | +----------------------------------------------+
If you want to create more than one element, you can separate the elements with commas or use multiple CREATE
statements.
You can, of course, also create more complex structures, like an ACTED_IN
relationship with information about the character, or DIRECTED
ones for the director.
CREATE (a:Person {name: 'Tom Hanks', born: 1956})-[r:ACTED_IN {roles: ['Forrest']}]->(m:Movie {title: 'Forrest Gump', released: 1994})
CREATE (d:Person {name: 'Robert Zemeckis', born: 1951})-[:DIRECTED]->(m)
RETURN a, d, r, m
This is the part of the updated graph:
In most cases, you want to add new data to existing structures. This requires knowing how to find existing patterns in your graph data, which is covered in the next section.
Matching patterns
Matching patterns is a task for the MATCH
statement.
You pass the same kind of patterns you have used so far to MATCH
to describe what you are looking for.
It is similar to query by example, only that your examples also include the structures.
To bring back nodes, relationships, properties, or patterns, you need to have variables specified in your MATCH
clause for the data you want to return.
A |
To find the data you have created so far, you can start looking for all nodes labeled with the Movie
label.
MATCH (m:Movie)
RETURN m
Here’s the result:
This should show both The Matrix and Forrest Gump.
You can also look for a specific person, like Keanu Reeves.
MATCH (p:Person {name: 'Keanu Reeves'})
RETURN p
This query returns the matching node:
Note that you only provide enough information to find the nodes, not all properties are required. In most cases, you have key-properties like SSN, ISBN, emails, logins, geolocation, or product codes to look for.
You can also find more interesting connections, like, for instance, the movies' titles that Tom Hanks acted in and roles he played.
MATCH (p:Person {name: 'Tom Hanks'})-[r:ACTED_IN]->(m:Movie)
RETURN m.title, r.roles
Rows: 1 +------------------------------+ | m.title | r.roles | +------------------------------+ | 'Forrest Gump' | ['Forrest'] | +------------------------------+
In this case, you only returned the properties of the nodes and relationships that you are interested in.
You can access them everywhere via a dot notation identifer.property
.
Of course, this only lists T. Hank’s role as Forrest in Forrest Gump because that’s all data that you have added.
Now you know enough to add new nodes to existing ones and can combine MATCH
and CREATE
to attach structures to the graph.
Cypher examples
Let us look at some examples of using MATCH
and RETURN
keywords.
Each subsequent example is more complicated than the previous one. The two last examples start with explanations of what we are trying to achieve. If you click the Run Query button below each Cypher code snippet, you can see the results in the format of a graph or table.
-
Example 1: Find the labeled
Person
nodes in the graph. Note that you must use a variable likep
for thePerson
node if you want to retrieve the node in theRETURN
clause.
MATCH (p:Person)
RETURN p
LIMIT 1
-
Example 2: Find
Person
nodes in the graph that have a name of 'Tom Hanks'. Remember that you can name your variable anything you want, as long as you reference that same name later.
MATCH (tom:Person {name: 'Tom Hanks'})
RETURN tom
-
Example 3: Find which
Movie
Tom Hanks has directed.
Explanation: at first you should find Tom Hanks' Person
node and after that the Movie
nodes he is connected to.
To do that, you have to follow the DIRECTED
relationship from Tom Hanks' Person
node to the Movie
node.
You have also specified a label of Movie
so that the query only looks at nodes with that label.
Since you only care about returning the movie in this query, you need to give that node a variable (movie
) but do not need to give variables for the Person
node or DIRECTED
relationship.
MATCH (:Person {name: 'Tom Hanks'})-[:DIRECTED]->(movie:Movie)
RETURN movie
-
Example 4: Find which
Movie
Tom Hanks has directed, but this time, return only the title of the movie.
Explanation: this query is similar to the previous one.
Example 3 returned the entire Movie
node with all its properties.
For this example, you still need to find Tom’s movies, but now you only care about their titles.
You should access the node’s title
property using the syntax variable.property
to return the name value.
MATCH (:Person {name: 'Tom Hanks'})-[:DIRECTED]->(movie:Movie)
RETURN movie.title
Aliasing return values
Not all properties are simple like movie.title
in the example above.
Some properties have poor names due to property length, multi-word descriptions, developer jargon, and other shortcuts.
These naming conventions can be difficult to read, especially if they end up on reports and other user-facing interfaces.
//poorly-named property
MATCH (tom:Person {name:'Tom Hanks'})-[rel:DIRECTED]-(movie:Movie)
RETURN tom.name, tom.born, movie.title, movie.released
Just like with SQL, you can rename return results by using the AS
keyword and aliasing the property with a cleaner name.
//cleaner printed results with aliasing
MATCH (tom:Person {name:'Tom Hanks'})-[rel:DIRECTED]-(movie:Movie)
RETURN tom.name AS name, tom.born AS `Year Born`, movie.title AS title, movie.released AS `Year Released`
You can specify return aliases that have spaces by using the backtick character before and after the alias (movie.released AS |
Attaching structures
To extend the graph with new information, you first match the existing connection points and then attach the newly created nodes to them with relationships. Adding Cloud Atlas as a new movie for Tom Hanks could be achieved like this:
MATCH (p:Person {name: 'Tom Hanks'})
CREATE (m:Movie {title: 'Cloud Atlas', released: 2012})
CREATE (p)-[r:ACTED_IN {roles: ['Zachry']}]->(m)
RETURN p, r, m
Here’s what the structure looks like in the database:
It is important to remember that you can assign variables to both nodes and relationships and use them later on, no matter if they were created or matched. |
It is possible to attach both node and relationship in a single CREATE
clause.
For readability, it helps to split them up though.
A tricky aspect of the combination of |
Completing patterns
Whenever you get data from external systems or are not sure if certain information already exists in the graph, you want to be able to express a repeatable (idempotent) update operation.
In Cypher MERGE
clause has this function.
It acts like a combination of MATCH
or CREATE
, which checks for the existence of data before creating it.
With MERGE
, you define a pattern to be found or created.
Usually, as with MATCH
, you only want to include the key property to look for in your core pattern.
MERGE
allows you to provide additional properties you want to set ON CREATE
.
If you do not know whether your graph already contained Cloud Atlas, you could merge it again.
MERGE (m:Movie {title: 'Cloud Atlas'})
ON CREATE SET m.released = 2012
RETURN m
Created Nodes: 1 Added Labels: 1 Set Properties: 2 Rows: 1 +-------------------------------------------------+ | m | +-------------------------------------------------+ | (:Movie {title: 'Cloud Atlas', released: 2012}) | +-------------------------------------------------+
You get a result in both cases: either the data (potentially more than one row) that was already in the graph or a single, newly created Movie
node.
A |
So foremost MERGE
makes sure that you can’t create duplicate information or structures, but it comes with the cost of needing to check for existing matches first.
Especially on large graphs, it can be costly to scan a large set of labeled nodes for a specific property.
You can alleviate some of that by creating supporting indexes or constraints, which are discussed in the upcoming sections.
But it’s still not for free, so whenever you’re sure to not create duplicate data use CREATE
over MERGE
.
|
MATCH (m:Movie {title: 'Cloud Atlas'})
MATCH (p:Person {name: 'Tom Hanks'})
MERGE (p)-[r:ACTED_IN]->(m)
ON CREATE SET r.roles =['Zachry']
RETURN p, r, m
If the direction of a relationship is arbitrary, you can leave off the arrowhead.
MERGE
checks for the relationship in either direction and creates a new directed relationship if there is no matching relationship.
If you choose to pass in only one node from a preceding clause, MERGE
offers an interesting functionality.
It only matches within the direct neighborhood of the provided node for the given pattern, and if the pattern is not found creates it.
This can come in very handy for creating, for example, tree structures.
CREATE (y:Year {year: 2014})
MERGE (y)<-[:IN_YEAR]-(m10:Month {month: 10})
MERGE (y)<-[:IN_YEAR]-(m11:Month {month: 11})
RETURN y, m10, m11
This is the graph structure that gets created:
Here is no global search for the two Month
nodes; they are only searched for in the context of the 2014 Year
node.
Code challenge
Now knowing the basics, use the parts below to build a Cypher statement to find the title
and year of release
for every :Movie
that Tom Hanks has :DIRECTED
.
Click the parts to add them in order and once you are done, click Run Query to see whether you have got it right.
You can click any part of the query inside the code block to remove it.
MATCH (p:Person {name: "Tom Hanks"})-[:DIRECTED]->(m:Movie) RETURN m.title, m.released