LOAD CSV
LOAD CSV
is used to import data from CSV files into a Neo4j database.
LOAD CSV FROM 'https://data.neo4j.com/bands/artists.csv' (1)
AS row (2)
MERGE (:Artist {name: row[1], year: toInteger(row[2])}) (3)
1 | FROM takes a STRING containing the path where the CSV file is located. |
2 | The clause parses one row at a time, temporarily storing the current row in the variable specified with AS . |
3 | The MERGE clause accesses the row variable to insert data into the database. |
LOAD CSV
supports both local and remote URLs.
Local paths are resolved relative to the Neo4j installation folder.
Loading CSV files requires load privileges. |
Import CSV data into Neo4j
Import local files
You can store CSV files on the database server and then access them by using a file:///
URL.
By default, paths are resolved relative to the Neo4j import directory.
1,ABBA,1992
2,Roxette,1986
3,Europe,1979
4,The Cardigans,1992
LOAD CSV FROM 'file:///artists.csv' AS row
MERGE (a:Artist {name: row[1], year: toInteger(row[2])})
RETURN a.name, a.year
a.name | a.year |
---|---|
|
|
|
|
|
|
|
|
4 rows Added 4 nodes, Set 8 properties, Added 4 labels |
For ways of importing data into an Aura instance, see Aura → Importing data. |
When using file:/// URLs, spaces and other non-alphanumeric characters must be URL-encoded.
|
Configuration settings for file URLs
- dbms.security.allow_csv_import_from_file_urls
-
This setting determines whether
file:///
URLs are allowed. - server.directories.import
-
This setting sets the root directory relative to which
file:///
URLs are parsed.
Import from a remote location
You can import data from a CSV file hosted on a remote path.
LOAD CSV
supports accessing CSV files via HTTPS, HTTP, and FTP (with or without credentials).
It also follows redirects, except those changing the protocol (for security reasons).
It is strongly recommended to permit resource loading only over secure protocols such as HTTPS instead of insecure protocols like HTTP.
This can be done by limiting the load privileges to only trusted sources that use secure protocols.
If allowing an insecure protocol is absolutely unavoidable, Neo4j takes measures internally to enhance the security of these requests within their limitations.
However, this means that insecure URLs on virtual hosts will not function unless you add the JVM argument |
1,ABBA,1992
2,Roxette,1986
3,Europe,1979
4,The Cardigans,1992
LOAD CSV FROM 'https://data.neo4j.com/bands/artists.csv' AS row
MERGE (a:Artist {name: row[1], year: toInteger(row[2])})
RETURN a.name, a.year
a.name | a.year |
---|---|
|
|
|
|
|
|
|
|
4 rows Added 4 nodes, Set 8 properties, Added 4 labels |
1,ABBA,1992
2,Roxette,1986
3,Europe,1979
4,The Cardigans,1992
LOAD CSV FROM 'ftp://<username>:<password>@<domain>/bands/artists.csv' AS row
MERGE (a:Artist {name: row[1], year: toInteger(row[2])})
RETURN a.name, a.year
a.name | a.year |
---|---|
|
|
|
|
|
|
|
|
4 rows Added 4 nodes, Set 8 properties, Added 4 labels |
Import from cloud URIs
You can import data from a number of different cloud storages:
See Operations Manual → Load a dump from a cloud storage on how to set up access to cloud storages.
Import from an Azure Cloud Storage URI
You can import data from a CSV file hosted in an Azure Cloud Storage URI.
1,ABBA,1992
2,Roxette,1986
3,Europe,1979
4,The Cardigans,1992
LOAD CSV FROM 'azb://azb-account/azb-container/artists.csv' AS row
MERGE (a:Artist {name: row[1], year: toInteger(row[2])})
RETURN a.name, a.year
a.name | a.year |
---|---|
|
|
|
|
|
|
|
|
4 rows Added 4 nodes, Set 8 properties, Added 4 labels |
Import from a Google Cloud Storage URI
You can import data from a CSV file hosted in a Google Cloud Storage URI.
1,ABBA,1992
2,Roxette,1986
3,Europe,1979
4,The Cardigans,1992
LOAD CSV FROM 'gs://gs-bucket/artists.csv' AS row
MERGE (a:Artist {name: row[1], year: toInteger(row[2])})
RETURN a.name, a.year
a.name | a.year |
---|---|
|
|
|
|
|
|
|
|
4 rows Added 4 nodes, Set 8 properties, Added 4 labels |
Import from an AWS S3 URI
You can import data from a CSV file hosted in an AWS S3 URI.
1,ABBA,1992
2,Roxette,1986
3,Europe,1979
4,The Cardigans,1992
LOAD CSV FROM 's3://aws-bucket/artists.csv' AS row
MERGE (a:Artist {name: row[1], year: toInteger(row[2])})
RETURN a.name, a.year
a.name | a.year |
---|---|
|
|
|
|
|
|
|
|
4 rows Added 4 nodes, Set 8 properties, Added 4 labels |
Import CSV files using dynamic columns
CSV columns can be referenced dynamically to map labels to nodes in the graph. This enables flexible data handling, allowing labels to be be populated from CSV column values without manually specifying each entry. It also mitigates the risk of Cypher® injection. (For more information about Cypher injection, see Neo4j Knowledge Base → Protecting against Cypher injection).
Id,Label,Name
1,Band,The Beatles
2,Band,The Rolling Stones
3,Band,Pink Floyd
4,Band,Led Zeppelin
LOAD CSV WITH HEADERS FROM 'file:///bands-with-headers.csv' AS line
MERGE (n:$(line.Label) {name: line.Name})
RETURN n AS bandNodes
bandNodes |
---|
|
|
|
|
Rows: 4 |
MERGE queries using dynamic values may not be as performant as those using static values.
This is because the Cypher planner uses statically available information when planning queries to determine whether to use an index or not, and this is not possible when using dynamic values.
For more information, see MERGE using dynamic node labels and relationship types → Performance caveats.
|
Import compressed CSV files
LOAD CSV
can read local CSV files compressed with ZIP or gzip.
ZIP archives can have arbitrary directory structures but may only contain a single CSV file.
LOAD CSV FROM 'file:///artists.zip' AS row
MERGE (:Artist {name: row[1], year: toInteger(row[2])})
You can’t load zipped CSV files from remote URLs. |
Import data from relational databases
If the source data comes from a relational model, it’s worth evaluating how to gain the most from moving to a graph data model. Before running the import, think about how the data can be modeled as a graph, and adapt its structure accordingly when running the import (see Graph data modeling).
Data from relational databases may consist of one or multiple CSV files, depending on the source database structure.
A performant approach is to run multiple passes of LOAD CSV
to import nodes separately from relationships.
The source file books.csv contains information about both authors and books. From a graph perspective, these are nodes with different labels, so it takes different queries to load them.
The example executes multiple passes of LOAD CSV
on that one file, and each pass focuses on the creation of one entity type.
id,title,author,publication_year,genre,rating,still_in_print,last_purchased
19515,The Heights,Anne Conrad,2012,Comedy,5,true,2023/4/12 8:17:00
39913,Starship Ghost,Michael Tyler,1985,Science Fiction|Horror,4.2,false,2022/01/16 17:15:56
60980,The Death Proxy,Tim Brown,2002,Horror,2.1,true,2023/11/26 8:34:26
18793,Chocolate Timeline,Mary R. Robb,1924,Romance,3.5,false,2022/9/17 14:23:45
67162,Stories of Three,Eleanor Link,2022,Romance|Comedy,2,true,2023/03/12 16:01:23
25987,Route Down Below,Tim Brown,2006,Horror,4.1,true,2023/09/24 15:34:18
// Create `Book` nodes
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/books.csv' AS row
MERGE (b:Book {id: row.id, title: row.title})
MERGE (a:Author {name: row.author});
// Create `WROTE` relationships
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/books.csv' AS row
MATCH (a:Author{name: row.author})
MATCH (b:Book{id: row.id})
MERGE (a)-[:WROTE]->(b);
Added 11 nodes, Set 17 properties, Added 11 labels
Created 6 relationships
The file acted_in.csv contains data about the relationship between actors and the movies they acted in (from persons.csv and movies.csv).
Actors and movies are linked through their ID columns person_tmdbId
and movieId
.
The file also contains the role the actor played in the movie, and it is imported in Neo4j as a relationship property.
movieId,person_tmdbId,role
1,12899,Slinky Dog (voice)
1,12898,Buzz Lightyear (voice)
...
It takes three LOAD CSV
clauses to import this dataset: the first two create Person
nodes from persons.csv and Movie
nodes from movies.csv, and the third adds the :ACTED_IN
relationship from acted_in.csv.
// Create person nodes
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/persons.csv' AS row
MERGE (p:Person {name: row.name, tmdbId: row.person_tmdbId});
// Create movie nodes
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/movies.csv' AS row
MERGE (m:Movie {movieId: row.movieId, title: row.title});
// Create relationships
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/acted_in.csv' AS row
MATCH (p:Person {tmdbId: row.person_tmdbId})
MATCH (m:Movie {movieId: row.movieId})
MERGE (p)-[r:ACTED_IN {role: row.role}]->(m);
Added 444 nodes, Set 888 properties, Added 444 labels
Added 93 nodes, Set 186 properties, Added 93 labels
Created 372 relationships, Set 372 properties
For a guide on importing the Northwind dataset from Postgres into Neo4j, see Tutorial: Import data from a relational database into Neo4j in the Getting Started Guide. |
Create additional node labels
In Neo4j a node can have multiple labels, while in a relational setting it’s not as straightforward to mix entities.
For example, a node in Neo4j can be labeled both Dog
and Actor
, while in a relational model dogs and actors are separate entities.
After a relational dataset has been imported, there may be further labels that can be added, depending on the use case. Additional labels can speed up pinpointing a node if you use them in your queries.
Actor
label on Person
nodesThe :ACTED_IN
relationship from acted_in.csv implicitly defines actors as a subset of people.
The following queries adds an additional Actor
label to all people who have an outgoing :ACTED_IN
relationship.
MATCH (p:Person)-[:ACTED_IN]->()
WITH DISTINCT p
SET p:Actor
Added 353 labels
Pre-process the data during import
Cast CSV columns to Neo4j data types
LOAD CSV
inserts all imported CSV data as STRING
properties.
However, Neo4j supports a range of data types, and storing data with appropriate types allows both to query it more effectively and to process it with type-specific Cypher functions.
The column person_tmdbId
and born
in the file persons.csv contains INTEGER
and DATE
values respectively.
The functions toInteger()
and date()
allow to cast those values to the appropriate types before importing them.
person_tmdbId,bio,born,bornIn,died,person_imdbId,name,person_poster,person_url
3,"Legendary Hollywood Icon Harrison Ford was born on July 13, 1942 in Chicago, Illinois. His family history includes a strong lineage of actors, radio personalities, and models. Harrison attended public high school in Park Ridge, Illinois where he was a member of the school Radio Station WMTH. Harrison worked as the lead voice for sports reporting at WMTH for several years. Acting wasn’t a major interest to Ford until his junior year at Ripon College when he first took an acting class...",1942-07-13,"Chicago, Illinois, USA",,148,Harrison Ford,https://image.tmdb.org/t/p/w440_and_h660_face/5M7oN3sznp99hWYQ9sX0xheswWX.jpg,https://themoviedb.org/person/3
...
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/persons.csv' AS row
MERGE (p:Person {tmdbId: toInteger(row.person_tmdbId)})
SET p.name = row.name, p.born = date(row.born)
RETURN
p.name AS name,
p.tmdbId AS tmdbId,
p.born AS born
LIMIT 5
name | tmdbId | born |
---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
5 rows Added 444 nodes, Set 1332 properties, Added 444 labels |
For a list of type casting functions, see Casting data values.
Handle null
values
Neo4j does not store null
values.
null
or empty fields in a CSV files can be skipped or replaced with default values in LOAD CSV
.
null
valuesIn the file companies.csv
, some rows do not specify values for some columns.
The examples show several options of how to handle null
values.
Id,Name,Location,Email,BusinessType
1,Neo4j,San Mateo,contact@neo4j.com,P
2,AAA,,info@aaa.com,
3,BBB,Chicago, ,G
,CCC,Michigan,info@ccc.com,G
null
valuesLOAD CSV WITH HEADERS FROM 'file:///companies.csv' AS row
WITH row
WHERE row.Id IS NOT NULL
MERGE (c:Company {id: row.Id})
null
valuesLOAD CSV WITH HEADERS FROM 'file:///companies.csv' AS row
WITH row
WHERE row.Id IS NOT NULL
MERGE (c:Company {id: row.Id, hqLocation: coalesce(row.Location, "Unknown")})
STRING
values to null
values (not stored)LOAD CSV WITH HEADERS FROM 'file:///companies.csv' AS row
WITH row
WHERE row.Id IS NOT NULL
MERGE (c:Company {id: row.Id})
SET c.email = nullIf(trim(row.Email), "")
null values are not stored in the database.
A strategy for selectively getting rid of some values is to map them into null values.
The empty STRING values from the last query serve as an example.
|
Split list values
The function split()
allows to convert a STRING
of elements into a list.
The file movies.csv contains a header line and a total of 94 lines.
The columns languages
and genres
contain list-like values.
Both are separated by a pipe |
, and split()
allows to make them into Cypher lists ahead of inserting them into the database.
movieId,title,budget,countries,movie_imdbId,imdbRating,imdbVotes,languages,plot,movie_poster,released,revenue,runtime,movie_tmdbId,movie_url,year,genres
1,Toy Story,30000000.0,USA,114709,8.3,591836,English,A cowboy doll is profoundly threatened and jealous when a new spaceman figure supplants him as top toy in a boy's room.,https://image.tmdb.org/t/p/w440_and_h660_face/uXDfjJbdP4ijW5hWSBrPrlKpxab.jpg,1995-11-22,373554033.0,81,862,https://themoviedb.org/movie/862,1995,Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji,65000000.0,USA,113497,6.9,198355,English|French,"When two kids find and play a magical board game, they release a man trapped for decades in it and a host of dangers that can only be stopped by finishing the game.",https://image.tmdb.org/t/p/w440_and_h660_face/vgpXmVaVyUL7GGiDeiK1mKEKzcX.jpg,1995-12-15,262797249.0,104,8844,https://themoviedb.org/movie/8844,1995,Adventure|Children|Fantasy
...
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/movies.csv' AS row
MERGE (m:Movie {id: toInteger(row.movieId)})
SET
m.title = row.title,
m.imdbId = toInteger(row.movie_imdbId),
m.languages = split(row.languages, '|'),
m.genres = split(row.genres, '|')
RETURN
m.title AS title,
m.imdbId AS imdbId,
m.languages AS languages,
m.genres AS genres
LIMIT 5
title | imdbId | languages | genres |
---|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
5 rows Added 93 nodes, Set 465 properties, Added 93 labels |
For more STRING
manipulation functions, see String functions.
Recommendations
Create property uniqueness constraints
Always create property uniqueness constraints prior to importing data, to avoid duplicates or colliding entities. If the source file contains duplicated data and the right constraints are in place, Cypher raises an error.
person_tmdbId,bio,born,bornIn,died,person_imdbId,name,person_poster,person_url
3,"Legendary Hollywood Icon Harrison Ford was born on July 13, 1942 in Chicago, Illinois. His family history includes a strong lineage of actors, radio personalities, and models. Harrison attended public high school in Park Ridge, Illinois where he was a member of the school Radio Station WMTH. Harrison worked as the lead voice for sports reporting at WMTH for several years. Acting wasn’t a major interest to Ford until his junior year at Ripon College when he first took an acting class...",1942-07-13,"Chicago, Illinois, USA",,148,Harrison Ford,https://image.tmdb.org/t/p/w440_and_h660_face/5M7oN3sznp99hWYQ9sX0xheswWX.jpg,https://themoviedb.org/person/3
...
CREATE CONSTRAINT Person_tmdbId IF NOT EXISTS
FOR (p:Person) REQUIRE p.tmdbId IS UNIQUE
Added 1 constraints
Handle large amounts of data
LOAD CSV
may run into memory issues with files containing a significant number of rows (approaching hundreds of thousands or millions). For large files, it’s recommended to split the import process in several lighter transactions through the clause CALL {…} IN TRANSACTIONS
.
The file persons.csv contains a header line and a total of 869 lines.
The example loads the name
and born
columns in transactions of 200 rows.
person_tmdbId,bio,born,bornIn,died,person_imdbId,name,person_poster,person_url
3,"Legendary Hollywood Icon Harrison Ford was born on July 13, 1942 in Chicago, Illinois. His family history includes a strong lineage of actors, radio personalities, and models. Harrison attended public high school in Park Ridge, Illinois where he was a member of the school Radio Station WMTH. Harrison worked as the lead voice for sports reporting at WMTH for several years. Acting wasn’t a major interest to Ford until his junior year at Ripon College when he first took an acting class...",1942-07-13,"Chicago, Illinois, USA",,148,Harrison Ford,https://image.tmdb.org/t/p/w440_and_h660_face/5M7oN3sznp99hWYQ9sX0xheswWX.jpg,https://themoviedb.org/person/3
...
The query below uses a variable scope clause (introduced in Neo4j 5.23) to import variables into the CALL subquery.
If you are using an older version of Neo4j, use an importing WITH clause instead.
|
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/persons.csv' AS row
CALL (row) {
MERGE (p:Person {tmdbId: row.person_tmdbId})
SET p.name = row.name, p.born = row.born
} IN TRANSACTIONS OF 200 ROWS
Added 444 nodes, Set 1332 properties, Added 444 labels
In case of errors, CALL {…} IN TRANSACTIONS may only import a part of the CSV data as the transactions are committed.
For example, if the first 200 rows are error free, they are committed.
If the next 200 rows contain data that causes an error, the second transaction fails, but leaves the first transaction unaffected.
|
LOAD CSV and Neo4j functions
Access line numbers with linenumber()
The linenumber()
function provides the line number which LOAD CSV
is operating on, or null
if called outside of a LOAD CSV
context.
A common use case for this function is to generate sequential unique IDs for CSV data that doesn’t have a unique column already.
linenumber()
1,ABBA,1992
2,Roxette,1986
3,Europe,1979
4,The Cardigans,1992
LOAD CSV FROM 'file:///artists.csv' AS row
RETURN linenumber() AS number, row
number | row |
---|---|
|
|
|
|
|
|
|
|
4 rows |
Access the CSV file path with file()
The file()
function provides the absolute path of the file that LOAD CSV
is operating on, or null
if called out of a LOAD CSV
context.
file()
1,ABBA,1992
2,Roxette,1986
3,Europe,1979
4,The Cardigans,1992
LOAD CSV FROM 'file:///artists.csv' AS row
RETURN DISTINCT file() AS path
path |
---|
|
file() always returns a local path, even when loading remote CSV files.
For remote resources, file() returns the temporary local path it was downloaded to.
|
CSV file format
The CSV file format and LOAD CSV
interact as follows:
-
The file character encoding must be UTF-8.
-
The line terminator is system dependent (
\n
for Unix and\r\n
for Windows). -
The default field delimiter is
,
. Change it with the optionFIELDTERMINATOR
. -
CSV files may contain quoted
STRING
values, and the quotes are dropped whenLOAD CSV
reads the data. -
If
dbms.import.csv.legacy_quote_escaping
is set to the default value oftrue
,\
is used as an escape character. -
A double quote must be in a quoted
STRING
and escaped, with either the escape character or a second double quote.
Headers
If the CSV file starts with a header row containing column names, each import row in the file acts as a map instead of an array.
You must indicate the presence of the header row by adding WITH HEADERS
to the query.
You can then access specific fields by their corresponding column name.
Id,Name,Year
1,ABBA,1992
2,Roxette,1986
3,Europe,1979
4,The Cardigans,1992
LOAD CSV WITH HEADERS FROM 'file:///artists-with-headers.csv' AS row
MERGE (a:Artist {name: row.Name, year: toInteger(row.Year)})
RETURN
a.name AS name,
a.year AS year
name | year |
---|---|
|
|
|
|
|
|
|
|
4 rows Added 4 nodes, Set 8 properties, Added 4 labels |
Field delimiter
The default field delimiter is ,
.
Use the FIELDTERMINATOR
option to specify a different field delimiter.
If you try to import a file that doesn’t use ,
as field delimiter and you also don’t specify a custom delimiter, LOAD CSV
will interpret the CSV as having a single column.
;
as field delimiter1;ABBA;1992
2;Roxette;1986
3;Europe;1979
4;The Cardigans;1992
LOAD CSV FROM 'file:///artists-fieldterminator.csv' AS row FIELDTERMINATOR ';'
MERGE (:Artist {name: row[1], year: toInteger(row[2])})
Added 4 nodes, Set 8 properties, Added 4 labels
You can use the hexadecimal representation of the unicode character for the field delimiter if you prepend \u .
Write the encoding with four digits: for example, \u003B is equivalent to ; (semicolon).
|
Quotes escaping
Quoted STRING
values are allowed in the CSV file and the quotes are dropped when LOAD CSV
reads the data.
If quoted STRING
values must contain quote characters "
, there are two ways to escape them:
-
Double quotes — Use another quote
"
to escape a quote (for example, the CSV encoding of theSTRING
The "Symbol"
is"The ""Symbol"""
). -
Prefix with backslash
\
— If the configuration settingdbms.import.csv.legacy_quote_escaping
is set totrue
(the default value),\
works as the escape character for quotes (for example, the CSV encoding of theSTRING
The "Symbol"
is"The \"Symbol\""
).
"1","The ""Symbol""","1992"
"2","The \"Symbol\"","1992"
LOAD CSV FROM 'file:///artists-with-escaped-quotes.csv' AS row
MERGE (a:Artist {id: toInteger(row[0]), name: row[1], year: toInteger(row[2])})
RETURN
a.id AS id,
a.name AS name,
a.year AS year,
size(a.name) AS size
id | name | year | size |
---|---|---|---|
|
|
|
|
|
|
|
|
Added 2 nodes, Set 6 properties, Added 2 labels |
Note that name
is a STRING
, as it is wrapped in quotes in the output.
The third column outputs the STRING
length as size
.
The length only counts what is between the outer quotes, but not the quotes themselves.
Check source data quality
In case of a failed import, there are some elements to check to ensure the source file is not corrupted.
-
Inconsistent headers — The CSV header may be inconsistent with the data. It can be missing, have too many columns or have a different delimiter. Verify that the header matches the data in the file. Adjust the formatting, delimiters or columns.
-
Extra or missing quotes — Standalone double or single quotes in the middle of non-quoted text or non-escaped quotes in quoted text can cause issues reading the file. Either escape or remove stray quotes. See Quotes escaping.
-
Special or newline characters — When dealing with special characters in a file, ensure they are quoted or remove them.
-
Inconsistent line breaks — Ensure line breaks are consistent throughout your file.
-
Binary zeros, BOM byte order mark and other non-text characters — Unusual characters or tool-specific formatting are sometimes hidden in application tools, but become apparent in plain-text editors. If you come across these types of characters in your file, either remove them or use Cypher’s normalize function.
Inspect source files ahead of import
Before importing data into the database, you can use LOAD CSV
to inspect a source file and get an idea of what form the imported data is going to have.
// Assert correct line count
LOAD CSV FROM 'https://data.neo4j.com/importing-cypher/persons.csv' AS line
RETURN count(*);
count(*) |
---|
|
1 row |
// Check first 5 line-sample with header-mapping
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/persons.csv' AS line
RETURN line.person_tmdbId, line.name
LIMIT 5;
line.person_tmdbId | line.name |
---|---|
|
|
|
|
|
|
|
|
|
|
5 rows |
Example
// Clear data
MATCH (n) DETACH DELETE n;
// Create constraints
CREATE CONSTRAINT Person_tmdbId IF NOT EXISTS
FOR (p:Person) REQUIRE p.tmdbId IS UNIQUE;
CREATE CONSTRAINT Movie_movieId IF NOT EXISTS
FOR (m:Movie) REQUIRE m.movieId IS UNIQUE;
// Create person nodes
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/persons.csv' AS row
MERGE (p:Person {tmdbId: toInteger(row.person_tmdbId)})
SET p.name = row.name, p.born = date(row.born);
// Create movie nodes
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/movies.csv' AS row
MERGE (m:Movie {id: toInteger(row.movieId)})
SET
m.title = row.title,
m.imdbId = toInteger(row.movie_imdbId),
m.languages = split(row.languages, '|'),
m.genres = split(row.genres, '|');
// Create relationships
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/acted_in.csv' AS row
MATCH (p:Person {tmdbId: toInteger(row.person_tmdbId)})
MATCH (m:Movie {id: toInteger(row.movieId)})
MERGE (p)-[r:ACTED_IN]->(m)
SET r.role = row.role;
// Set additional node label
MATCH (p:Person)-[:ACTED_IN]->()
WITH DISTINCT p
SET p:Actor;
Added 1 constraints
Added 1 constraints
Added 444 nodes, Set 1332 properties, Added 444 labels
Added 93 nodes, Set 465 properties, Added 93 labels
Created 372 relationships, Set 372 properties
Added 353 labels
With increasing amounts of data, it is more efficient to create all nodes first, and then add relationships with a second pass. |
Other ways of importing data
There are a few other tools to get CSV data into Neo4j.
-
The
neo4j-admin database import
command is the most efficient way of importing large CSV files. -
Use a language library to parse CSV data and run creation Cypher queries against a Neo4j database. Created as an extension library to provide common procedures and functions to developers. This library is especially helpful for complex transformations and data manipulations. Useful procedures include apoc.load.jdbc, apoc.load.json, and others.
-
The ETL Tool: allows to extract the schema from a relational database and turn it into a graph model. It then takes care of importing the data into Neo4j.
-
The Kettle import tool maps and executes steps for the data process flow and works well for very large datasets, especially if you are already familiar with using this tool.