Importing CSV Files: Neo4j Aura, Desktop and Sandbox
Loading various kinds of files into Neo4j requires different locations depending on the tool you are using.
Import methods we will cover:
-
Remote: Neo4j Aura and Neo4j Sandbox
-
Local: Neo4j Server and Neo4j Desktop
Neo4j Aura and Neo4j Sandbox
Cloud hosted versions of Neo4j can only access remote http(s) URLs. Because they are hosted in the cloud, the security settings do not allow sandboxes to access local file settings on a desktop. Any files that need to be imported must be stored or placed in a remote location that the instance can access
-
GitHub
-
Pastebin
-
Cloud Provider Storage
-
a website
-
Google Drive
-
Dropbox.
We will look at examples for some of the locations for importing into a Cloud hosted Neo4j Instance.
GitHub
If you find or place a file in a GitHub repository or GitHub Gist, others can access the content in a raw format. You just need to navigate to the place that contains the file and go to the file. Once there, you should see a menu bar like the one below right above the file contents.
Click on the Raw
button in the button list on the right and copy the url path when the page loads (url should start like https://raw.githubusercontent.com/…;
).
Now you should be able to use the data in your Neo4j Browser session with a statement like this one.
LOAD CSV FROM 'https://raw.githubusercontent.com/<yourFileRepositoryPath>' AS row
RETURN row
LIMIT 20
Website
If the file is hosted on a website, Neo4j can access it there with a public URL.
For example, in the Neo4j Cypher Manual, the LOAD CSV
page uses a csv file on neo4j.com.
LOAD CSV FROM 'http://data.neo4j.com/northwind/products.csv' AS row
RETURN row
The same is true for any cloud providers storage
-
AWS S3
-
GCP Buckets
-
Azure Blob storage
You can upload the files there using your credentials and the UI or CLI and make them (temporarily) publicly accessible and then use the HTTPS URL of the file.
Google Sheets
You can access files uploaded to Google Sheets, if they are published to the web. Once the file is imported into a tab of a Google Sheet, you can follow the screenshots below to walk through the rest of the process.
The red boxes and numbering show where to click and what order to do the steps.
LOAD CSV WITH HEADERS FROM 'https://docs.google.com/spreadsheets/d/e/2PACX-1vSx5-mHPUs7hQ3292zrLL_FeNzo85iC83TiezRcPl_SUv4NpW0e2VZilCUH9KbCWExAfE7OAELgdCW8/pub?gid=0&single=true&output=csv' AS row
RETURN row
Dropbox
A file uploaded to Dropbox works similarly to the process with Google Drive. Again, you will need to ensure permissions are set appropriately. Screenshots below step through the rest of the steps.
LOAD CSV WITH HEADERS FROM 'https://<fileId>.dl.dropboxusercontent.com/cd/0/get/<yourFilePath>/file#' AS row
RETURN row
Local Neo4j Install and Neo4j Desktop
Your local Neo4j installations can of course also access the remote files via URL as described above.
If you want to import files from the local disk for privacy or performance reasons you have to place them into the import
folder.
It is generally located relative to your Neo4j server installation.
Those files can then be accessed via file:///filename.csv
URLs, e.g.
LOAD CSV WITH HEADERS FROM 'file:///products.csv' AS row
RETURN row
Neo4j Desktop
In Neo4j Desktop you can open the folder in your file-manager (explorer, finder, etc) via the UI by clicking on the "Open (Folder)" dropdown or menu.
Then place the files there and access them directly from Neo4j.
The CSV import developer guide walks through loading local CSV files to Neo4j Desktop.
Custom Import Folder
If you require a file location different from the default, you can update the following setting in the neo4j.conf
file.
We recommend specifying a directory path, rather than commenting out the setting, to avoid the security issue mentioned in the configuration comment.
# This setting constrains all `LOAD CSV` import files to be under the `import` directory. Remove or comment it out to
# allow files to be loaded from anywhere in the filesystem; this introduces possible security problems. See the
# `LOAD CSV` section of the manual for details.
dbms.directories.import=import
Resources
You can find the full list of file locations by operating system (does not include Sandbox) in the operations manual.
Andy Jefferson explores different methods of loading files (securely) into a remote Neo4j instance * Part 1 - ngrok and python webserver * Part 2 - Cloud Storage
Is this page helpful?