Many existing applications and data integrations use CSV as the lowest common denominator format.
In Cypher it is supported by LOAD CSV and, for bulk imports, by the neo4j-import tool (neo4j-admin import).
Usually a CSV file is text with delimiters (most often a comma, but also tab (TSV) or colon (DSV)) separating columns, and newlines separating rows. Fields may be quoted to handle stray quotes, newlines, and use of the delimiter within a field.
The existing LOAD CSV works well for most uses, but it is missing a few features that apoc.load.csv and apoc.load.xls add.
The apoc procedures also support reading compressed files.
The data conversion is useful for setting properties directly, but for computation within Cypher it's problematic, as Cypher doesn't know the type of map values, so they default to Any.
To use them correctly, you'll have to indicate their type to Cypher by applying the built-in (e.g. toInteger) or APOC (e.g. apoc.convert.toBoolean) conversion functions to the value.
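For example, assuming a row map whose values were all loaded as strings, a conversion could look like this (the field names here are just for illustration):

```cypher
WITH {name: 'Selma', age: '9'} AS row
// age is a string; convert it before doing arithmetic
RETURN row.name AS name, toInteger(row.age) + 1 AS ageNextYear;
```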
For reading from files you’ll have to enable the config option:
apoc.import.file.enabled=true
By default file paths are global. For paths relative to the import directory, set:
apoc.import.file.use_neo4j_config=true
test.csv.
name,age,beverage
Selma,9,Soda
Rana,12,Tea;Milk
Selina,19,Cola
CALL apoc.load.csv('/tmp/test.csv') yield lineNo, map, list
RETURN *;
+---------------------------------------------------------------------------------------+
| lineNo | list | map |
+---------------------------------------------------------------------------------------+
| 0 | ["Selma", "9", "Soda"] | {name: "Selma", age: "9", beverage: "Soda"} |
| 1 | ["Rana", "12", "Tea;Milk"] | {name: "Rana", age: "12", beverage: "Tea;Milk"} |
| 2 | ["Selina", "19", "Cola"] | {name: "Selina", age: "19", beverage: "Cola"} |
+---------------------------------------------------------------------------------------+
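The loaded rows can then be used to create data; a minimal sketch based on the file above (the Person label and the use of toInteger are illustrative choices, not required by the procedure):

```cypher
CALL apoc.load.csv('/tmp/test.csv') YIELD map
// map values are strings, so convert age explicitly
CREATE (p:Person {name: map.name, age: toInteger(map.age)});
```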
Besides the file you can pass in a config map:
name | default | description |
---|---|---|
skip | | skip result rows |
limit | | limit result rows |
header | | indicates if file has a header |
sep | | separator character or 'TAB' |
quoteChar | | the char to use for quoted elements |
arraySep | | array separator |
ignore | | which columns to ignore |
nullValues | | which values to treat as null, e.g. |
mapping | | per field mapping, entry key is field name, e.g. |
Each entry of the mapping config supports:

name | default | description |
---|---|---|
type | | 'int', 'string' etc. |
array | | indicates if field is an array |
arraySep | | separator for array |
name | | rename field |
ignore | | ignore/remove this field |
nullValues | | which values to treat as null, e.g. |
CALL apoc.load.csv('/tmp/test.csv',
{skip:1,limit:1,header:true,ignore:['name'],
mapping:{age:{type:'int'},beverage:{array:true,arraySep:';',name:'drinks'}}}) yield lineNo, map, list
RETURN *;
+-----------------------------------------------------------+
| lineNo | list                  | map                        |
+-----------------------------------------------------------+
| 1 | [12, ["Tea", "Milk"]] | {age: 12, drinks: ["Tea", "Milk"]} |
+-----------------------------------------------------------+
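The mapped values can then be used directly without further conversion; for instance, unwinding an array field. This sketch applies only the array mapping (the column and the renamed field match the test file above):

```cypher
CALL apoc.load.csv('/tmp/test.csv',
  {header:true, mapping:{beverage:{array:true, arraySep:';', name:'drinks'}}})
YIELD map
// drinks is already a list, so it can be unwound directly
UNWIND map.drinks AS drink
RETURN drink, count(*) AS people;
```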
To handle large files, USING PERIODIC COMMIT can be prepended to LOAD CSV, though you'll have to watch out for Eager operations which might break that behavior.
In APOC you can combine any data source with apoc.periodic.iterate to achieve the same.
CALL apoc.periodic.iterate('
CALL apoc.load.csv({url}) yield map as row return row
','
CREATE (p:Person) SET p = row
', {batchSize:10000, iterateList:true, parallel:true});
Please note that the parallel operation only works well for non-conflicting updates; otherwise you might run into deadlocks.
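When updates may conflict, e.g. several batches merging the same nodes, a safer variant runs the batches sequentially. This sketch assumes a hypothetical city column shared by many rows:

```cypher
CALL apoc.periodic.iterate('
 CALL apoc.load.csv({url}) yield map as row return row
','
 // MERGE on a shared node can deadlock under parallel:true,
 // so run the batches one after another
 MERGE (c:City {name: row.city})
', {batchSize:10000, iterateList:true, parallel:false});
```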
To make these data structures available to Cypher, you can use apoc.load.xml.
It takes a file or HTTP URL and parses the XML into a map data structure.
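A minimal call might look like this (the URL is a placeholder):

```cypher
CALL apoc.load.xml('https://example.com/data.xml') YIELD value
// value is a nested map/list representation of the XML document
RETURN value;
```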
In previous releases we’ve had
See the following usage-examples for the procedures.