7.3. Load CSV

Many existing applications and data integrations use CSV as the minimal denominator format.

In Cypher it is supported by LOAD CSV and with the neo4j-import (neo4j-admin import) for bulk imports.

Usually a CSV file is text with delimiters (most often comma, but also tab (TSV) and colon (DSV)) separating columns and newlines for rows. Fields are possibly quoted to handle stray quotes, newlines, and the use of the delimeter within a field.

The existing LOAD CSV works ok for most uses, but there were a few features missing, that apoc.load.csv and apoc.load.xls add.

provide a line number
provide both a map and a list representation of each line
automatic data conversion (including split into arrays)
option to keep the original string formatted values
ignoring fields (makes it easier to assign a full line as properties)
headerless files
replacing certain values with null

The apoc procedures also support reading compressed files.

The data conversion is useful for setting properties directly, but for computation within Cypher it’s problematic as Cypher doesn’t know the type of map values so they default to Any.

To use them correctly, you’ll have to indicate their type to Cypher by using the built-in (e.g. toInteger) or apoc (e.g. apoc.convert.toBoolean) conversion functions on the value.

For reading from files you’ll have to enable the config option:

apoc.import.file.enabled=true

By default file paths are global, for paths relative to the import directory set:

apoc.import.file.use_neo4j_config=true

7.3.1. Examples for apoc.load.csv

test.csv.

name,age,beverage
Selma,9,Soda
Rana,12,Tea;Milk
Selina,19,Cola

CALL apoc.load.csv('/tmp/test.csv') yield lineNo, map, list
RETURN *;

+---------------------------------------------------------------------------------------+
| lineNo | list                       | map                                             |
+---------------------------------------------------------------------------------------+
| 0      | ["Selma", "9", "Soda"]     | {name: "Selma", age: "9", beverage: "Soda"}     |
| 1      | ["Rana", "12", "Tea;Milk"] | {name: "Rana", age: "12", beverage: "Tea;Milk"} |
| 2      | ["Selina", "19", "Cola"]   | {name: "Selina", age: "19", beverage: "Cola"}   |
+---------------------------------------------------------------------------------------+

7.3.2. Configuration Options

Besides the file you can pass in a config map:

name	default	description
`skip`	`none`	skip result rows
`limit`	`none`	limit result rows
`header`	`true`	indicates if file has a header
`sep`	`','`	separator character or 'TAB'
`quoteChar`	`'"'`	the char to use for quoted elements
`arraySep`	`';'`	array separator
`ignore`	`[]`	which columns to ignore
`nullValues`	`[]`	which values to treat as null, e.g. `['na',false]`
`mapping`	`{}`	per field mapping, entry key is field name, .e.g `{years:{….}` see below

Table 7.2. mapping config for each field in the `mapping` entry
name	default	description
`type`	`none`	'int', 'string' etc.
`array`	`false`	indicates if field is an array
`arraySep`	`';'`	separator for array
`name`	`none`	rename field
`ignore`	`false`	ignore/remove this field
`nullValues`	`[]`	which values to treat as null, e.g. `['na',false]`

CALL apoc.load.csv('/tmp/test.csv',
  {skip:1,limit:1,header:true,ignore:'name',
   mapping:{age:{type:'int'},beverage:{array:true,arraySep:';',name:'drinks'}) yield lineNo, map, list
RETURN *;

+---------------------------------------------------------------------------------------+
| lineNo | list                       | map                                             |
+---------------------------------------------------------------------------------------+
| 0      | ["Selma", "9", "Soda"]     | {name: "Selma", age: "9", beverage: "Soda"}     |
| 1      | ["Rana", "12", "Tea;Milk"] | {name: "Rana", age: "12", beverage: "Tea;Milk"} |
| 2      | ["Selina", "19", "Cola"]   | {name: "Selina", age: "19", beverage: "Cola"}   |
+---------------------------------------------------------------------------------------+

7.3.3. Transaction Batching

To handle large files, USING PERIODIC COMMIT can be prepended to LOAD CSV, you’ll have to watch out though for Eager operations which might break that behavior.

In apoc you can combine any data source with apoc.periodic.iterate to achieve the same.

CALL apoc.periodic.iterate('
CALL apoc.load.csv({url}) yield map as row return row
','
CREATE (p:Person) SET p = row
', {batchSize:10000, iterateList:true, parallel:true});

Please note that the parallel operation only works well for non-conflicting updates otherwise you might run into deadlocks.

To make these datastructures available to Cypher, you can use apoc.load.xml. It takes a file or http URL and parses the XML into a map datastructure.

in previous releases we’ve had apoc.load.xmlSimple. This is now deprecated and got superseeded by apoc.load.xml(url, [xPath], [config], true).Simple XML Format

See the following usage-examples for the procedures.