Chapter 7. Export / Import

If you are running with the default configuration (apoc.import.file.use_neo4j_config=true), the export procedures treat the directory defined in the dbms.directories.import property as the root directory.
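For example, with that default configuration (and apoc.export.file.enabled=true) a relative file name is resolved against that root, so the following minimal sketch would write to <dbms.directories.import>/all.csv; the file name is an assumption:

// relative path, resolved against the directory set in dbms.directories.import
CALL apoc.export.csv.all('all.csv', {})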

7.1. Loading Data from Web-APIs

Supported protocols are file, http, https, s3 and hdfs, with redirects allowed.

If no protocol is passed, this procedure set will try to interpret the URL as a file path.

Because apoc.import.file.use_neo4j_config is enabled, the procedures check whether file system access is allowed, and possibly constrained to a specific directory, by reading the two configuration parameters dbms.security.allow_csv_import_from_file_urls and dbms.directories.import respectively. If you want to remove these constraints, set apoc.import.file.use_neo4j_config=false.

CALL apoc.load.json('http://example.com/map.json', [path], [config]) YIELD value as person CREATE (p:Person) SET p = person

load JSON from a URL (e.g. a web API), importing the JSON as a stream of values if it is an array, or as a single value if it is a map

CALL apoc.load.xml('http://example.com/test.xml', ['xPath'], [config]) YIELD value as doc CREATE (p:Person) SET p.name = doc.name

load XML from a URL (e.g. a web API), importing the XML as a single nested map with attributes and _type, _text and _children fields.

CALL apoc.load.xmlSimple('http://example.com/test.xml') YIELD value as doc CREATE (p:Person) SET p.name = doc.name

load XML from a URL (e.g. a web API), importing the XML as a single nested map with attributes and _type, _text fields and _<childtype> collections per child-element-type.

CALL apoc.load.csv('url',{sep:";"}) YIELD lineNo, list, strings, map, stringMap

load CSV from a URL as a stream of values; config may contain any of: {skip:1,limit:5,header:false,sep:'TAB',ignore:['aColumn'],arraySep:';',results:['map','list','strings','stringMap'],nullValues:[''],mapping:{years:{type:'int',arraySep:'-',array:false,name:'age',ignore:false,nullValues:['n.A.']}}}
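For instance, a minimal sketch that applies the mapping config while loading (the file name and column names are assumptions):

// rename the 'years' column to 'age' and coerce its values to integers
CALL apoc.load.csv('file:///persons.csv', {sep:';', header:true, mapping:{years:{type:'int', name:'age'}}})
YIELD map
CREATE (p:Person) SET p = map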

CALL apoc.load.xls('url','Sheet'/'Sheet!A2:B5',{config}) YIELD lineNo, list, map

load XLS from a URL as a stream of values; config may contain any of: {skip:1,limit:5,header:false,ignore:['aColumn'],arraySep:';',nullValues:[''],mapping:{years:{type:'int',arraySep:'-',array:false,name:'age',ignore:false,nullValues:['n.A.']}}}
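Similarly for XLS, a hedged sketch (the file and sheet names are assumptions):

// read the 'People' sheet, renaming and typing the 'years' column as above
CALL apoc.load.xls('file:///persons.xls', 'People', {header:true, mapping:{years:{type:'int', name:'age'}}})
YIELD map
CREATE (p:Person) SET p = map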

7.7. Export to CSV

YIELD file, source, format, nodes, relationships, properties, time, rows, data

apoc.export.csv.query(query,file,config)

exports results from the Cypher statement as CSV to the provided file

apoc.export.csv.all(file,config)

exports whole database as CSV to the provided file

apoc.export.csv.data(nodes,rels,file,config)

exports given nodes and relationships as CSV to the provided file

apoc.export.csv.graph(graph,file,config)

exports given graph object as CSV to the provided file
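For example, a minimal sketch exporting a query result (the query and file name are assumptions):

// export two columns of every User node to CSV
CALL apoc.export.csv.query(
  "MATCH (u:User) RETURN u.name AS name, u.age AS age",
  "/tmp/users.csv", {})
YIELD file, rows, time
RETURN file, rows, time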

If the file name is passed as null and the config contains stream:true, the results are streamed back in the data column; see the example after the note below.

7.7.1. Note:

For apoc.export.csv.all/data/graph, node and relationship properties are ordered alphabetically, following this general structure:

_id,_labels,<list_nodes_properties_naturally_sorted>,_start,_end,_type,<list_rel_properties_naturally_sorted>, so for instance:

_id,_labels,age,city,kids,male,name,street,_start,_end,_type,bar,foo

The exported labels are ordered alphabetically. The output of the labels() function is not sorted; use it in combination with apoc.coll.sort().
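For instance, to reproduce the export's label ordering in a query, a minimal sketch:

// labels() alone is unsorted; apoc.coll.sort() yields the alphabetical order used by the export
MATCH (n) RETURN apoc.coll.sort(labels(n)) AS sortedLabels

And the streaming example referenced above: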

CALL apoc.export.csv.all(null, {stream:true,batchSize:100}) YIELD data RETURN data

7.8. Export to Json File

Make sure to set the config options in your neo4j.conf:

neo4j.conf. 

apoc.export.file.enabled=true

YIELD file, source, format, nodes, relationships, properties, time, rows

apoc.export.json.query(query,file,config)

exports results from the Cypher statement as Json to the provided file

apoc.export.json.all(file,config)

exports whole database as Json to the provided file

apoc.export.json.data(nodes,rels,file,config)

exports given nodes and relationships as Json to the provided file

apoc.export.json.graph(graph,file,config)

exports given graph object as Json to the provided file

Table 7.3. Config

writeNodeProperties

true/false; if true, the start and end nodes of exported relationships include their properties too.

The exported labels are ordered alphabetically. The output of the labels() function is not sorted; use it in combination with apoc.coll.sort().

7.8.1. Examples

For the examples below we create the following dataset:

CREATE (f:User {name:'Adam',age:42,male:true,kids:['Sam','Anna','Grace'], born:localdatetime('2015185T19:32:24'), place:point({latitude: 13.1, longitude: 33.46789})})-[:KNOWS {since: 1993}]->(b:User {name:'Jim',age:42}),(c:User {age:12})

Export query. 

CALL apoc.export.json.query("MATCH (u:User) WHERE age > {age} return u","/tmp/query.json",{params:{age:10}})

result:

{"u":{"type":"node","id":"0","labels":["User"],"properties":{"born":"2015-07-04T19:32:24","name":"Adam","place":{"crs":"wgs-84","latitude":33.46789,"longitude":13.1,"height":null},"age":42,"male":true,"kids":["Sam","Anna","Grace"]}}}
{"u":{"type":"node","id":"1","labels":["User"],"properties":{"name":"Jim","age":42}}}
{"u":{"type":"node","id":"2","labels":["User"],"properties":{"age":12}}}

Export query Complex. 

CALL apoc.export.json.query("RETURN {value:1, data:[10,'car',null, point({ longitude: 56.7, latitude: 12.78 }), point({ longitude: 56.7, latitude: 12.78, height: 8 }), point({ x: 2.3, y: 4.5 }), point({ x: 2.3, y: 4.5, z: 2 }),date('2018-10-10'), datetime('2018-10-18T14:21:40.004Z'), localdatetime({ year:1984, week:10, dayOfWeek:3, hour:12, minute:31, second:14, millisecond: 645 }), {x:1, y:[1,2,3,{age:10}]}]} as key","/tmp/complex.json")

result:

{"key":{"data":[10,"car",null,{"crs":"wgs-84","latitude":56.7,"longitude":12.78,"height":null},{"crs":"wgs-84-3d","latitude":56.7,"longitude":12.78,"height":8.0},{"crs":"cartesian","x":2.3,"y":4.5,"z":null},{"crs":"cartesian-3d","x":2.3,"y":4.5,"z":2.0},"2018-10-10","2018-10-18T14:21:40.004Z","1984-03-07T12:31:14.645",{"x":1,"y":[1,2,3,{"age":10}]}],"value":1}}

Export queryList. 

CALL apoc.export.json.query("MATCH (u:User) RETURN COLLECT(u) as list","/tmp/list.json",{params:{age:10}})

result:

{"list":[{"type":"node","id":"0","labels":["User"],"properties":{"born":"2015-07-04T19:32:24","name":"Adam","place":{"crs":"wgs-84","latitude":33.46789,"longitude":13.1,"height":null},"age":42,"male":true,"kids":["Sam","Anna","Grace"]}},{"type":"node","id":"1","labels":["User"],"properties":{"name":"Jim","age":42}},{"type":"node","id":"2","labels":["User"],"properties":{"age":12}}]}

Export queryMap. 

CALL apoc.export.json.query("MATCH (u:User)-[r:KNOWS]->(d:User) RETURN u {.*}, d {.*}, r {.*}","/tmp/map.json",{params:{age:10}})

result:

{"u":{"born":"2015-07-04T19:32:24","name":"Adam","place":{"crs":"wgs-84","latitude":33.46789,"longitude":13.1,"height":null},"age":42,"male":true,"kids":["Sam","Anna","Grace"]},"d":{"name":"Jim","age":42},"r":{"since":1993}}

Export all. 

CALL apoc.export.json.all("/tmp/all.json",{useTypes:true})

result:

{"type":"node","id":"0","labels":["User"],"properties":{"born":"2015-07-04T19:32:24","name":"Adam","place":{"crs":"wgs-84","latitude":33.46789,"longitude":13.1,"height":null},"age":42,"male":true,"kids":["Sam","Anna","Grace"]}}
{"type":"node","id":"1","labels":["User"],"properties":{"name":"Jim","age":42}}
{"type":"node","id":"2","labels":["User"],"properties":{"age":12}}
{"id":"0","type":"relationship","label":"KNOWS","properties":{"since":1993},"start":{"id":"0","labels":["User"]},"end":{"id":"1","labels":["User"]}}

Export graph. 

CALL apoc.graph.fromDB('test',{}) YIELD graph
CALL apoc.export.json.graph(graph,"/tmp/graph.json",{})
YIELD nodes, relationships, properties, file, source, format, time
RETURN *

result:

{"type":"node","id":"0","labels":["User"],"properties":{"born":"2015-07-04T19:32:24","name":"Adam","place":{"crs":"wgs-84","latitude":33.46789,"longitude":13.1,"height":null},"age":42,"male":true,"kids":["Sam","Anna","Grace"]}}
{"type":"node","id":"1","labels":["User"],"properties":{"name":"Jim","age":42}}
{"type":"node","id":"2","labels":["User"],"properties":{"age":12}}
{"id":"0","type":"relationship","label":"KNOWS","properties":{"since":1993},"start":{"id":"0","labels":["User"]},"end":{"id":"1","labels":["User"]}}

Export data. 

MATCH (nod:User)
MATCH ()-[rels:KNOWS]->()
WITH collect(nod) as a, collect(rels) as b
CALL apoc.export.json.data(a, b, "/tmp/data.json", null)
YIELD nodes, relationships, properties, file, source, format, time
RETURN *

result:

{"type":"node","id":"0","labels":["User"],"properties":{"born":"2015-07-04T19:32:24","name":"Adam","place":{"crs":"wgs-84","latitude":33.46789,"longitude":13.1,"height":null},"age":42,"male":true,"kids":["Sam","Anna","Grace"]}}
{"type":"node","id":"1","labels":["User"],"properties":{"name":"Jim","age":42}}
{"type":"node","id":"2","labels":["User"],"properties":{"age":12}}
{"id":"0","type":"relationship","label":"KNOWS","properties":{"since":1993},"start":{"id":"0","labels":["User"]},"end":{"id":"1","labels":["User"]}}

Export query with config writeNodeProperties. 

CALL apoc.export.json.query("MATCH p = (u:User)-[rel:KNOWS]->(u2:User) RETURN rel","/tmp/writeNodeProperties.json",{writeNodeProperties:true}})

result:

{"rel":{"id":"0","type":"relationship","label":"KNOWS","properties":{"since":1993},"start":{"id":"0","labels":["User"],"properties":{"born":"2015-07-04T19:32:24","name":"Adam","place":{"crs":"wgs-84","latitude":33.46789,"longitude":13.1,"height":null},"age":42,"male":true,"kids":["Sam","Anna","Grace"]}},"end":{"id":"1","labels":["User"],"properties":{"name":"Jim","age":42}}}}

7.9. Export to Cypher Script

If you are experimenting with imports that fail, you can add the --debug command line parameter to see which statement was executed last and caused the failure.

Also check the memory configuration of your Neo4j instance; you might want to increase the heap to 2–4GB with the dbms.memory.heap.max_size=2G setting in neo4j.conf.

You can also provide more memory to cypher-shell itself by prefixing the command with: JAVA_OPTS=-Xmx4G bin/cypher-shell …

Make sure to set the config options in your neo4j.conf:

neo4j.conf. 

apoc.export.file.enabled=true
apoc.import.file.enabled=true

Data is exported as Cypher statements to the given file.

It is possible to choose between three export formats:

  • neo4j-shell: for Neo4j Shell and partly apoc.cypher.runFile
  • cypher-shell: for Cypher shell
  • plain: doesn’t output begin/commit/await, just plain Cypher

You can also control the optimizations via useOptimizations: {config}. The config can contain these params:

  • unwindBatchSize: (default 100)
  • type: possible values ('NONE', 'UNWIND_BATCH', 'UNWIND_BATCH_PARAMS') (default 'UNWIND_BATCH')

With 'NONE' the file is exported with plain CREATE statements;

With 'UNWIND_BATCH' the file is exported by batching the entities with UNWIND statements.
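For example, a hedged sketch (the file name and batch size are assumptions):

// batch entities into UNWIND statements of 20 rows each
CALL apoc.export.cypher.all('/tmp/all.cypher', {useOptimizations:{type:'UNWIND_BATCH', unwindBatchSize:20}})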

To change the export format, you have to set it on the config params like {format : "cypher-shell"}.

By default the format is neo4j-shell.

If you want to export to separate files, e.g. to later use the apoc.cypher.runFiles/runSchemaFiles procedures, you can add separateFiles:true.
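For example, a minimal sketch combining both options (the file name is an assumption):

// cypher-shell format, split into .nodes/.relationships/.schema/.cleanup files
CALL apoc.export.cypher.all('/tmp/all.cypher', {format:'cypher-shell', separateFiles:true})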

It is possible to choose between four Cypher update operation types. To change the update operation, set cypherFormat in the config params, e.g. {cypherFormat: "updateAll"}:

  • create: all CREATE
  • updateAll: MERGE instead of CREATE
  • addStructure: MATCH for nodes + MERGE for rels
  • updateStructure: MERGE + MATCH for nodes and rels

Format and cypherFormat can be used both in the same query giving you complete control over the exact export format:

call apoc.export.cypher.query(
"MATCH (p1:Person)-[r:KNOWS]->(p2:Person) RETURN p1,r,p2",
"/tmp/friendships.cypher",
{format:'plain',cypherFormat:'updateStructure'})

YIELD file, source, format, nodes, relationships, properties, time

apoc.export.cypher.all(file,config)

exports whole database incl. indexes as Cypher statements to the provided file

apoc.export.cypher.data(nodes,rels,file,config)

exports given nodes and relationships incl. indexes as Cypher statements to the provided file

apoc.export.cypher.graph(graph,file,config)

exports given graph object incl. indexes as Cypher statements to the provided file

apoc.export.cypher.query(query,file,config)

exports nodes and relationships from the Cypher statement incl. indexes as Cypher statements to the provided file

apoc.export.cypher.schema(file,config)

exports all schema indexes and constraints as Cypher statements to the provided file

The exported labels are ordered alphabetically. The output of the labels() function is not sorted; use it in combination with apoc.coll.sort().

7.9.1. Roundtrip Example

You can use this roundtrip example e.g. on the :play movies movie graph.

Make sure to set the config options in your neo4j.conf:

neo4j.conf. 

apoc.export.file.enabled=true
apoc.import.file.enabled=true

Export the data in plain format and multiple files:

call apoc.export.cypher.query("match (n)-[r]->(n2) return * limit 100",
 "/tmp/mysubset.cypher",
 {format:'plain',separateFiles:true});

This should result in 4 files in your directory.

ls -1 /tmp/mysubset.*
/tmp/mysubset.cleanup.cypher
/tmp/mysubset.nodes.cypher
/tmp/mysubset.relationships.cypher
/tmp/mysubset.schema.cypher

Import the data in four steps: first the schema, then nodes and relationships, then cleanup.

call apoc.cypher.runSchemaFile('/tmp/mysubset.schema.cypher');
call apoc.cypher.runFiles(['/tmp/mysubset.nodes.cypher','/tmp/mysubset.relationships.cypher']);
call apoc.cypher.runFile('/tmp/mysubset.cleanup.cypher');
call apoc.cypher.runSchemaFile('/tmp/mysubset.cleanup.cypher');

The run* procedures have some optional config (see the sketch after this list):

  • {statistics:true/false} to output a row of update-stats per statement, default is true
  • {timeout:1 or 10} for how long the stream waits for new data, default is 10
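For example, a minimal sketch reusing the cleanup file from the roundtrip above (the config values are assumptions):

// suppress the per-statement stats row and shorten the stream timeout
call apoc.cypher.runFile('/tmp/mysubset.cleanup.cypher', {statistics:false, timeout:5});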

7.9.2. Stream back Exported Cypher Script as columns

If you pass null as the file name, the export will instead be streamed back.

In general there will be a cypherStatements column with the script.

If you use multi-file-splitting (separateFiles:true) as a configuration parameter, there will be extra columns (see the sketch after this list) with content for

  • nodeStatements
  • relationshipStatements
  • cleanupStatements
  • schemaStatements
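For example, a hedged sketch streaming the split statements back, assuming the streaming behavior described above:

// null file name streams the script; separateFiles:true splits it into per-kind columns
CALL apoc.export.cypher.all(null, {format:'plain', separateFiles:true})
YIELD nodeStatements, relationshipStatements, cleanupStatements, schemaStatements
RETURN nodeStatements, relationshipStatements, cleanupStatements, schemaStatements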

If you also specify streamStatements:true, then each batch of statements (sized by batchSize, which defaults to 10000) is returned as a row, so you can use your client to reconstruct the Cypher script.

Simple Example for Streaming. 

echo "
CALL apoc.export.cypher.all(null,{streamStatements:true,batchSize:100}) YIELD cypherStatements RETURN cypherStatements;
" | ./bin/cypher-shell --non-interactive --format plain

7.9.3. Examples

7.9.3.1. Old method:

exportAll (neo4j-shell format). Without the optimizations

CALL apoc.export.cypher.all({fileName},{config})

Result:

begin
CREATE (:Foo:`UNIQUE IMPORT LABEL` {name:"foo", `UNIQUE IMPORT ID`:0});
CREATE (:Bar {name:"bar", age:42});
CREATE (:Bar:`UNIQUE IMPORT LABEL` {age:12, `UNIQUE IMPORT ID`:2});
commit
begin
CREATE INDEX ON :Foo(name);
CREATE CONSTRAINT ON (node:Bar) ASSERT node.name IS UNIQUE;
CREATE CONSTRAINT ON (node:`UNIQUE IMPORT LABEL`) ASSERT node.`UNIQUE IMPORT ID` IS UNIQUE;
commit
schema await
begin
MATCH (n1:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`:0}), (n2:Bar{name:"bar"}) CREATE (n1)-[:KNOWS]->(n2);
commit
begin
MATCH (n:`UNIQUE IMPORT LABEL`)  WITH n LIMIT 20000 REMOVE n:`UNIQUE IMPORT LABEL` REMOVE n.`UNIQUE IMPORT ID`;
commit
begin
DROP CONSTRAINT ON (node:`UNIQUE IMPORT LABEL`) ASSERT node.`UNIQUE IMPORT ID` IS UNIQUE;
commit

exportSchema (neo4j-shell format). 

CALL apoc.export.cypher.schema({fileName},{config})

Result:

begin
CREATE INDEX ON :Foo(name);
CREATE CONSTRAINT ON (node:Bar) ASSERT node.name IS UNIQUE;
commit
schema await

7.9.3.2. New method:

With the optimizations

CALL apoc.export.cypher.all({fileName},{config})

Result:

BEGIN
CREATE INDEX ON :Bar(first_name,last_name);
CREATE INDEX ON :Foo(name);
CREATE CONSTRAINT ON (node:Bar) ASSERT node.name IS UNIQUE;
CREATE CONSTRAINT ON (node:`UNIQUE IMPORT LABEL`) ASSERT node.`UNIQUE IMPORT ID` IS UNIQUE;
COMMIT
SCHEMA AWAIT
BEGIN
UNWIND [{_id:3, properties:{age:12}}] as row
CREATE (n:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row._id}) SET n += row.properties SET n:Bar;
UNWIND [{_id:2, properties:{age:12}}] as row
CREATE (n:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row._id}) SET n += row.properties SET n:Bar:Person;
UNWIND [{_id:0, properties:{born:date('2018-10-31'), name:"foo"}}, {_id:4, properties:{born:date('2017-09-29'), name:"foo2"}}] as row
CREATE (n:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row._id}) SET n += row.properties SET n:Foo;
UNWIND [{name:"bar", properties:{age:42}}, {name:"bar2", properties:{age:44}}] as row
CREATE (n:Bar{name: row.name}) SET n += row.properties;
UNWIND [{_id:6, properties:{age:99}}] as row
CREATE (n:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row._id}) SET n += row.properties;
COMMIT
BEGIN
UNWIND [{start: {_id:0}, end: {name:"bar"}, properties:{since:2016}}, {start: {_id:4}, end: {name:"bar2"}, properties:{since:2015}}] as row
MATCH (start:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row.start._id})
MATCH (end:Bar{name: row.end.name})
CREATE (start)-[r:KNOWS]->(end) SET r += row.properties;
COMMIT
BEGIN
MATCH (n:`UNIQUE IMPORT LABEL`)  WITH n LIMIT 20000 REMOVE n:`UNIQUE IMPORT LABEL` REMOVE n.`UNIQUE IMPORT ID`;
COMMIT
BEGIN
DROP CONSTRAINT ON (node:`UNIQUE IMPORT LABEL`) ASSERT (node.`UNIQUE IMPORT ID`) IS UNIQUE;
COMMIT