7.2. Load JSON - Chapter 7. Export / Import

7.2.1. Load JSON

Web APIs are a huge opportunity to access and integrate data from any sources with your graph. Most of them provide the data as JSON.

With apoc.load.json you can retrieve data from URLs and turn it into map value(s) for Cypher to consume. Cypher is pretty good at deconstructing nested documents with dot syntax, slices, UNWIND etc. so it is easy to turn nested data into graphs.

Sources with multiple JSON objects (JSONL,JSON Lines) in a stream are also supported, like the streaming Twitter format or the Yelp Kaggle dataset.

7.2.2. Json-Path

Most of the apoc.load.json and apoc.convert.*Json procedures and functions now accept a json-path as last argument.

The json-path uses the Java implementation by Jayway of Stefan Gössners JSON-Path

Here is some syntax, there are more examples at the links above.

$.store.book[0].title

Table 7.1. Operators
Operator	Description
`$`	The root element to query. This starts all path expressions.
`@`	The current node being processed by a filter predicate.
`*`	Wildcard. Available anywhere a name or numeric are required.
`..`	Deep scan. Available anywhere a name is required.
`.<name>`	Dot-notated child
`['<name>' (,'<name>')]`	Bracket-notated child or children
`[<number> (,<number>)]`	Array index or indexes
`[start:end]`	Array slice operator
`[?(<expression>)]`	Filter expression. Expression must evaluate to a boolean value.

If used, this path is applied to the json and can be used to extract sub-documents and -values before handing the result to Cypher, resulting in shorter statements with complex nested JSON.

There is also a direct apoc.json.path(json,path) function.

To simplify the JSON URL syntax, you can configure aliases in conf/neo4j.conf:

apoc.json.myJson.url=https://api.stackexchange.com/2.2/questions?pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf

CALL apoc.load.json('https://api.stackexchange.com/2.2/questions?pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf')

becomes

CALL apoc.load.json('myJson')

The 3rd value in the apoc.json.<alias>.url= effectively defines an alias to be used in apoc.load.json('<alias>',….

7.2.3. Load JSON StackOverflow Example

There have been articles before about loading JSON from Web-APIs like StackOverflow.

With apoc.load.json it’s now very easy to load JSON data from any file or URL.

If the result is a JSON object is returned as a singular map. Otherwise if it was an array is turned into a stream of maps.

The URL for retrieving the last questions and answers of the neo4j tag is this:

https://api.stackexchange.com/2.2/questions?pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf

Now it can be used from within Cypher directly, let’s first introspect the data that is returned.

JSON data from StackOverflow.

WITH "https://api.stackexchange.com/2.2/questions?pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf" AS url
CALL apoc.load.json(url) YIELD value
UNWIND value.items AS item
RETURN item.title, item.owner, item.creation_date, keys(item)

Question authors from StackOverflow using json-path.

WITH "https://api.stackexchange.com/2.2/questions?pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf" AS url
CALL apoc.load.json(url,'$.items.owner.name') YIELD value
RETURN name, count(*);

Combined with the cypher query from the original blog post it’s easy to create the full Neo4j graph of those entities. We filter the original poster last, b/c deleted users have no user_id anymore.

Graph data created via loading JSON from StackOverflow.

WITH "https://api.stackexchange.com/2.2/questions?pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf" AS url
CALL apoc.load.json(url) YIELD value
UNWIND value.items AS q
MERGE (question:Question {id:q.question_id}) ON CREATE
  SET question.title = q.title, question.share_link = q.share_link, question.favorite_count = q.favorite_count

FOREACH (tagName IN q.tags | MERGE (tag:Tag {name:tagName}) MERGE (question)-[:TAGGED]->(tag))
FOREACH (a IN q.answers |
   MERGE (question)<-[:ANSWERS]-(answer:Answer {id:a.answer_id})
   MERGE (answerer:User {id:a.owner.user_id}) ON CREATE SET answerer.display_name = a.owner.display_name
   MERGE (answer)<-[:PROVIDED]-(answerer)
)
WITH * WHERE NOT q.owner.user_id IS NULL
MERGE (owner:User {id:q.owner.user_id}) ON CREATE SET owner.display_name = q.owner.display_name
MERGE (owner)-[:ASKED]->(question)

7.2.4. Load JSON from Twitter (with additional parameters)

With apoc.load.jsonParams you can send additional headers or payload with your JSON GET request, e.g. for the Twitter API:

Configure Bearer and Twitter Search Url token in neo4j.conf

apoc.static.twitter.bearer=XXXX
apoc.static.twitter.url=https://api.twitter.com/1.1/search/tweets.json?count=100&result_type=recent&lang=en&q=

Twitter Search via Cypher.

CALL apoc.static.getAll("twitter") yield value AS twitter
CALL apoc.load.jsonParams(twitter.url + "oscon+OR+neo4j+OR+%23oscon+OR+%40neo4j",{Authorization:"Bearer "+twitter.bearer},null) yield value
UNWIND value.statuses as status
WITH status, status.user as u, status.entities as e
RETURN status.id, status.text, u.screen_name, [t IN e.hashtags | t.text] as tags, e.symbols, [m IN e.user_mentions | m.screen_name] as mentions, [u IN e.urls | u.expanded_url] as urls

7.2.5. GeoCoding Example

Example for reverse geocoding and determining the route from one to another location.

WITH
        "21 rue Paul Bellamy 44000 NANTES FRANCE" AS fromAddr,
        "125 rue du docteur guichard 49000 ANGERS FRANCE" AS toAddr

call apoc.load.json("http://www.yournavigation.org/transport.php?url=http://nominatim.openstreetmap.org/search&format=json&q=" + replace(fromAddr, ' ', '%20')) YIELD value AS from

WITH from, toAddr  LIMIT 1

call apoc.load.json("http://www.yournavigation.org/transport.php?url=http://nominatim.openstreetmap.org/search&format=json&q=" + replace(toAddr, ' ', '%20')) YIELD value AS to

CALL apoc.load.json("https://router.project-osrm.org/viaroute?instructions=true&alt=true&z=17&loc=" + from.lat + "," + from.lon + "&loc=" + to.lat + "," + to.lon ) YIELD value AS doc

UNWIND doc.route_instructions as instruction

RETURN instruction