Neo4j: Loading JSON documents with Cypher
One of the most commonly asked questions I get asked is how to load JSON documents into Neo4j and although Cypher doesn’t have a ‘LOAD JSON’ command we can still get JSON data into the graph.
Michael shows how to do this from various languages in this blog post and I recently wanted to load a JSON document that I generated from Chicago crime types.
This is a snippet of the JSON document:
{ "categories": [ { "name": "Index Crime", "sub_categories": [ { "code": "01A", "description": "Homicide 1st & 2nd Degree" } ] }, { "name": "Non-Index Crime", "sub_categories": [ { "code": "01B", "description": "Involuntary Manslaughter" } ] }, { "name": "Violent Crime", "sub_categories": [ { "code": "01A", "description": "Homicide 1st & 2nd Degree" } ] } ] }
We want to create the following graph structure from this document:
We can then connect the crimes to the appropriate sub category and write aggregation queries that drill down from the category.
To do this we’re going to have to pass the JSON document to Neo4j via its HTTP API rather than through the browser. Luckily there are drivers available for {insert your favourite language here} so we should still be good.
Python is my current goto language so I’m going to use py2neo to load the data in.
Let’s start by writing a simple query which passes our JSON document in and gets it straight back. Note that I’ve updated my Neo4j password to be ‘foobar’ – replace that with your equivalent if you’re following along:
import json from py2neo import Graph, authenticate # replace 'foobar' with your password authenticate("localhost:7474", "neo4j", "foobar") graph = Graph() with open('categories.json') as data_file: json = json.load(data_file) query = """ RETURN {json} """ # Send Cypher query. print graph.cypher.execute(query, json = json)
$ python import_categories.py | document ---+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1 | {u'categories': [{u'name': u'Index Crime', u'sub_categories': [{u'code': u'01A', u'description': u'Homicide 1st & 2nd Degree'}, {u'code': u'02', u'description': u'Criminal Sexual Assault'}, {u'code': u'03', u'description': u'Robbery'}, {u'code': u'04A', u'description': u'Aggravated Assault'}, {u'code': u'04B', u'description': u'Aggravated Battery'}, {u'code': u'05', u'description': u'Burglary'}, {u'code': u'06', u'description': u'Larceny'}, {u'code': u'07', u'description': u'Motor Vehicle Theft'}, {u'code': u'09', u'description': u'Arson'}]}, {u'name': u'Non-Index Crime', u'sub_categories': [{u'code': u'01B', u'description': u'Involuntary Manslaughter'}, {u'code': u'08A', u'description': u'Simple Assault'}, {u'code': u'08B', u'description': u'Simple Battery'}, {u'code': u'10', u'description': u'Forgery & Counterfeiting'}, {u'code': u'11', u'description': u'Fraud'}, {u'code': u'12', u'description': u'Embezzlement'}, {u'code': u'13', u'description': u'Stolen Property'}, {u'code': u'14', u'description': u'Vandalism'}, {u'code': u'15', u'description': u'Weapons Violation'}, {u'code': u'16', u'description': u'Prostitution'}, {u'code': u'17', u'description': u'Criminal Sexual Abuse'}, {u'code': u'18', u'description': u'Drug Abuse'}, {u'code': u'19', u'description': u'Gambling'}, {u'code': u'20', u'description': u'Offenses Against Family'}, {u'code': u'22', u'description': u'Liquor License'}, {u'code': u'24', u'description': u'Disorderly Conduct'}, {u'code': u'26', u'description': u'Misc Non-Index Offense'}]}, {u'name': u'Violent Crime', u'sub_categories': [{u'code': u'01A', u'description': u'Homicide 1st & 2nd Degree'}, {u'code': u'02', u'description': u'Criminal Sexual Assault'}, {u'code': u'03', u'description': u'Robbery'}, {u'code': u'04A', u'description': u'Aggravated Assault'}, {u'code': u'04B', u'description': u'Aggravated Battery'}]}]}
It’s a bit ugly but we can see that everything’s there! Our next step is to extract each category into its own row. We can do this by accessing the ‘categories’ key in our JSON document and then calling the UNWIND function which allows us to expand a collection into a sequence of rows:
query = """ WITH {json} AS document UNWIND document.categories AS category RETURN category.name """
$ python import_categories.py | category.name ---+----------------- 1 | Index Crime 2 | Non-Index Crime 3 | Violent Crime
Now we can create a node for each of those categories. We’ll use the MERGE command so that we can run this script multiple times without ending up with repeat categories:
query = """ WITH {json} AS document UNWIND document.categories AS category MERGE (:CrimeCategory {name: category.name}) """
Let’s quickly check those categories were correctly imported:
match (category:CrimeCategory) return category
Looking good so far – now for the sub categories. We’re going to use the UNWIND function to help us out here as well:
query = """ WITH {json} AS document UNWIND document.categories AS category UNWIND category.sub_categories AS subCategory RETURN category.name, subCategory.code, subCategory.description """
$ python import_categories.py | category.name | subCategory.code | subCategory.description ----+-----------------+------------------+--------------------------- 1 | Index Crime | 01A | Homicide 1st & 2nd Degree 2 | Index Crime | 02 | Criminal Sexual Assault 3 | Index Crime | 03 | Robbery 4 | Index Crime | 04A | Aggravated Assault 5 | Index Crime | 04B | Aggravated Battery 6 | Index Crime | 05 | Burglary 7 | Index Crime | 06 | Larceny 8 | Index Crime | 07 | Motor Vehicle Theft 9 | Index Crime | 09 | Arson 10 | Non-Index Crime | 01B | Involuntary Manslaughter 11 | Non-Index Crime | 08A | Simple Assault 12 | Non-Index Crime | 08B | Simple Battery 13 | Non-Index Crime | 10 | Forgery & Counterfeiting 14 | Non-Index Crime | 11 | Fraud 15 | Non-Index Crime | 12 | Embezzlement 16 | Non-Index Crime | 13 | Stolen Property 17 | Non-Index Crime | 14 | Vandalism 18 | Non-Index Crime | 15 | Weapons Violation 19 | Non-Index Crime | 16 | Prostitution 20 | Non-Index Crime | 17 | Criminal Sexual Abuse 21 | Non-Index Crime | 18 | Drug Abuse 22 | Non-Index Crime | 19 | Gambling 23 | Non-Index Crime | 20 | Offenses Against Family 24 | Non-Index Crime | 22 | Liquor License 25 | Non-Index Crime | 24 | Disorderly Conduct 26 | Non-Index Crime | 26 | Misc Non-Index Offense 27 | Violent Crime | 01A | Homicide 1st & 2nd Degree 28 | Violent Crime | 02 | Criminal Sexual Assault 29 | Violent Crime | 03 | Robbery 30 | Violent Crime | 04A | Aggravated Assault 31 | Violent Crime | 04B | Aggravated Battery
Let’s give sub categories the MERGE treatment too:
query = """ WITH {json} AS document UNWIND document.categories AS category UNWIND category.sub_categories AS subCategory MERGE (c:CrimeCategory {name: category.name}) MERGE (sc:SubCategory {code: subCategory.code}) ON CREATE SET sc.description = subCategory.description MERGE (c)-[:CHILD]->(sc) """
And finally let’s write a query to check what we’ve imported:
match (category:CrimeCategory)-[:CHILD]->(subCategory) return *
I hadn’t realised before running this query is that some sub categories sit under multiple categories so that’s quite an interesting insight. The final Python script is available on github – any questions let me know.
Reference: | Neo4j: Loading JSON documents with Cypher from our WCG partner Mark Needham at the Mark Needham Blog blog. |