This site is not optimized for Internet Explorer 9 and lower. Please choose another browser or upgrade your existing browser in order get the best experience of this website.

MySQL to Neo4J

August 30, 2015

MySqlToNeo4j You’ve probably heard that an effective way to take move data from an existing relational database to graph is using LOAD CSV. But what exactly does the process of converting all or part of the database tables from MySQL to Neo4j using LOAD CSV involve start to finish? We’ll be using the Mysql5 Northwind database as our example. There is a Neo4j tutorial that has a similar explanation using Postgres and discusses the graph modeling aspects as well. So definitely good to read through that. Here we’ll focus on MySQL and the CSV export in preparation for the Neo4j import.

First we’ll install and connect to the MySQL database:

$ brew install mysql
$ mysql.server restart

*Note: We’re skipping all MySQL server security because for this demonstration its simply an intermediary to get the data we need for the Neo4j LOAD CSV process.

Now using freely available MySQL Workbench or Sequel Pro connect to your localhost MySQL server. You should be able to do this directly on 127.0.0.1 without any username or password because we skipped the normal process of securing the server.

Import the Northwind.MySQL5.sql that you downloaded above. If you’re using Sequel Pro, you do this by choosing File -> Import… -> browse to your download and select Northwind.MySQL5.sql
When the import is finished you’ll see all the tables available for export to Neo4j. The specific tables we are interested in for our Neo4j graph model are Categories, Customers, Order Details, Orders, Products and Suppliers.

Export each table with right + click and selecting Export -> As csv file.
Customize the CSV file with settings that import smoothly into Neo4j (most should be selected by default):
1. NULL fields should export as a blank because it’s more efficient validate an actual existence or IS NULL check rather than actually creating the property with the literal string value “NULL” as value.
2. Escape values such as quotes with \ so quotes in the middle of the field do not break the CSV structure.
*Note: If you are planning to use dot notation to access columns by name, then you’ll need to make sure to remove any spaces from the column names in the first row of the CSV files before attempting to import into Neo4j.

Now using the latest (2.2.5 as of this article) Neo4j Community Edition, you can continue to follow along with the Cypher below. To import data into Neo4j locally, launch the Neo4j shell by navigating to the installation directory using terminal and launching ./bin/neo4j-shell
Before you copy and paste the Cypher below into the shell to import each one of the CSV files created by each exported table, you’ll need to update the “file://…” paths to match your export location.

If you aren’t using the shell and you prefer to use the Neo4j browser, then you’ll need to execute one statement at a time. Statements are terminated by a semicolon.

// Create Constraints
CREATE CONSTRAINT ON (p:Product) ASSERT p.productID IS UNIQUE;
CREATE CONSTRAINT ON (c:Category) ASSERT c.categoryID IS UNIQUE;
CREATE CONSTRAINT ON (s:Supplier) ASSERT s.supplierID IS UNIQUE;
CREATE CONSTRAINT ON (c:Customer) ASSERT c.customerID IS UNIQUE;
CREATE CONSTRAINT ON (o:Order) ASSERT o.orderID IS UNIQUE;

// Create Indexes
CREATE INDEX ON :Product(productName);
CREATE INDEX ON :Category(categoryName);
CREATE INDEX ON :Supplier(companyName);
CREATE INDEX ON :Supplier(contactName);
CREATE INDEX ON :Supplier(city);
CREATE INDEX ON :Supplier(country);
CREATE INDEX ON :Supplier(postalCode);
CREATE INDEX ON :Customer(companyName);
CREATE INDEX ON :Customer(contactName);
CREATE INDEX ON :Customer(city);
CREATE INDEX ON :Customer(country);
CREATE INDEX ON :Customer(postalCode);
CREATE INDEX ON :Order(customerID);
CREATE INDEX ON :Order(shipName);
CREATE INDEX ON :Order(shipCity);
CREATE INDEX ON :Order(shipCountry);
CREATE INDEX ON :Order(shipPostalCode);

// Load Products
LOAD CSV WITH HEADERS FROM "file://" AS row
CREATE (n:Product)
SET n = row,
  n.unitPrice = toFloat(row.unitPrice),
  n.unitsInStock = toInt(row.unitsInStock), n.unitsOnOrder = toInt(row.unitsOnOrder),
  n.reorderLevel = toInt(row.reorderLevel), n.discontinued = (toInt(row.discontinued) <> 0);

// Load Categories
LOAD CSV WITH HEADERS FROM "file://" AS row
CREATE (n:Category)
SET n = row;

// Load Suppliers
LOAD CSV WITH HEADERS FROM "file://" AS row
CREATE (n:Supplier)
SET n = row;

// Create Product to Category Relationships
MATCH (p:Product),(c:Category)
WHERE p.categoryID = c.categoryID
CREATE (p)-[:PART_OF]->(c);

// Create Product to Supplier Relationships
MATCH (p:Product),(s:Supplier)
WHERE p.supplierID = s.supplierID
CREATE (s)-[:SUPPLIES]->(p);

// Load Customers
LOAD CSV WITH HEADERS FROM "file://" AS row
CREATE (n:Customer)
SET n = row;

// Load Orders
LOAD CSV WITH HEADERS FROM "file://" AS row
CREATE (n:Order)
SET n = row;

// Create Customer to Order Relationships
MATCH (c:Customer),(o:Order)
WHERE c.customerID = o.customerID
CREATE (c)-[:PURCHASED]->(o);

// Load Order Details
LOAD CSV WITH HEADERS FROM "file://" AS row
MATCH (p:Product), (o:Order)
WHERE p.productID = row.productID AND o.orderID = row.orderID
CREATE (o)-[details:ORDERS]->(p)
SET details = row,
  details.quantity = toInt(row.quantity);

To validate the import worked successfully you can use the Neo4j Cypher queries below that are meant to ask “graph questions” of the data now that it it has been imported into the Neo4j graph model.

// Which suppliers have the broadest category coverage?
MATCH (s:Supplier)-->(:Product)-->(c:Category)
WITH s.companyName as Company, collect(distinct c.categoryName) as Categories
WITH Company, Categories, length(Categories) AS Size ORDER BY Size DESC
RETURN Company, Size, Categories;

// Which suppliers offer Produce?
MATCH (c:Category {categoryName:"Produce"})<--(:Product)<--(s:Supplier)
RETURN DISTINCT s.companyName as ProduceSuppliers;

// Which customer has bought the most produce?
MATCH (cust:Customer)-[:PURCHASED]->(:Order)-[o:ORDERS]->(p:Product),
      (p)-[:PART_OF]->(c:Category {categoryName:"Produce"})
RETURN DISTINCT cust.contactName as CustomerName, SUM(o.quantity) AS TotalProductsPurchased ORDER BY TotalProductsPurchased DESC;

As you go from here and use LOAD CSV to import data into Neo4j a couple of useful references can be found in the Neo4j manual: Load CSV and Importing CSV files with cypher. As always if you have any questions or comments leave them below and we’ll try to respond promptly.