Thinking in Patterns in Neo4j with Cypher

March 18, 2016

Thinking in patterns is the key to interacting with a graph database like Neo4j. One of the main challenges I see for people with deep relational database experience who are transitioning to a graph database is the habit of taking a relational approach to querying data. To query a graph database efficiently, the mental model for how queries are approached needs to be updated. We’ll look at some examples of this and at making the transition to thinking in patterns.

The overuse of relational query techniques most often shows up as a tendency to rely exclusively on WHERE clauses to filter and compare multiple complete sets of nodes, rather than letting Neo4j begin ignoring nodes as it expands from the starting set in the MATCH clause. The goal when querying Neo4j should be to get to the smallest starting set as quickly as possible, which maximizes the benefit of constant-time, index-free adjacency traversals within the local network around each starting node.

Thinking in Patterns Starts at Data Modeling

In order to query Neo4j in a pattern-centric manner that is sympathetic to the data layout, the data model must account for the patterns that matter. One key to modeling the data is knowing that each relationship off a node is effectively a direct pointer to another node, and that the relationships around a node are grouped by their type. This allows constant-time traversal from one node to the set of nodes connected to it by a single relationship type. Let’s look at an example…

Assume we want to find individuals from Wooster, Ohio who acted in a movie and see whether any of them worked with the same directors. A non-normalized RDBMS-style approach to modeling this would be to put isActor, isDirector, city, state and movies properties on the Person node. Here’s a somewhat extreme example of how this could look:

MATCH (actor:Person) WHERE actor.isActor = true AND actor.state = "Ohio" AND actor.city = "Wooster"
WITH actor, actor.movies AS movies UNWIND movies AS movie
MATCH (director:Person) WHERE director.isDirector = true AND movie IN director.movies
RETURN director, collect(DISTINCT actor) AS actors;

The issue with such an approach is that it requires scanning every node with the Person label, and then intersecting the values in the movies arrays of the Person nodes found to be actors from Wooster, Ohio with those of the Person nodes found to be directors.

A more graph-friendly and contextually specific approach is to recognize that movies really should be their own nodes, connecting each Person who is an actor to the movies in which they acted via an ACTED_IN relationship, and each director to the movies they directed via a DIRECTED relationship. The lookup query would then be:

MATCH (ohio:State {name: "Ohio"})<-[:APART_OF]-(wooster:City {name: "Wooster"})<-[:LIVES_IN]-(actor:Person)-[:ACTED_IN]->(m)<-[:DIRECTED]-(director:Person)-[:DIRECTED]->(m2)<-[:ACTED_IN]-(a2)-[:LIVES_IN]->(wooster)
RETURN director, collect(DISTINCT actor) AS actors;
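
To try the pattern query above, a small sample graph is needed. The following is a minimal sketch of what that data could look like under the model described here. The person names, movie titles, and the Movie label are made up for illustration; only the State, City and Person labels and the relationship types come from the query itself.

```cypher
// Hypothetical sample data: names and titles are invented for illustration.
CREATE (ohio:State {name: "Ohio"})
CREATE (wooster:City {name: "Wooster"})-[:APART_OF]->(ohio)
// Two actors who live in Wooster
CREATE (alice:Person {name: "Alice"})-[:LIVES_IN]->(wooster)
CREATE (bob:Person {name: "Bob"})-[:LIVES_IN]->(wooster)
// A director (no LIVES_IN relationship is required by the query)
CREATE (dana:Person {name: "Dana"})
// Movies are their own nodes rather than an array property on Person
CREATE (m1:Movie {title: "First Film"})
CREATE (m2:Movie {title: "Second Film"})
CREATE (alice)-[:ACTED_IN]->(m1)
CREATE (bob)-[:ACTED_IN]->(m2)
CREATE (dana)-[:DIRECTED]->(m1)
CREATE (dana)-[:DIRECTED]->(m2);
```

With this data, the pattern should connect Alice and Bob through Dana, since each acted in a different movie that Dana directed. Note that roles are no longer boolean flags such as isActor: whether a Person is an actor or a director is expressed by the relationships they have.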

Informative and Flexible Patterns

Expanding the contextually meaningful pieces of information, such as movies, into their own node entities within the graph allows many significant patterns to be built around them and takes advantage of the traversal performance where Neo4j excels. By using these patterns and getting to the smallest possible starting set as quickly as possible, complex relationships and patterns can be leveraged to build incredibly meaningful results much more quickly than starting with two large sets of entities and looking across all of them for some intersection based on many property checks.
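
One way to help Neo4j reach that small starting set is to index the properties used to anchor the pattern, and then inspect the query plan. The sketch below assumes Neo4j 4.x or later index syntax, and the index names are hypothetical; on the versions current when this post was written, the equivalent form was CREATE INDEX ON :State(name).

```cypher
// Assumed index names; syntax for Neo4j 4.x and later.
CREATE INDEX state_name FOR (s:State) ON (s.name);
CREATE INDEX city_name FOR (c:City) ON (c.name);

// PROFILE executes the query and reports the operators used
// and the number of rows flowing through each step of the plan.
PROFILE
MATCH (:State {name: "Ohio"})<-[:APART_OF]-(wooster:City {name: "Wooster"})<-[:LIVES_IN]-(actor:Person)-[:ACTED_IN]->(m)
RETURN actor, m;
```

If the plan begins with an index seek on State or City rather than a scan of all Person nodes, the query is starting from a small anchored set and expanding outward through relationships, which is the behavior this post argues for.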