Getting Acquainted with an Unknown Graph

March 09, 2016

Finding your way around an unknown graph can seem daunting at first because Neo4j is schema-free, especially if you're newer to graph databases and used to a relational database, where you would simply open the ERD and look through the tables. But just because Neo4j is schema-free doesn't mean that schema-like elements are absent. The schema elements of a Neo4j graph database are Label Names, Relationship Types, and Indexes and Constraints on Property Keys. Let's look at some techniques for getting acquainted with an unknown graph.

Initial Unknown Graph Exploration

Here are a few quick tips to help build out the initial mental model of connections within the graph to get you started:

  • To observe the graph schema, the easiest place to look is the information panel in the Neo4j Browser. There you'll find the Label Names, Relationship Types and Property Keys in use. Each one can be clicked and will immediately load a maximum of 25 associated results. These results provide a basic starting point for navigating the graph.
  • To understand the Indexes and Constraints applied to the graph database, which begin to shape some of the business rules around the data in the graph, run the ":schema" command in the browser. It lists all the index and constraint rules for the Labels and Relationships with their respective Property Keys.
  • To find out how many nodes are in the graph as a whole, or carry a certain Label, so you know the data size you're handling, run MATCH (n) RETURN COUNT(n); or MATCH (n:LabelNameHere) RETURN COUNT(n); respectively.
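
The counts from the last tip can also be broken down per label and per relationship type. The following is a minimal sketch rather than a definitive recipe: the column aliases are arbitrary, each statement is meant to be run on its own, and both scan the whole graph, so they may be slow on a large database.

```cypher
// Count nodes per label combination, largest groups first.
MATCH (n)
RETURN labels(n) AS labels, count(*) AS nodes
ORDER BY nodes DESC;

// Count relationships per type. Matching with a direction
// ensures each relationship is counted once, not twice.
MATCH ()-[r]->()
RETURN type(r) AS relationshipType, count(*) AS relationships
ORDER BY relationships DESC;
```

Note that a node carrying several labels appears under its full label combination, so the first result groups by label set rather than by single label.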

Cypher Functions for Exploring an Unknown Graph

Several important Cypher functions that you can use as you continue your exploration are labels(), type() and keys(). Note in the examples below that the keyword DISTINCT removes duplicate values from the return set:

  • labels() takes a node reference as the argument and returns the labels on that node.
    Example: MATCH (n) RETURN DISTINCT LABELS(n);
    IMPORTANT: Unless your graph is very small, it is better to use labels() in a more focused way, for example on the nodes matched by a specific pattern, instead of across the entire graph.
  • type() takes a relationship reference as an argument and returns the relationship type of a relationship connecting two nodes.
    Example: MATCH (n:Person)-[r]-() RETURN DISTINCT TYPE(r);
  • keys() takes a node reference as the argument and returns the property keys of the properties on that node.
    Example: MATCH (n:Person) RETURN DISTINCT keys(n);
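
The keys() example above returns one list per distinct combination of keys. As a sketch of a flatter view, the list can be unwound so that each property key gets its own row. The Person label comes from the examples above; the aliases and the sample size of 1000 are arbitrary choices made for illustration.

```cypher
// Flatten the per-node key lists and count how many
// Person nodes carry each property key.
MATCH (n:Person)
UNWIND keys(n) AS key
RETURN key, count(*) AS nodes
ORDER BY nodes DESC;

// keys() also accepts a relationship reference. A LIMIT keeps
// the scan small while still sampling the keys in use.
MATCH (:Person)-[r]-()
WITH r LIMIT 1000
UNWIND keys(r) AS key
RETURN DISTINCT type(r) AS relationshipType, key;
```

Because the second query only samples relationships, a rarely used property key can be missed; raise or drop the LIMIT if the graph is small enough.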

Discovering Unknown Graph Connections

Now that we've looked at a few of the basic building blocks for exploring and understanding the graph, let's put it all together. The example below starts with a specific label that piques our interest and examines how, and to what, it is connected:
MATCH (p:Person)-[r]-(x)
RETURN p.name, COLLECT(DISTINCT type(r)) AS relationships, 
ID(x) AS id, LABELS(x) AS labels, KEYS(x) AS properties;

In the example above, we assume we have a Person that we've figured out has a name property, and we return all the Relationship Types to any of its immediate connections. For each immediate connection we also return the id; because it acts as a grouping key for COLLECT, it keeps the result rows separated per connection x, so the relationship types are collected for each pair of p and x. Finally, we include the Labels and Property Keys that are on each of the immediate connections.
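
When the per-node detail of that query is more than you need, the same pattern can be aggregated into a schema-level summary. This is a sketch under two assumptions not made in the query above: the pattern is given a direction, so only outgoing relationships are shown, and the aliases are illustrative names.

```cypher
// Summarize what Person nodes point to: one row per
// (relationship type, neighbor label set) combination.
MATCH (p:Person)-[r]->(x)
RETURN type(r) AS relationshipType,
       labels(x) AS targetLabels,
       count(*) AS occurrences
ORDER BY occurrences DESC;
```

Reversing the arrow to `<-[r]-` gives the same summary for incoming relationships.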

Beyond knowing how to explore an unknown graph, the main takeaway is that "schema-free" doesn't mean that no schema exists in the Neo4j graph database. Rather, it means that instead of defining a complete and strictly enforced schema at the outset for all data in the graph, only Indexes and Constraints are defined to provide some basic rules about how the data should be written, and the rest of the graph schema evolves over time as data is written to the graph database.