When Your Data Is Not a Graph

March 11, 2016

I often get asked at Neo4j trainings and meetups which types of data or use cases a graph database doesn’t handle. While the graph data structure models the world we live in exceptionally well, there are some use cases and scenarios where your data is not a graph, or, more likely, not ONLY a graph.

The Neo4j graph database is used for many use cases. It has influenced the situations of current world leaders by effectively representing how they are connected, allows fraud rings and networks to be surfaced through their common connections, enables business analysts to understand the relationships within their data for better business insights, and helps users increase their chances of finding pertinent documents within a network.

Any of these connected data examples benefits tremendously from a native graph database like Neo4j. At the same time, there are other scenarios where your data is not a graph.

Not Only a Graph Rather than Not a Graph

Here are a few examples to help you think through your data and understand whether you’re dealing with data that would benefit from being represented as a graph:

  • When data entities have no contextual importance via their connections with other data entities
    For instance, if you’re building some kind of calculator, the housing medium for your numbers, equations, and base data won’t likely be taking advantage of powerful contextual relationships.
    Another scenario could be tracking your personal budget each month simply to see whether you spent more than you made. In this case, your goal isn’t to understand the relationships of where that money was spent.
    However, as soon as you want to ask questions about the items in that budget, such as the details of each expenditure, which person within your household executed the transaction, where it occurred, and whom they were with at the time, you’ve got a graph problem.
  • When data involves large text files, web pages, or JSON documents
    If your scenario requires bulk storage and direct lookup retrieval of large string data, without any requirement on the connections between documents, then you’ll be better off using a pure document store.
    In the more likely scenario, you have large text documents where the connections between them are important, such as an index of web pages where you need to know which pages link to which others. Here your data is not only a graph. This scenario can be solved quite well by representing all of the link connectedness within Neo4j, with each node holding a reference to the large text document, and pairing it with an object store like S3 or another document-based store for the actual bulk storage of the text.
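
To make the "not only a graph" pairing concrete, here is a minimal sketch in plain Python. It is not Neo4j or S3 code: the dictionaries `object_store`, `page_nodes`, `links_out`, and `links_in`, the helper functions, and the example URLs are all hypothetical stand-ins chosen for illustration. The point it tries to show is the division of labor: the graph side holds only the links and a small storage key, while the bulk text lives elsewhere.

```python
from collections import defaultdict

# Hypothetical stand-in for an object store such as S3: key -> document body.
object_store = {}

# Hypothetical stand-in for the graph: each page node keeps only a small
# reference (the storage key); links are kept as directed edges.
page_nodes = {}                 # url -> {"storage_key": ...}
links_out = defaultdict(set)    # url -> urls it links to
links_in = defaultdict(set)     # url -> urls that link to it


def add_page(url, body):
    """Put the bulk text in the object store and only a reference in the graph."""
    storage_key = f"pages/{len(object_store)}.html"
    object_store[storage_key] = body
    page_nodes[url] = {"storage_key": storage_key}


def add_link(source_url, target_url):
    """Record that source_url links to target_url."""
    links_out[source_url].add(target_url)
    links_in[target_url].add(source_url)


def pages_linking_to(url):
    """Graph question: which pages link to this one?"""
    return sorted(links_in[url])


def fetch_body(url):
    """Follow the reference on the graph node to the stored document."""
    return object_store[page_nodes[url]["storage_key"]]


add_page("a.example/home", "<html>home ... large text ...</html>")
add_page("a.example/docs", "<html>docs ... large text ...</html>")
add_page("b.example/blog", "<html>blog ... large text ...</html>")
add_link("a.example/home", "a.example/docs")
add_link("b.example/blog", "a.example/docs")

# Ask the relationship question first, then fetch text only for the matches.
for url in pages_linking_to("a.example/docs"):
    print(url, len(fetch_body(url)))
```

In a real deployment the relationship question would be a graph query and the fetch would be a call to the object store; the sketch only illustrates that the two lookups are separate and that the graph never needs to hold the document body.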

There may be other examples that come to your mind and I’d be happy to discuss those in the comments.

A good way to assess the need for a graph database is to think about the things you want to know about your data. If you find yourself asking relationship-centric questions such as, “How is this person related to this account or this transaction?” then you should consider a graph database like Neo4j.
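
As a rough illustration of what a relationship-centric question looks like in practice, the sketch below answers "how is this person related to this transaction?" with a breadth-first search over a tiny, made-up data set. The node names, relationship types, and the `how_related` helper are assumptions for this example only, and the search ignores relationship direction for simplicity. A graph database would answer the same question with a path query rather than hand-written traversal code.

```python
from collections import deque

# Hypothetical relationship data: (node, relationship type, node).
relationships = [
    ("person:Alice", "OWNS", "account:123"),
    ("account:123", "MADE", "transaction:987"),
    ("person:Bob", "SHARES_ADDRESS_WITH", "person:Alice"),
]


def build_adjacency(rels):
    """Index relationships by node, treating them as traversable both ways."""
    adjacency = {}
    for source, rel_type, target in rels:
        adjacency.setdefault(source, []).append((rel_type, target))
        adjacency.setdefault(target, []).append((rel_type, source))
    return adjacency


def how_related(adjacency, start, goal):
    """Breadth-first search: shortest chain of relationships, or None."""
    queue = deque([(start, [start])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel_type, neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [rel_type, neighbor]))
    return None


adjacency = build_adjacency(relationships)
print(" -> ".join(how_related(adjacency, "person:Bob", "transaction:987")))
```

If answering your questions keeps requiring this kind of hop-by-hop traversal, that is a strong hint the data is graph-shaped.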