Data Validation and Testing Your Graph Data State

February 17, 2016

Data validation lets you gain insight into the quality of your data assets. This involves grading your organization's data consistently so you can monitor your progress. When testing data, it’s essential to set metrics, as well as follow-up steps and goals, to drive improvements. Data testing is even more crucial when loading data into a schema-free graph database like Neo4j. So how do we do it efficiently and continuously?

Schema-Free Nature of Neo4j and Data Validation

Neo4j is schema-free by nature, but does provide some schema concepts that can be enforced. This means that as data flows through your pipeline and into the graph, no constraints on data type are enforced. It also means that, unless a type is specifically enforced, Neo4j will pick a data type based on the value being written. This matters for variations in numerical precision and for numerical values you want stored as strings. So if you load data into Neo4j using LOAD CSV and write a property that consists only of digits but should be stored as a string, always wrap it in the Cypher toString() function so you won’t end up with a property whose data type varies from node to node.
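To make the risk of varying data types concrete, here is a minimal Python sketch that scans property maps returned from the graph and reports any property stored with more than one type. The function name, the sample rows, and the postalCode property are hypothetical and only illustrate the idea; they are not part of Neo4j or Postman.

```python
from collections import defaultdict


def mixed_type_properties(records):
    """Return {property: sorted type names} for properties seen with more than one type."""
    seen = defaultdict(set)
    for record in records:
        for key, value in record.items():
            if value is None:
                continue  # a missing value is not a type conflict
            seen[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in seen.items() if len(types) > 1}


# Hypothetical rows, shaped like property maps returned by a query.
rows = [
    {"name": "Actor A", "postalCode": "94105"},
    {"name": "Actor B", "postalCode": 10001},  # written without toString()
]

print(mixed_type_properties(rows))  # {'postalCode': ['int', 'str']}
```

A check like this could run after each load. An empty result means every property was written with a single, consistent type.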

Data Validation with Postman, REST Requests, and Newman

For large-scale automated data validation, it’s beneficial to use a REST client like Postman to create a test collection of validation requests. These requests can run across the graph as new data flows into your Neo4j graph database, ensuring it remains in a valid data state.

The Neo4j graph database features a REST API that can be used to query the graph. You can use it to create a collection of REST requests that query the graph using Cypher with data validation questions like, “Does every Actor have an ACTED_IN relationship to a Movie?” which, in Cypher, would appear as:
MATCH (a:Actor) WHERE NOT (a)-[:ACTED_IN]->() RETURN COUNT(DISTINCT a) AS count;

The test on the response in Postman would then check that the count comes back as 0. Assuming there’s a rule saying, “Every Actor must have an ACTED_IN relationship to be a valid Actor,” you now have a test that verifies it.
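The same check can be sketched outside Postman. The Python below builds the request body and inspects the response, assuming the JSON shapes of Neo4j's transactional HTTP endpoint (a "statements" list in the request, "results" and "errors" in the response). Those shapes and the helper names are assumptions for illustration, so verify them against the documentation for your Neo4j version.

```python
import json

# The validation query from the article, without the trailing semicolon.
ORPHAN_ACTORS = (
    "MATCH (a:Actor) WHERE NOT (a)-[:ACTED_IN]->() "
    "RETURN COUNT(DISTINCT a) AS count"
)


def build_payload(statement):
    """Wrap one Cypher statement in the assumed request body format."""
    return json.dumps({"statements": [{"statement": statement}]})


def extract_count(response_text):
    """Read the 'count' column from the first row of the first result."""
    body = json.loads(response_text)
    if body.get("errors"):
        raise ValueError(f"query failed: {body['errors']}")
    result = body["results"][0]
    column = result["columns"].index("count")
    return result["data"][0]["row"][column]


def is_valid(response_text):
    """The graph passes the rule when no Actor lacks an ACTED_IN relationship."""
    return extract_count(response_text) == 0


# Hypothetical response body for a graph in a valid state.
sample = '{"results":[{"columns":["count"],"data":[{"row":[0]}]}],"errors":[]}'
print(is_valid(sample))  # True
```

Keeping the payload builder and the response check as separate functions lets the assertion be exercised against canned responses without a running database.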

Newman is a command-line collection runner for Postman. It lets you run and test a Postman collection straight from the command line, and it’s built so you can integrate it with your continuous integration servers and build systems. Newman maintains feature parity with Postman and lets you execute collections the same way they run in Postman’s collection runner.
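As a rough illustration of wiring Newman into a build step, the Python sketch below shells out to it. It assumes a recent Newman release that uses the "newman run" subcommand with "-e" for an environment file and exits with a non-zero code when a test fails; older releases used different flags. The file names and helper functions are hypothetical.

```python
import subprocess


def build_newman_command(collection_path, environment_path=None):
    """Assemble the argument list for running a collection with Newman."""
    command = ["newman", "run", collection_path]
    if environment_path:
        command += ["-e", environment_path]
    return command


def run_collection(collection_path, environment_path=None):
    """Run the collection; True when Newman exits with code 0 (assumed: all tests passed)."""
    completed = subprocess.run(build_newman_command(collection_path, environment_path))
    return completed.returncode == 0


# Hypothetical file names for an exported collection and environment.
print(build_newman_command("graph-validation.json", "staging-env.json"))
```

A build server could call run_collection after each data load and fail the build when it returns False.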

Data validation is an important topic when it comes to databases. Since data is frequently updated, queried, deleted, or passed around, having valid data is critical. With data validation and testing in place, databases become more consistent and more reliable in operation, and they offer more value to the user.