Validating and testing your graph data gives you insight into the quality of your data asset. This involves grading your organization consistently so you can monitor your progress. When testing data, it’s essential to set metrics, along with the next steps and goals that drive improvements. Data testing is even more crucial when loading data into a schema-free graph database like ONgDB. So how do we do it efficiently and continuously?
Open Native Graph Database (ONgDB) is schema-free by nature, though it does provide some schema concepts that can be enforced. Data types are not among them: as data flows through your ONgDB data pipeline and into the graph, no constraint enforces the data type of a property. Unless you state the type explicitly, ONgDB picks the best-fitting data type each time a property is written. This matters most for variations in numerical precision and for numerical values that you want stored as strings. So if you load data into ONgDB using LOAD CSV and write a property that consists only of digits but should be stored as a string, always wrap it in the Geequel toString() function so the type is explicit and you don’t end up with the same property stored under varying data types.
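One way to catch properties that have drifted into varying data types is to inspect the values a validation query returns. The following is a minimal sketch, not an ONgDB API: it assumes the rows have already been fetched and decoded from JSON into Python dictionaries, and the function name find_mixed_type_properties and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Set


def find_mixed_type_properties(rows: Iterable[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Return properties whose non-null values appear with more than one type."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        for name, value in row.items():
            if value is None:
                continue  # a missing value says nothing about the stored type
            seen[name].add(type(value).__name__)
    # Keep only the properties that were observed with two or more types.
    return {name: sorted(types) for name, types in seen.items() if len(types) > 1}


# Hypothetical sample: zip codes written both as numbers and as strings.
sample_rows = [
    {"name": "Alice", "zip": "02134"},
    {"name": "Bob", "zip": 90210},
    {"name": "Carol", "zip": "10001"},
]

print(find_mixed_type_properties(sample_rows))  # prints: {'zip': ['int', 'str']}
```

An empty result means every property was seen with a single type, which is the state that wrapping values in toString() at load time is meant to guarantee.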
For large-scale automated data validation, it’s beneficial to use a REST client like Postman to create a test collection of validation requests that run across the graph as new data flows into your Open Native Graph Database (ONgDB), ensuring it remains in a valid data state.
ONgDB features a REST API that can be used to query the graph. You can use it to create a collection of REST requests that query the graph using Geequel with data validation questions like, “Does every Actor have an ACTED_IN relationship to a Movie?” In Geequel, that question would appear as:
MATCH (a:Actor) WHERE NOT (a)-[:ACTED_IN]->() RETURN COUNT(DISTINCT a) AS count;
The Postman test on the response would then verify that the count comes back as 0. Assuming there’s a rule saying, “Every Actor must have an ACTED_IN relationship to be a valid Actor,” you now have a test that verifies it.
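The same request-and-assert pattern can be sketched outside Postman. The example below is illustrative only: it assumes ONgDB accepts the Neo4j 3.x-style transactional request body (typically posted to a path such as /db/data/transaction/commit) and returns results in the matching shape, which you should confirm against your ONgDB version. The helper names build_payload, extract_count, and is_valid are hypothetical, and the responses are hand-written samples rather than real server output.

```python
import json
from typing import Any, Dict

# The validation query from the text, kept as a single statement.
ACTOR_CHECK = (
    "MATCH (a:Actor) WHERE NOT (a)-[:ACTED_IN]->() "
    "RETURN COUNT(DISTINCT a) AS count"
)


def build_payload(statement: str) -> str:
    """Wrap one Geequel statement in the assumed transactional request body."""
    return json.dumps({"statements": [{"statement": statement}]})


def extract_count(response: Dict[str, Any]) -> int:
    """Pull the single `count` value out of the assumed response shape."""
    if response.get("errors"):
        raise ValueError(f"query failed: {response['errors']}")
    result = response["results"][0]
    column = result["columns"].index("count")
    return result["data"][0]["row"][column]


def is_valid(response: Dict[str, Any]) -> bool:
    """The rule passes only when no offending nodes were counted."""
    return extract_count(response) == 0


# Hypothetical responses, hand-written to match the assumed shape.
passing = {"results": [{"columns": ["count"], "data": [{"row": [0]}]}], "errors": []}
failing = {"results": [{"columns": ["count"], "data": [{"row": [3]}]}], "errors": []}

print(is_valid(passing), is_valid(failing))  # prints: True False
```

Treating a non-empty errors list as a failure, rather than as a count of 0, keeps a broken query from being mistaken for valid data.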
Newman is a command-line collection runner for Postman. It lets you run and test a Postman collection straight from the command line. It’s built so you can integrate it with your continuous integration servers and build systems. Newman maintains feature parity with Postman and lets you execute collections the same way they run within the collection runner in Postman.
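As a rough illustration of wiring Newman into a build step, the sketch below assembles a newman run command and treats a non-zero exit code as a failed validation run. It assumes Newman is installed and on the PATH; the file names are hypothetical, and newman_command and run_collection are illustrative helpers, not part of Newman.

```python
import subprocess
from typing import List, Optional


def newman_command(collection: str,
                   environment: Optional[str] = None,
                   junit_report: Optional[str] = None) -> List[str]:
    """Build the argument list for `newman run` with optional extras."""
    cmd = ["newman", "run", collection]
    if environment:
        # Environment file holding values such as the graph URL (assumed).
        cmd += ["--environment", environment]
    if junit_report:
        # Emit a JUnit XML report alongside the normal CLI output.
        cmd += ["--reporters", "cli,junit", "--reporter-junit-export", junit_report]
    return cmd


def run_collection(collection: str,
                   environment: Optional[str] = None,
                   junit_report: Optional[str] = None) -> bool:
    """Run the collection; True when Newman exits with status 0."""
    completed = subprocess.run(
        newman_command(collection, environment, junit_report), check=False
    )
    return completed.returncode == 0


# Hypothetical file names, shown only to illustrate the resulting command.
print(" ".join(newman_command("graph-validation.postman_collection.json",
                              environment="ongdb.postman_environment.json")))
```

Calling run_collection from a build script lets the pipeline stop when any validation request in the collection fails.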
Data validation and testing of your graph data is an important topic when it comes to databases. Since data is frequently updated, queried, deleted, or passed around, having valid data is critical. With data validation and testing enforced, databases are more consistent and operational, and they offer more value to the user. The GraphGrid Connected Data Platform provides schema and semantic validation and enforcement options through its Manager module. Download today and try our fully featured freemium offering.