All data tells a story, one waiting to be investigated. The beautiful thing about data is its ability to avoid all subtext and ambiguity and go straight to the facts. Swimming through large amounts of it, however, can easily become overwhelming and unproductive. To investigate the data and its story critically, one must be able to model and query it.
Last year, we created a property graph model for a data set released to the public through the CHHS Open Data Portal. What struck us about this data was its enormous potential to be controversial. The data covered all cosmetic products sold in California “known or suspected to cause cancer, birth defects, or other developmental or reproductive harm” (CHHS). Many Californians assume that any makeup product they see on the shelves has met every health and safety standard before it reaches the public. This data set said otherwise. Using the property graph model, we were able to bring meaning to the numbers.
The Chemicals in Cosmetics data set was richly interconnected. Neo4j turned this to our advantage, since we could put properties on both the nodes and the relationships between them. Dates and times, for example, could be placed on the relationship connecting a Brand to its Product. This made it much easier to understand the information and what each brand and company was responsible for. The more we queried the data, the richer and more revealing it became. Starting with simple questions such as “Which brand has the most products reported?” helped us funnel the graph down to more specific questions and queries. With Neo4j we were able to navigate and scrutinize the data seamlessly until we had uncovered all of its subtleties.
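To make the idea of properties on relationships concrete, here is a minimal in-memory sketch in Python. It is not the Neo4j model from the GraphGist: the `SELLS` relationship type, the `reported` property, and the brand and product names are all placeholders invented for illustration, not values from the CHHS data set. It only shows how a date can live on the Brand-to-Product relationship and how the question "Which brand has the most products reported?" might be answered.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    label: str  # e.g. "Brand" or "Product"
    name: str


@dataclass
class Relationship:
    start: Node
    rel_type: str
    end: Node
    # Properties live on the relationship itself, as in a property graph.
    properties: dict = field(default_factory=dict)


# Placeholder data (assumption): not taken from the CHHS data set.
brand_a = Node("Brand", "Brand A")
brand_b = Node("Brand", "Brand B")

relationships = [
    Relationship(brand_a, "SELLS", Node("Product", "Product 1"), {"reported": "2015-01-10"}),
    Relationship(brand_a, "SELLS", Node("Product", "Product 2"), {"reported": "2015-03-02"}),
    Relationship(brand_b, "SELLS", Node("Product", "Product 3"), {"reported": "2015-02-20"}),
]


def products_per_brand(rels):
    """Count distinct products linked to each brand by a SELLS relationship."""
    seen = set()
    counts = Counter()
    for rel in rels:
        if rel.rel_type == "SELLS" and rel.start.label == "Brand":
            pair = (rel.start, rel.end)
            if pair not in seen:  # ignore duplicate brand-product links
                seen.add(pair)
                counts[rel.start.name] += 1
    return counts


counts = products_per_brand(relationships)
top_brand, top_count = counts.most_common(1)[0]
print(top_brand, top_count)
```

In a graph database the same question would be a short pattern-matching query rather than a loop; the sketch only mirrors the shape of the model.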
Read the full exploration and detailed writeup, which was the winning Neo4j GraphGist entry for Investigative Journalism.