Data migration is one of the necessary evils involved in keeping a database aligned with the evolving needs of the business and the applications using it. As enterprises of all sizes face increasing demand to iterate more quickly and drive change from within, the data migration conversation comes up far more often. Depending on the size and structure of the data in a database, a migration can take a very long time or may not be feasible at all.
Data migration is not just an enterprise issue. Startups change at an even more rapid rate as they iterate on their products and business models, trying to figure out exactly what they need to be. The ability to perform rapid, low-risk data migrations with minimal impact on the applications already using the database is one great benefit of Neo4j and its flexible, schema-free data model.
- Neo4j treats relationships as primary entities within the database. This means you can add a new relationship to connect certain nodes in a new way without migrating a table schema to enforce a new foreign key, inserting the corresponding references into each row of the table, or building a join table.
- Neo4j uses labels to group and index similar nodes. A label is like a tag, and a node can have any number of labels. Labels are useful in a data migration because, while they associate nodes under a certain type, they do not by default bring with them a schema definition (properties, data types and the like) that every node with that label must adhere to. This means you can temporarily group a set of nodes under a certain label before migrating them and, once each node is migrated, move it to a new label or remove the label altogether to indicate it has been migrated successfully.
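The label-based staging described above can be sketched in Cypher roughly as follows. This is only an illustration: the `Person` label, the `city` property, the `PendingMigration` marker label and the batch size of 1000 are all assumptions made for the example, not names from any particular dataset.

```cypher
// Mark every node that still needs migrating (hypothetical label and property).
MATCH (p:Person)
WHERE p.city IS NOT NULL
SET p:PendingMigration;

// Work through the marked nodes in batches; once a node has been
// migrated, drop the marker label so it is not picked up again.
MATCH (p:PendingMigration)
WITH p LIMIT 1000
// ... the per-node migration work would go here ...
REMOVE p:PendingMigration;

// Progress check: how many nodes are still waiting?
MATCH (p:PendingMigration)
RETURN count(p) AS remaining;
```

Because the marker label carries no schema of its own, adding and removing it does not affect applications that only look at the original `Person` label.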
One of the most common data migrations in the Neo4j graph database is changing a concept from being stored as a property or an array value to being a node with relationships. In Neo4j, a node can be inserted to promote a concept that was originally a property into a primary entity in the graph, to which other nodes can then be connected. This can be done using MERGE, which collapses all the occurrences of a property’s unique value into a single node representing that thing. All the nodes that carried that property value can then be connected to the new node with a contextually relevant relationship.
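As a minimal sketch of that property-to-node pattern, assume a hypothetical model in which `Person` nodes carry a `city` string property that should become a shared `City` node. The labels, the property name and the `LIVES_IN` relationship type are illustrative assumptions.

```cypher
// Before: (:Person {city: 'Austin'})
// After:  (:Person)-[:LIVES_IN]->(:City {name: 'Austin'})
MATCH (p:Person)
WHERE p.city IS NOT NULL
// MERGE creates the City node only if no node with that name exists yet,
// so each distinct property value ends up as exactly one node.
MERGE (c:City {name: p.city})
// Connect each person to the node that now represents the old value.
MERGE (p)-[:LIVES_IN]->(c);
```

Since both steps use MERGE rather than CREATE, re-running the statement should not produce duplicate nodes or relationships.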
Let’s assume we want to see all the actors who ever had the role of Batman. It would be cumbersome to go through every actor who appeared in any Batman movie over the years and check the roles array on the ACTED_IN relationship.
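To illustrate the problem, a query against the original model might look something like the sketch below. It assumes `Actor` and `Movie` labels and a `name` property on actors; only the ACTED_IN relationship and its roles array come from the description above.

```cypher
// Every ACTED_IN relationship has to be visited and its roles array
// inspected, because the character exists only as a string in that array.
MATCH (a:Actor)-[r:ACTED_IN]->(:Movie)
WHERE 'Batman' IN r.roles
RETURN DISTINCT a.name AS actor;
```

Nothing in the graph represents Batman as a thing in its own right, so there is no node to start the traversal from.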
What we want is to perform the Neo4j data migration in a couple of steps. First, we go through all the Batman films and extract each value in the roles property on the ACTED_IN relationship into its own Character node, named after the role. We then connect the Actor and the Movie to that Character node.
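These steps could be expressed in a single Cypher statement along the following lines. The `Actor` and `Movie` labels, the `title` and `name` properties, and the `PLAYED` and `APPEARS_IN` relationship types are assumptions chosen for this sketch; substitute whatever names fit your model.

```cypher
// Step 1: find the Batman films and unpack each roles array.
MATCH (a:Actor)-[r:ACTED_IN]->(m:Movie)
WHERE m.title CONTAINS 'Batman'
UNWIND r.roles AS role
// One Character node per distinct role name.
MERGE (c:Character {name: role})
// Step 2: connect the Actor and the Movie to that Character node.
MERGE (a)-[:PLAYED]->(c)
MERGE (c)-[:APPEARS_IN]->(m);
```

The statement leaves the ACTED_IN relationships and their roles arrays untouched, and because it relies on MERGE it can be re-run without creating duplicates.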
During the transition, any existing application code can continue to use the roles array. Once the application code has been updated to use the new structure, the roles property can be removed from the ACTED_IN relationship.
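The cleanup step might be as small as the following sketch, again assuming `Actor` and `Movie` labels and a `title` property on movies. It should only be run after every consumer of the roles array has been switched over.

```cypher
// Drop the now-redundant roles array from the migrated relationships.
MATCH (:Actor)-[r:ACTED_IN]->(m:Movie)
WHERE m.title CONTAINS 'Batman' AND r.roles IS NOT NULL
REMOVE r.roles;
```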
Now we can easily use the new graph data model to answer our question about everyone who has ever played the role of Batman.
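With a Character node in place, the question could be answered with a query like this one. The `PLAYED` relationship type and the `name` properties are assumed names for illustration.

```cypher
// Start from the single Batman node and walk to the actors connected to it.
MATCH (a:Actor)-[:PLAYED]->(:Character {name: 'Batman'})
RETURN a.name AS actor;
```

The traversal now starts from one well-defined node instead of filtering an array on every ACTED_IN relationship.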
This integration challenge for large graphs is something we’ve been managing in production for the last few years and we’ve made this data tooling available to those deploying and managing Neo4j within the GraphGrid Data Platform. Email us to learn more.