Why Robust Change Data Capture Is Essential for Knowledge Graphs

Learn why powerful change data capture (CDC) tooling is a must-have for enterprise knowledge graphs

Your enterprise knowledge graph is your view of the world. It supplies all of the who, what, when, where, and why details that your organization needs for day-to-day operations and decision-making.

Equally important, if not more so, is the fact that your knowledge graph is the world for your artificial intelligence. Your AI solution knows and understands only the data in your knowledge graph. When that data changes, it’s imperative that those deltas be communicated to your AI as well. After all, your knowledge graph is the basis for training (or re-training) your AI, and if your training data isn’t up to date, then neither is your AI – or its decisions.

The same could be said for your search engine: To ensure that time-sensitive search queries are answered with the freshest information, you need search data and indexes that are updated whenever new data is added to your database or whenever existing data changes.

In order to avoid using stale data or out-of-date intel, you need a robust change data capture process for your knowledge graph (and its underlying graph database).

What Is Change Data Capture (CDC)?

Change data capture (CDC) is an approach to tracking every modification to the data in your database and then communicating those changes throughout the rest of your enterprise architecture – including your knowledge graph and other related modules. In short, whenever data changes in your graph, CDC broadcasts that change, in an organized manner, to every system that depends on it.

[Infographic: Change Data Capture]

Robust CDC tooling integrates with your knowledge graph to keep the rest of your systems aware of the freshest data. As a result, it helps update your search engine indexes and enrich your machine learning training models, which then re-train your AI. In a changing world, change data capture is a critical consideration for every backend development team.
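To make the capture-and-broadcast idea concrete, here is a minimal Python sketch. It is not GraphGrid code: `TrackedGraph`, `ChangeEvent`, and the two subscriber lists are hypothetical names invented for this illustration, and a production database would typically capture changes through triggers or a transaction log rather than a Python setter.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class ChangeEvent:
    """One captured property change (hypothetical event shape)."""
    node_id: str
    key: str
    old: Any
    new: Any


class TrackedGraph:
    """Toy in-memory node store that emits a ChangeEvent on every real write."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Dict[str, Any]] = {}
        self._listeners: List[Callable[[ChangeEvent], None]] = []

    def subscribe(self, listener: Callable[[ChangeEvent], None]) -> None:
        self._listeners.append(listener)

    def set_property(self, node_id: str, key: str, value: Any) -> None:
        props = self._nodes.setdefault(node_id, {})
        old = props.get(key)
        if old == value:
            return  # nothing changed, so there is nothing to capture
        props[key] = value
        event = ChangeEvent(node_id, key, old, value)
        # Broadcast the same event to every downstream consumer.
        for listener in self._listeners:
            listener(event)


# Two hypothetical consumers: a search indexer and an ML re-training queue.
search_index_updates: List[ChangeEvent] = []
retraining_queue: List[ChangeEvent] = []

graph = TrackedGraph()
graph.subscribe(search_index_updates.append)
graph.subscribe(retraining_queue.append)

graph.set_property("person-1", "name", "Ada")
graph.set_property("person-1", "name", "Ada")     # duplicate write: no event
graph.set_property("person-1", "name", "Ada L.")  # real change: one event
```

Both consumers see the same ordered stream of changes, which is the property that keeps a search index and a training set in step with the graph.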

Introducing GraphGrid Fuze Distributor

GraphGrid Fuze helps handle the CDC process for users of GraphGrid. Fuze provides integration services to distribute, route, and transform transactional event data from ONgDB to trigger dynamic workers for graph processing, searching, indexing, and machine learning processes. GraphGrid Fuze has five components: Distributor, ONgDB Writer, ONgDB Reader, Trigger Manager, and Worker.

Today, we’ll be taking a closer look at Fuze Distributor for change data capture. (We’ll discuss other components of Fuze in future posts.)

The Distributor component of Fuze is a single listening plugin that forwards the messages it receives to a number of other broker endpoints. Fuze Distributor listens for incoming messages in the form of transaction data sent via a trigger after changes are made to the graph database. In other words, a Distributor picks up any changes to the graph and sends that information to other message brokers, which then update the appropriate GraphGrid modules.

Setting up a Fuze Distributor is particularly important if you’re indexing text data that’s processed with natural language processing (NLP). Because NLP data extraction takes a variable amount of time to complete, you can’t predict when its results will reach the graph; a Distributor ensures that the Search index policy is made aware of those changes whenever they arrive.

The Fuze Distributor currently works with these message brokers:

  • Apache Kafka
  • RabbitMQ
  • Amazon SQS

The Distributor simplifies the CDC process when working with multiple message brokers by providing a single listening endpoint that forwards the messages it receives to any number of other broker endpoints. Fuze Distributor uses Geequel to filter which messages are forwarded, so any data transformation that’s possible with Geequel can be performed in Fuze as well.
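The fan-out-with-filter pattern can be sketched in a few lines of Python. This is an illustration of the general idea only, not the Fuze Distributor API: `InMemoryBroker`, `Route`, and `Distributor` are hypothetical classes, the brokers are in-memory stand-ins rather than real Kafka, RabbitMQ, or SQS clients, and plain Python functions take the place of the Geequel filters and transformations that Fuze actually uses.

```python
from typing import Any, Callable, Dict, Iterable, List

Message = Dict[str, Any]


class InMemoryBroker:
    """Stand-in for a real broker endpoint; it only records what it is sent."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.received: List[Message] = []

    def publish(self, message: Message) -> None:
        self.received.append(message)


class Route:
    """Pairs a destination broker with a filter and an optional transform."""

    def __init__(
        self,
        broker: InMemoryBroker,
        accepts: Callable[[Message], bool],
        transform: Callable[[Message], Message] = lambda m: m,
    ) -> None:
        self.broker = broker
        self.accepts = accepts      # plays the role of a Geequel filter
        self.transform = transform  # plays the role of a Geequel transformation


class Distributor:
    """Single listening endpoint that forwards messages to many brokers."""

    def __init__(self, routes: Iterable[Route]) -> None:
        self.routes = list(routes)

    def on_message(self, message: Message) -> int:
        """Forward one message; return how many brokers received it."""
        delivered = 0
        for route in self.routes:
            if route.accepts(message):
                route.broker.publish(route.transform(message))
                delivered += 1
        return delivered


kafka = InMemoryBroker("kafka")
sqs = InMemoryBroker("sqs")

distributor = Distributor([
    # Everything goes to the first broker unchanged.
    Route(kafka, accepts=lambda m: True),
    # Only Article changes go to the second, trimmed down to the node id.
    Route(
        sqs,
        accepts=lambda m: "Article" in m["labels"],
        transform=lambda m: {"node_id": m["node_id"]},
    ),
])

distributor.on_message({"node_id": 1, "labels": ["Article"], "text": "..."})
distributor.on_message({"node_id": 2, "labels": ["Person"], "name": "Ada"})
```

The producer of the change only ever talks to one endpoint; which brokers hear about it, and in what shape, is decided entirely by the routes.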

For more information, see the GraphGrid Fuze Distributor tutorial.

Conclusion

Robust CDC tooling – such as GraphGrid Fuze Distributor – ensures your knowledge graph has a system-wide awareness of updates and modifications to your data.

Of course, CDC is just one part of the larger Fuze data integration services that distribute, route, and transform transactional event data to trigger dynamic searching, indexing, and machine learning processes. These event-driven capabilities enable your graph data to react and respond to changes in real time.

So when your data changes, your decisions keep pace.

Try out Fuze Distributor:
Download GraphGrid today.