Most people in an organization have access to far more relevant text documents than they do neatly-organized sets of data. Whether they want to ask questions about customer sentiment or analyze a market’s demand for a potential new product or service, it’s often easier to read pages and pages of text-based research instead of finding or formulating a specific and rigidly structured dataset.
Unfortunately, this method isn’t very time-efficient, and it produces results only once rather than continuously as information is added or changed. This struggle around getting answers from text documents means you’re probably missing out on significant insights about what your organization does now and what you could do better.
Now imagine if you could import this network of flowing, unstructured information into a system that can identify things AND the relationships between them, without having to strip out all the context by rendering it into rigid tables, rows, and columns. Imagine a system that builds knowledge from these relationships, identifying interesting structures or unseen connections automatically and continuously based on new textual information.
That’s the promise of Natural Language Processing (NLP) and graph databases partnering to build knowledge graphs—the unified information from an entire organization, enriched with all the relevant context and semantics you need to turn unstructured data into knowledge.
How do NLP and graph databases work together?
NLP uses tools and techniques to analyze natural language data and understand the meaning behind unstructured text or speech. The form of NLP you’re most likely used to seeing is speech recognition, or speech-to-text, which our smart devices do when we prompt them with “Hey Siri” or “Hey Google.” Behind the scenes, these NLP systems combine rule-based models of human language with machine learning to understand your intent.
NLP isn’t a built-in feature of any graph database. Like its relational counterpart, which stores only tables, rows, and columns of structured data, a graph database is a technical means of storing connected data. It doesn’t have an opinion about what information gets stored and generally accepts whatever inputs its maintainers choose. A graph database stores information as nodes and the relationships between those nodes; it doesn’t perform the analysis that produces those inputs or the analysis run on already-stored data.
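As a rough illustration of that storage model, here is a minimal in-memory sketch in Python. The `PropertyGraph` class, its method names, and the sample data are hypothetical and exist only for this example; a real graph database adds persistence, indexing, and a query language on top of the same basic idea.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: str
    rel_type: str
    end: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical toy store: nodes, plus typed relationships between them."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(node_id, label, properties)

    def add_relationship(self, start, rel_type, end, **properties):
        # The store only checks that both endpoints exist; it has no
        # opinion about what the nodes or relationships mean.
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoints must exist before linking them")
        self.relationships.append(Relationship(start, rel_type, end, properties))

    def neighbors(self, node_id, rel_type=None):
        return [
            r.end
            for r in self.relationships
            if r.start == node_id and (rel_type is None or r.rel_type == rel_type)
        ]


graph = PropertyGraph()
graph.add_node("article-1", "Article", title="Example article")
graph.add_node("org-1", "Organization", name="Example Corp")
graph.add_relationship("article-1", "MENTIONS", "org-1", count=3)
print(graph.neighbors("article-1", "MENTIONS"))
```

Nothing in this sketch analyzes text. Whatever produces the `MENTIONS` relationship has to run outside the store, which is the gap NLP fills.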
Graph databases and NLP work together to extract meaning from unstructured text to identify named entities and the relationships between those entities, parse key phrases, score the overall sentiment, and more. Together, they turn unstructured text into a web of nodes, identifying these various meanings and their many relationships.
In other words, graph databases and NLP work together in two ways. First, graph databases can store the result of NLP operations. And second, graph databases enrich NLP operations on queried and returned data.
Either option requires another tool, like a knowledge graph platform, to process and import your previously-untapped unstructured data or automatically surface the relationships between what you already have.
The good news is that these tools already exist and are mature enough for real-world use cases that turn swaths of unstructured, natural language data into networks of closely-related documents, key phrases, sentiments, or concepts. Graph databases act as the foundation for exploratory tools, like knowledge graphs, that let you unlock insights in the relationships between data that your relational databases and structured data can’t possibly surface.
NLP in graph vs. relational databases
Let’s say that you want to import a published article from the web into your database and then process the content using NLP to create a sentiment analysis and summarization. You’re starting with a single article, but your goal is to analyze a corpus to understand how the public thinks about a topic relevant to your business.
If you’re using a relational/SQL database, and this is the first time you’re performing this process, you first need to create a schema: What are you going to store, and how do you want it structured? You’ll likely want relevant metadata for an article, like its title, published date, URL, and author name. You’ll probably want to also store the article’s full text as a single `TEXT` string, which is a perfectly valid way of doing it. That’s how, for example, WordPress stores every blog post in its associated SQL database.
Once you import the article’s data, you can query the `TEXT` field and feed the results into a third-party NLP tool for processing, which returns the sentiment analysis and summarization you’re seeking. You now need to import the results back into your relational database if you want to compare the results from this article against the others you’ll gather in the future, which means you already need to alter your database’s schema.
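To make the relational workflow concrete, the sketch below uses Python’s built-in `sqlite3` module. The table layout, the sample article, and the `analyze` function (a toy keyword counter standing in for a third-party NLP tool) are all assumptions made for illustration, not a recommended design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Step 1: define the schema up front, before importing anything.
conn.execute(
    """
    CREATE TABLE articles (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        published_date TEXT,
        url TEXT,
        author TEXT,
        body TEXT
    )
    """
)
conn.execute(
    "INSERT INTO articles (title, published_date, url, author, body) "
    "VALUES (?, ?, ?, ?, ?)",
    (
        "Example article",
        "2021-01-01",
        "https://example.com/a",
        "A. Writer",
        "The product is great. Support was slow.",
    ),
)


def analyze(text):
    """Hypothetical stand-in for an external NLP tool (toy keyword scoring)."""
    positive = {"great", "good", "love"}
    negative = {"slow", "bad", "poor"}
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    summary = text.split(".")[0].strip() + "."
    return score, summary


# Step 2: the original schema has nowhere to put the results, so alter it.
conn.execute("ALTER TABLE articles ADD COLUMN sentiment_score INTEGER")
conn.execute("ALTER TABLE articles ADD COLUMN summary TEXT")

# Step 3: query the text, process it outside the database, write results back.
for article_id, body in conn.execute("SELECT id, body FROM articles").fetchall():
    score, summary = analyze(body)
    conn.execute(
        "UPDATE articles SET sentiment_score = ?, summary = ? WHERE id = ?",
        (score, summary, article_id),
    )

row = conn.execute(
    "SELECT sentiment_score, summary FROM articles WHERE id = 1"
).fetchone()
print(row)
```

Every new kind of result, such as key phrases or named entities, would mean another `ALTER TABLE` or a new table plus another pass through this loop.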
As you extend your NLP-based analysis further, you’ll end up in a time-wasting cycle of importing, querying, processing, migrating, and tweaking for every new article and every unique set of results. You’ll inevitably want to run NLP operations against your entire corpus, which means querying and processing in large batches, taking hours, even with heavyweight computing power.
The same process of importing, storing, and processing data with NLP looks very different with a graph database as your foundation. Instead of manual import/export round trips, you leverage your knowledge graph’s NLP service. The NLP service extracts relevant information from the article’s full text using trained NLP models, immediately returning a web of nodes, edges, and properties to your graph database. You get not only the full text of the original article but also annotated text for key points, automatically-discovered “mention” nodes like people or organizations, sentence-level analysis, and sentiment scores.
Graph databases flexibly store these nodes and their relationships without requiring the manual editing of schemas or defining the types of results before they’re finished. This flexibility means you can now use continuous processing to improve insights based on new data or improved training models. It’s an automatic feedback loop between how a graph database accepts inputs and returns data from queries.
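The graph-based flow can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `annotate` is a toy stand-in for an NLP service (it matches names from a small hard-coded list instead of using a trained model), and plain dictionaries and lists stand in for the graph database.

```python
import re

# Hypothetical lookup table used in place of a trained entity model.
KNOWN_ENTITIES = {"Acme": "Organization", "Jane Doe": "Person"}


def annotate(text):
    """Toy NLP service: yield (sentence, mentions) pairs for a text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    for sentence in sentences:
        mentions = [
            (name, label)
            for name, label in KNOWN_ENTITIES.items()
            if name in sentence
        ]
        yield sentence, mentions


nodes = {}  # node id -> {"label": ..., plus any properties}
edges = []  # (start id, relationship type, end id)


def ingest(article_id, text):
    """Store the article, its sentences, and its mentions as nodes and edges."""
    nodes[article_id] = {"label": "Article", "text": text}
    for index, (sentence, mentions) in enumerate(annotate(text)):
        sentence_id = f"{article_id}/s{index}"
        nodes[sentence_id] = {"label": "Sentence", "text": sentence}
        edges.append((article_id, "HAS_SENTENCE", sentence_id))
        for name, label in mentions:
            # Reuse the mention node if an earlier sentence already created it.
            nodes.setdefault(name, {"label": label, "name": name})
            edges.append((sentence_id, "MENTIONS", name))


ingest("article-1", "Jane Doe joined Acme last year. Acme shipped a new product.")

# A later analysis pass attaches a new property without any migration step.
for node in nodes.values():
    if node["label"] == "Sentence":
        node["word_count"] = len(node["text"].split())

mentions_of_acme = [s for s, rel, e in edges if rel == "MENTIONS" and e == "Acme"]
print(mentions_of_acme)
```

The second pass adds `word_count` to existing nodes without touching a schema, which is the flexibility the paragraph above describes.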
How others are using NLP and graph databases today
Extracting data from unstructured text using NLP is just the first step. Once you’ve imported, analyzed, and transformed your unstructured data into nodes, edges, and properties, you open opportunities to process further, deepen relationships with other nodes, and continuously update schemas.
- Textual search helps non-technical users discover the “unknown unknowns” in your unstructured data, even if they’re unsure what they’re searching for, and then see results in context with networked information.
- Similarity scoring compares and contrasts two nodes containing unstructured text to understand if and how they’re related using methods like Term Frequency-Inverse Document Frequency (TF-IDF).
- Summarization automatically parses the most relevant sentences of an article, document, or collection of documents to provide more scannable context for rich information.
- Origination—a form of data lineage—builds and maintains a list or “story” that clearly shows how one document originated from another over time.
- Paraphrase detection compares two documents to determine if they’re related or not and provides easy jumping-off points for exploratory analysis.
- Named Entity Recognition (NER) identifies people, locations, or organizations within any unstructured text without specifying what or where they might be beforehand using a rigid schema design. You can then use relationship extraction to determine how those entities are related and interconnected.
- Keyphrase extraction pinpoints key concepts in unstructured text, driving a faster understanding of relationships and context without diving into extensive research.
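To illustrate one item from the list above, similarity scoring with TF-IDF, here is a minimal pure-Python sketch. The sample documents and function names are made up for this example, and the weighting uses one common smoothed IDF variant; production systems typically rely on a tested library and more careful tokenization.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer that strips basic punctuation."""
    return [w.strip(".,!?").lower() for w in text.split() if w.strip(".,!?")]


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in documents]
    doc_count = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append(
            {
                # term frequency * smoothed inverse document frequency
                term: (count / len(tokens))
                * (math.log((1 + doc_count) / (1 + doc_freq[term])) + 1)
                for term, count in counts.items()
            }
        )
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


documents = [
    "Customers love the new checkout flow.",
    "The new checkout flow confuses some customers.",
    "Quarterly revenue grew in the European market.",
]
vectors = tfidf_vectors(documents)
related = cosine_similarity(vectors[0], vectors[1])
unrelated = cosine_similarity(vectors[0], vectors[2])
print(related > unrelated)
```

In a graph, a score like `related` could be stored as a property on a relationship between the two document nodes, so later queries can follow the strongest links first.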
We see tons of robust, real-world use cases for these specific applications, from mapping customer journeys to delivering more nuanced recommendations to better detecting fraud before it affects the business.
For example, a knowledge graph that utilizes NLP and a native graph database at its foundation can identify relationships between customer identities that would otherwise go unseen if you’re only looking at single rows of information in a relational database. If two customers use the same credit card in quick succession but have no other meaningful relationships, you have a starting point for investigating potential fraud.
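The credit card example can be sketched as a simple traversal over "customer used card" relationships. The data, the five-minute window, and the `suspicious_pairs` function are all hypothetical; a real deployment would express this as a query against the graph database rather than a Python loop.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical USED relationships: (customer, card, timestamp in seconds).
card_uses = [
    ("customer-a", "card-1", 1000),
    ("customer-b", "card-1", 1090),
    ("customer-c", "card-2", 1100),
    ("customer-a", "card-3", 5000),
    ("customer-d", "card-3", 90000),
]

# Pairs that already share a meaningful relationship (e.g. same household).
known_links = {frozenset(("customer-a", "customer-d"))}


def suspicious_pairs(uses, known, window_seconds=300):
    """Flag customer pairs who used the same card within the time window
    and have no other known relationship."""
    by_card = defaultdict(list)
    for customer, card, timestamp in uses:
        by_card[card].append((customer, timestamp))

    flagged = set()
    for card, events in by_card.items():
        for (c1, t1), (c2, t2) in combinations(events, 2):
            pair = frozenset((c1, c2))
            close_in_time = abs(t1 - t2) <= window_seconds
            if c1 != c2 and close_in_time and pair not in known:
                flagged.add((card, tuple(sorted(pair))))
    return sorted(flagged)


flagged = suspicious_pairs(card_uses, known_links)
print(flagged)
```

Each flagged pair is only a starting point for investigation. The check depends on seeing both customers' connections to the card together, which is hard to do when each purchase sits in its own row.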
What to look for in graph database-powered NLP
If you’re looking to feed unstructured data into a graph database, a few key features will ensure you get the most value from your investment. As a baseline, a graph database and knowledge graph combination should be capable of:
- Connecting unstructured and structured data on a single data platform. Prevent siloing between the data you have previously stored in relational databases and new textual data you want to process using NLP.
- Multilingual NLP operations. You can never be sure when you might need to ingest Spanish, Chinese, or German textual data to stay up-to-date on information from your customers, partners, markets, or even political movements that could be relevant to your operations.
- A knowledge graph built atop a native graph database. It’s the only way to enable not only the individual use cases detailed in the previous section but also continuous NLP processing on nodes you already have in your graph database.
- Support for multiple import methods and formats (LOAD CSV, APOC, XML, JSON, RSS feeds). Ensure that you can ingest all of your data quickly, efficiently, and flexibly enough to keep up with changes in your business, customers, or market realities. Without it, you can’t leverage the full power of the graph database.
How to quickly deploy a graph database with NLP
Are you eager to explore graph databases from scratch for the first time? Or, do you have tons of data “locked away” in relational databases and want to explore how you could import it into a native graph database? GraphGrid CDP is a flexible, extensible, and developer-focused knowledge graph. It’s based on ONgDB, an open-source, native graph store, and comes with all the NLP capabilities built-in, which means you can jump straight into connecting your natural language data around people, places, and things right away and in real-time.
Ready to jump in? You can download CDP for free. Want to see CDP in action, tailored for your industry? Schedule a GraphGrid demo. We’ll show you how NLP and graph databases can integrate with your existing systems to unlock new, unexpected, and potentially business-altering insights.