A database, especially one full of long strings of unstructured text, is only as valuable as how easily you can search it and extract meaningful interpretations. That’s harder than ever with recent changes in how many organizations ask their teams to manage and learn from the data their functional area produces. Instead of waiting for IT to write queries and code dashboards, they’re asked to take the lead on data analysis.
To give everyone in your organization the power of connected data, and help them build knowledge from unstructured text, you need a cyclical process for importing text data, querying it, and gathering insights. Bonus points if your textual search returns meaningful results even with “imperfect” queries or makes intelligent recommendations on where to look next. And you need that cycle to be as fast and accessible as possible.
Graph databases are an increasingly common answer to this demand, especially when paired with Natural Language Processing (NLP). NLP imports unstructured data for full-text queries and continuously processes it with Named Entity Recognition (NER), sentiment analysis, and more, turning full-text into nodes and edges—information and the relationships between them—for textual search that just makes sense.
First, graph storage
First, let’s review how graph databases store information. Unlike relational databases, which use well-defined schemas of tables, rows, and columns, graph databases store data in nodes, which are connected by edges. Edges are the relationships between nodes and have their own properties, which define the type or category of relationship.
After an NLP service imports and processes a long piece of unstructured text, you end up with a single node for the full text and many nodes and edges for all of the ways it was analyzed and broken down into smaller parts.
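To make that shape concrete, here is a minimal Python sketch of one full-text node plus a node and an edge per extracted entity. The class names, the `MENTIONS` relationship type, and the `build_graph` helper are illustrative assumptions, not the storage format of any particular graph database, and the entity list stands in for real NER output.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str  # e.g. "Document", "Person", "Organization"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str    # node_id of the start node
    target: str    # node_id of the end node
    rel_type: str  # the type or category of relationship
    properties: dict = field(default_factory=dict)


def build_graph(doc_id, text, entities):
    """Return one full-text node plus one node and one edge per entity.

    `entities` is a list of (name, label) pairs standing in for NER output.
    """
    nodes = [Node(doc_id, "Document", {"text": text})]
    edges = []
    for name, label in entities:
        entity_id = f"{label}:{name}"
        nodes.append(Node(entity_id, label, {"name": name}))
        # Hypothetical relationship type linking the text to what it mentions.
        edges.append(Edge(doc_id, entity_id, "MENTIONS"))
    return nodes, edges


nodes, edges = build_graph(
    "doc-1",
    "Acme renewed its contract after a call with Dana.",
    [("Acme", "Organization"), ("Dana", "Person")],
)
print(len(nodes), "nodes,", len(edges), "edges")
```

A real pipeline would also add nodes for sentences, sentiment scores, and so on, but the pattern stays the same: the full text is one node, and every analysis result hangs off it through an edge.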
Indexing and index policy
Searching your graph can be as simple as querying for a specific substring, but most organizations will want to take a different first step: indexing.
- Indexing: The purpose of indexing is to transform graph data into searchable text using a search engine or library like Elasticsearch or Apache Lucene, which are optimized to parse and store data in ways that retrieve information as quickly as possible. An index holds documents, which are searchable units of text data with various properties.
- Searching: Instead of querying the graph data directly, you use the search engine to parse the documents in your index for relevant text, properties, and relationships.
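The two steps above can be sketched with a toy inverted index, the core data structure behind Lucene-style search. This is a simplified illustration under stated assumptions: the tokenizer, the document shape, and the function names are made up for the example, and a real engine adds analysis, scoring, and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Indexing: map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for term in tokenize(doc["text"]):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Searching: return ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Documents derived from graph nodes (hypothetical shape).
documents = {
    "doc-1": {"text": "Acme renewed its contract", "label": "Document"},
    "doc-2": {"text": "Acme filed a support ticket", "label": "Document"},
}
index = build_index(documents)
hits = search(index, "acme contract")
print(hits)
```

The search never touches the graph itself. It only reads the index, which is why lookups stay fast even as the graph grows.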
And before you index your graph database for the first time, you need to figure out your index policy, which is like a dictionary for what information gets indexed. Platforms like GraphGrid Connected Data Platform (CDP) do let you create an index policy using all your existing graph data, but you may want to consider narrowing your focus on the most valuable use cases for your organization:
- What do you want to search for?
- What’s your intended use?
- Who is your intended user?
- Where do your search responses go?
For example, the indexing policy for a database of current enterprise customers might be different for a customer success team that’s looking for unhappy users versus a sales team looking for upsell opportunities. The good news is that index policies aren’t set in stone—you can always customize them based on changes to your organization’s goals or talent.
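As a rough illustration of how two teams might index the same graph differently, the sketch below treats an index policy as a mapping from node label to the properties that should be indexed. The policy format, label names, and `apply_policy` helper are hypothetical and are not GraphGrid's actual policy syntax.

```python
def apply_policy(policy, nodes):
    """Turn graph nodes into index documents, keeping only what the policy names."""
    docs = []
    for node in nodes:
        fields = policy.get(node["label"])
        if fields is None:
            continue  # this label is not covered by the policy, so skip it
        doc = {"id": node["id"], "label": node["label"]}
        for name in fields:
            if name in node["properties"]:
                doc[name] = node["properties"][name]
        docs.append(doc)
    return docs


# Hypothetical policies: same graph, different search goals.
CUSTOMER_SUCCESS_POLICY = {"Ticket": ["text", "sentiment"], "Customer": ["name"]}
SALES_POLICY = {"Customer": ["name", "plan", "seats"]}

nodes = [
    {"id": "c1", "label": "Customer",
     "properties": {"name": "Acme", "plan": "team", "seats": 40}},
    {"id": "t1", "label": "Ticket",
     "properties": {"text": "Export keeps failing", "sentiment": "negative"}},
]

success_docs = apply_policy(CUSTOMER_SUCCESS_POLICY, nodes)
sales_docs = apply_policy(SALES_POLICY, nodes)
print(success_docs)
print(sales_docs)
```

Changing the policy changes which documents and fields exist in the index, without changing the underlying graph.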
Now you’re not just searching your graph data for string matches. You’re using a full search engine’s capabilities, like filtering, paging, fuzziness, and suggestions, all at the near-instant speed of other graph-based search experiences you’re already familiar with, like Google or Wikipedia.
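For a sense of what those capabilities look like in practice, here is a small Python sketch that builds an Elasticsearch Query DSL request body combining a fuzzy `match` query, a `term` filter, and `from`/`size` paging. The field names `text` and `label` and the `build_search_body` helper are assumptions for illustration; your index fields will depend on your own index policy.

```python
import json


def build_search_body(text, label=None, page=0, page_size=10):
    """Build a search request body with fuzziness, optional filtering, and paging."""
    query = {
        "bool": {
            # "AUTO" lets Elasticsearch pick an edit distance based on term
            # length, so a typo like "contrct" can still match "contract".
            "must": [{"match": {"text": {"query": text, "fuzziness": "AUTO"}}}],
        }
    }
    if label is not None:
        # Filters narrow the result set without affecting relevance scores.
        query["bool"]["filter"] = [{"term": {"label": label}}]
    return {"query": query, "from": page * page_size, "size": page_size}


body = build_search_body("contrct renewal", label="Document", page=2)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the JSON body of a search request against an index. Nothing here talks to a cluster, so it is safe to run as is.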
Why use graph databases for textual search? Context
The power of graph search is in this returned context. That’s how you help business users find exactly the information they need instantaneously and give them recommendations for explorations they would have never thought of otherwise. That’s how you build knowledge about your operations, customers, industry, and beyond.
The relationship between nodes relevant to your search helps you narrow your focus, start a more relevant search, and filter results using other fields within your index. You can even search for things you don’t have a name for and navigate your way through your graph to an answer.
You can now start to imagine a search lifecycle that’s both accessible to non-technical users and powerful for everyone. Start with a search based on natural language, then use the resulting context to inspire unexpected explorations, giving you the power of both rigid queries and natural language input without being limited by either.
Real-world uses for textual search
We’re seeing graph search change the way organizations approach their core functions:
- Data discovery: Textual graph search democratizes discovery. Whether they’re technical users with data science experience or business users like marketing specialists, your data should be accessible to everyone who’s wearing an “analyst” hat for the day. Power users can write complex queries using Elasticsearch’s Query DSL, and everyone else can turn natural language queries into graph data visualizations that are easy to understand and explore—not to mention share with stakeholders.
- Research: Research teams and organizations rely on being able to bridge disparate ideas into novel action. A database technology that treats relationships as first-class is essential to get real-time results for complex queries that would bog down relational databases with slow JOIN operations. Organizations can change their index policies without slowdowns when they need to pivot their dataset or focus.
- Prescriptive recommendation engines: People need recommendations beyond what show to watch or book to buy next. For example, what if a manufacturing facility could get recommendations about their production line based on analysis of the latest details on their supply chain? Continuous indexing of new data and scheduled full-text searches can unlock insights from complex unstructured data that no single person could find at the speed of today’s logistics.
How is textual search different in graph vs. relational databases?
Both graph and relational databases are capable of full-text search (FTS). For example, MySQL uses the `MATCH() AGAINST()` syntax to return potentially relevant documents containing either an exact match to a user’s search or a closely-related variant. This requires a full-text index set up in advance, and even then, MySQL maintains a list of full-text restrictions, particularly on large datasets.
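As a sketch of what that looks like, the Python snippet below holds the two MySQL statements involved: one that creates the full-text index in advance and one that searches it. The `articles` table, `body` column, and `ft_body` index name are hypothetical, and the statements are only built here, not executed. The `%s` placeholders follow the style used by common Python MySQL drivers.

```python
# One-time setup: MySQL needs a FULLTEXT index before MATCH() AGAINST() works.
CREATE_INDEX_SQL = "ALTER TABLE articles ADD FULLTEXT INDEX ft_body (body)"

# Search: rank rows by relevance to a natural language query.
SEARCH_SQL = (
    "SELECT id, MATCH(body) AGAINST (%s IN NATURAL LANGUAGE MODE) AS score "
    "FROM articles "
    "WHERE MATCH(body) AGAINST (%s IN NATURAL LANGUAGE MODE) "
    "ORDER BY score DESC"
)


def full_text_query(user_input):
    """Return the SQL and parameters to pass to a driver's cursor.execute()."""
    # The query text is passed twice: once for scoring, once for filtering.
    return SEARCH_SQL, (user_input, user_input)


sql, params = full_text_query("contract renewal")
print(sql)
print(params)
```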
To be fair to relational databases, neither they nor graph databases are optimized for answering natural language search queries over large amounts of text data. The inputs might look simple, but returning relevant results is anything but, which means these queries can take far longer than most organizations are willing to wait, and could slow down clusters enough to make them unusable for concurrent work.
That’s why integrating with a search engine like Elasticsearch makes sense in both cases—you can leverage the speed of a proper index policy and features like sorting and fuzziness.
That said, the core benefit of searching graph data still stands. When you search a relational database, your result is one or more rows—it’s up to you to understand how they might be related to one another and figure out your next steps. Perform the same search on a graph database with the same information, and your result is a network that’s begging to be explored.
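The difference can be illustrated with a short Python sketch: given the ids returned by a search, it collects the relationships touching each hit, which is the context a plain row result leaves out. The edge tuples, the `MENTIONS` type, and the `neighborhood` helper are illustrative assumptions rather than any product's API.

```python
from collections import defaultdict


def neighborhood(edges, hit_ids):
    """Return, for each search hit, the (relationship, neighbor) pairs it touches."""
    context = defaultdict(list)
    hits = set(hit_ids)
    for source, rel_type, target in edges:
        if source in hits:
            context[source].append((rel_type, target))
        if target in hits:
            # Incoming edges are reported too; direction is dropped for brevity.
            context[target].append((rel_type, source))
    return dict(context)


# Edges as (source, relationship type, target) tuples.
edges = [
    ("doc-1", "MENTIONS", "Organization:Acme"),
    ("doc-2", "MENTIONS", "Organization:Acme"),
    ("doc-1", "MENTIONS", "Person:Dana"),
]

context = neighborhood(edges, ["doc-1"])
print(context)
```

From here, a user could follow `Organization:Acme` to find every other document that mentions it, which is the kind of next step a list of rows does not suggest on its own.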
How to quickly deploy a graph database for textual search
The fastest way to start searching large volumes of text data with the power of graph search is GraphGrid CDP and GraphGrid Search, which integrate ONgDB with Elasticsearch. With minimal setup, you can start using plug-and-play indexes, which pre-process your text data before you start searching to give you instantaneous results.
Business users can save and reuse complex search queries across your organization so that everyone can search graph data using natural language and then narrow, expand, filter, or enhance their expectations—and then their insights—using connected data.
Download GraphGrid CDP to get started with context-driven, graph-powered search today.