There are a number of ways to integrate Neo4j with ElasticSearch. One common approach was the Rivers plugin, but rivers were deprecated in ElasticSearch 1.5 and will likely be removed entirely shortly after ElasticSearch 2.0. Going forward, any integration will need a more sophisticated mechanism to index the desired nodes and relationships from Neo4j into ElasticSearch.
For those who don’t know, ElasticSearch is an open source search server built on Lucene. It provides a distributed full-text search engine that stores JSON documents and exposes a RESTful API.
- Swift search against large data volumes
Large, complex graph traversal queries that span tens to hundreds of thousands of nodes and would take many seconds in Neo4j can return in milliseconds from ElasticSearch, because the query result is stored in a single document that is easily indexed. The design of ElasticSearch is leaner and a lot simpler than a database built from tables, rows, columns, fields, and schemas. That makes it practical to index many documents holding concise results as a cache, provided the attribute variations in the query don’t explode the number of combinations that need to be stored.
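As a rough illustration of the caching idea above, the sketch below packages a traversal result as one document with a stable ID and estimates how many such documents a set of query parameters would require. The function names, the `doc_id` field, and the document shape are assumptions made for this example, not part of Neo4j, ElasticSearch, or any GraphGrid API.

```python
import hashlib
import json
from math import prod


def result_doc_id(query_name, params):
    """Derive a stable document ID from a query name and its parameters.

    Keys are sorted so that the same parameters always give the same ID.
    """
    canonical = json.dumps({"query": query_name, "params": params}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_result_document(query_name, params, rows):
    """Package a full traversal result as one JSON-serializable document."""
    return {
        "doc_id": result_doc_id(query_name, params),  # hypothetical field name
        "query": query_name,
        "params": params,
        "result_count": len(rows),
        "results": rows,
    }


def combination_count(param_values):
    """Count the documents needed to cover every parameter combination."""
    return prod(len(values) for values in param_values.values())
```

If `combination_count` comes back in the hundreds or thousands, caching one document per combination is likely manageable; if it grows with every new attribute, the cache is probably the wrong tool for that query.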
- Document indexing to repository
ElasticSearch can easily convert raw data (message files or log files) into internal documents, which it then stores in a basic data structure. Pushing documents from Neo4j to ElasticSearch is reliable to automate.
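One way to picture the push from Neo4j is as a batch of nodes serialized into the newline-delimited JSON body that the ElasticSearch bulk endpoint accepts: an action line followed by a source line for each document. The sketch below only builds that body. The node shape (`id`, `labels`, `properties`) and the function name are assumptions for illustration, and older ElasticSearch versions also expect a document type alongside the index.

```python
import json


def to_bulk_payload(nodes, index_name):
    """Serialize nodes into a newline-delimited bulk request body.

    Each node is assumed to look like:
    {"id": 1, "labels": ["Person"], "properties": {"name": "Ann"}}
    """
    lines = []
    for node in nodes:
        # Action line: tells the bulk endpoint where to index the next line.
        action = {"index": {"_index": index_name, "_id": str(node["id"])}}
        # Source line: the flattened node stored as the document.
        source = {"labels": node["labels"], **node["properties"]}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n" if lines else ""
```

The resulting string would be sent as the body of a POST to the `_bulk` endpoint by whatever HTTP client the integration already uses.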
- Quick data access via de-normalized storage
ElasticSearch usually stores a complete, de-normalized copy of a document in each index where it lives. Full-text searches are swift because documents are stored close to their corresponding metadata within the index. The aggregations and language analyzers can then be used to build search queries that go from text entry to a starting set of nodes, which a Neo4j graph query uses to complete the process of returning a result.
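The handoff described above, from search hits to a starting set of nodes for the graph query, can be sketched as follows. The code reads document IDs from the standard `hits.hits` section of a search response and builds a parameterized Cypher query around them. The `Item` label, the `uuid` property, the `RELATED_TO` relationship type, and both function names are hypothetical and chosen only for this example.

```python
def starting_node_ids(search_response, min_score=0.0):
    """Collect document IDs from a search response, filtered by score."""
    hits = search_response.get("hits", {}).get("hits", [])
    # A missing or null score is treated as 0.0.
    return [hit["_id"] for hit in hits if (hit.get("_score") or 0.0) >= min_score]


def graph_query_for(ids):
    """Build a parameterized Cypher query that expands from the given IDs.

    Label, property, and relationship names are assumptions for this sketch.
    """
    cypher = (
        "MATCH (start:Item) WHERE start.uuid IN $ids "
        "MATCH (start)-[:RELATED_TO*1..2]-(other:Item) "
        "RETURN DISTINCT other.uuid AS uuid"
    )
    return cypher, {"ids": ids}
```

Passing the IDs as a query parameter rather than interpolating them into the Cypher string keeps the query plan reusable and avoids injection problems.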
- Scalable and distributable
ElasticSearch is capable of scaling to thousands of servers and accommodating petabytes of data. That capacity results directly from its highly distributed architecture. This scalability makes it a great front end for serving query result documents, taking complex and potentially long-running query load off Neo4j.
Surfacing relevant search results isn’t easy, but combining the connectedness of your data in Neo4j with the aggregation and language capabilities of ElasticSearch produces a powerful pairing, which many refer to as “graph-aided search”. At GraphGrid, we see it as using each technology for what it does best, and we have been enabling businesses to leverage this auto-indexing connection since ElasticSearch 0.90 and Neo4j 1.9. This deep expertise is built into the GraphGrid Data Platform as a core integration, available for all to use as the ElasticGraph service.