For our project (trueno), which is a graph database, we need to do joins in order to traverse the graph. We are using Elasticsearch as our distributed storage engine.
We have already made many tweaks to get pretty good performance, such as using the TransportClient instead of REST, and pure WebSockets to extract and index vertices and edges from the storage.
We have been evaluating how to do traversals better. We were using the SIREn Join plugin, but it is pretty slow and can only do 2-step joins. We are well aware that join operations in a distributed system are very expensive, but we need to do the traversal anyway. Our options seem to be the following:
- Application-side joins (too many round trips, therefore slow).
- Use the parent/child or nested features to store the vertices and edges (if the graph is fully connected, it won't be scalable, since all documents need to reside in the same shard).
Is there a way to use the script or plugin interfaces to do something like the following?
1- The user defines the graph traversal query.
2- The client sends the query to a plugin or script.
3- The plugin or script optimizes the traversal.
4- Within the plugin or script, the search is performed internally in steps. For example: start from the vertex with id 1, get its neighbors, then get those neighbors' neighbors where property a > 5.
5- Return the whole set of documents resulting from the traversal to the client.
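To make the stepwise idea in the list above concrete, here is a minimal sketch of a frontier-based traversal, run against an in-memory graph rather than Elasticsearch. Everything in it is an assumption for illustration: the function names (`build_adjacency`, `expand`, `traverse`), the sample data, and the choice to treat edges as directed. The point is only that each step costs one batched lookup for the whole frontier, not one lookup per vertex.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Index edges by source vertex (stands in for a lookup on the edges type)."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
    return adjacency


def expand(adjacency, vertices, frontier, predicate=None):
    """One traversal step: a single batched lookup for the whole frontier."""
    neighbors = set()
    for vertex_id in frontier:
        neighbors |= adjacency.get(vertex_id, set())
    if predicate is not None:
        # Keep only neighbors whose vertex document satisfies the filter.
        neighbors = {v for v in neighbors if predicate(vertices[v])}
    return neighbors


def traverse(edges, vertices, start_id, predicates):
    """Run one expansion per entry in `predicates` (None means no filter)."""
    adjacency = build_adjacency(edges)
    frontier = {start_id}
    lookups = 0
    for predicate in predicates:
        frontier = expand(adjacency, vertices, frontier, predicate)
        lookups += 1
    return frontier, lookups


# Hypothetical sample graph: vertex id -> properties, and directed edges.
vertices = {1: {"a": 0}, 2: {"a": 1}, 3: {"a": 9}, 4: {"a": 7}, 5: {"a": 2}, 6: {"a": 8}}
edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 4)]

# Start at vertex 1, get its neighbors, then their neighbors where a > 5.
result, lookups = traverse(edges, vertices, 1, [None, lambda doc: doc["a"] > 5])
print(sorted(result), lookups)  # [4, 6] 2
```

If each `expand` call were a single search executed inside the cluster, the client would pay for one request in total, and the number of internal searches would grow with the depth of the traversal rather than with the number of vertices visited.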
The aims of this are:
1- Minimize round trips.
2- Optimize the search within Elasticsearch and not from the client side.
We have the following structure in the storage:
- Every graph is an index
- Every index has two types (vertices and edges)
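Given that layout, one hop of a traversal could be expressed as two query bodies, one against the edges type and one against the vertices type. The sketch below only builds the request bodies and does not talk to a cluster. The edge field name `source`, the helper names, and the `size` value are assumptions, not part of our actual mapping. The `terms`, `ids`, `range`, and `bool`/`filter` clauses are standard query DSL.

```python
import json


def edges_query(frontier_ids, size=10000):
    """Body for the edges type: all edges leaving any vertex in the frontier.

    Assumes each edge document stores its origin in a field named `source`.
    """
    return {
        "size": size,
        "query": {"bool": {"filter": [{"terms": {"source": sorted(frontier_ids)}}]}},
    }


def vertices_query(vertex_ids, field=None, greater_than=None, size=10000):
    """Body for the vertices type: fetch vertices by id, optionally filtered."""
    filters = [{"ids": {"values": [str(v) for v in sorted(vertex_ids)]}}]
    if field is not None:
        # Example: field="a", greater_than=5 expresses "property a > 5".
        filters.append({"range": {field: {"gt": greater_than}}})
    return {"size": size, "query": {"bool": {"filter": filters}}}


# Step 1: edges leaving vertex 1. Step 2: candidate neighbors with a > 5.
print(json.dumps(edges_query({1}), indent=2))
print(json.dumps(vertices_query({4, 5, 6}, "a", 5), indent=2))
```

Filter clauses are used here because a traversal does not need relevance scoring. A frontier larger than the `size` limit would need scrolling or some other form of pagination, which this sketch leaves out.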
Any suggestions on how to approach this in a performant way? Also, is it possible to invoke a plugin endpoint from the TransportClient?
Thanks in advance for your help, we really appreciate it.
PS: We have been all over the latest documentation, but we were unable to find answers to these questions.