Joins within plugin or script

mastayoda · May 4, 2017, 7:06pm

Hello all,

For our project (trueno), which is a graph database, we need to perform joins in order to traverse the graph. We are using Elasticsearch as our distributed storage engine.

We have made many tweaks to get good performance, such as using the TransportClient instead of REST, and pure WebSockets to extract and index vertices and edges from the storage.

We have been evaluating how best to do traversals. We were using the SIREn Join plugin, but it is quite slow and can only do two-step joins. We are well aware that join operations in a distributed system are very expensive, but we need to do the traversal anyway. Our options seem to be the following:

Application-side joins (too many round trips, therefore slow)
Use the parent-child or nested features to store the vertices and edges (if the graph is fully connected, this won't scale, since all documents need to reside in the same shard)
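To make the round-trip cost of the first option concrete, here is a minimal sketch that counts requests for a naive per-vertex application-side join versus a batched one. `CountingStore` and the edge list are hypothetical in-memory stand-ins for Elasticsearch; each `neighbors_of()` call represents one network request.

```python
from typing import Iterable, Set, Tuple

# Hypothetical edge list standing in for the "edges" type. In a real
# deployment every neighbors_of() call below would be one request to
# Elasticsearch.
EDGES = [(1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 6)]


class CountingStore:
    """Fake store that counts the requests (round trips) it receives."""

    def __init__(self, edges: Iterable[Tuple[int, int]]) -> None:
        self.edges = list(edges)
        self.round_trips = 0

    def neighbors_of(self, vertex_ids: Set[int]) -> Set[int]:
        # One call is one round trip, however many ids it carries.
        self.round_trips += 1
        return {t for s, t in self.edges if s in vertex_ids}


def per_vertex_traversal(store: CountingStore, start: int, hops: int) -> Set[int]:
    """Naive application-side join: one request per frontier vertex."""
    frontier = {start}
    for _ in range(hops):
        nxt: Set[int] = set()
        for v in frontier:
            nxt |= store.neighbors_of({v})
        frontier = nxt
    return frontier


def batched_traversal(store: CountingStore, start: int, hops: int) -> Set[int]:
    """Batched application-side join: one request per hop."""
    frontier = {start}
    for _ in range(hops):
        frontier = store.neighbors_of(frontier)
    return frontier


if __name__ == "__main__":
    naive, batched = CountingStore(EDGES), CountingStore(EDGES)
    print(per_vertex_traversal(naive, 1, 2), naive.round_trips)    # {4, 5} 3
    print(batched_traversal(batched, 1, 2), batched.round_trips)   # {4, 5} 2
```

Batching keeps the cost at one request per hop, but a k-hop traversal still needs k sequential round trips from the client, which is the overhead that running the loop inside Elasticsearch would avoid.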

Is there a way to use the Script or Plugin interfaces to do something like the following?

1- The user defines the graph traversal query.
2- The query is sent to a plugin or script.
3- The plugin or script optimizes the traversal.
4- Within the plugin or script, the search is performed internally in steps, for example:
vertex(1)->neighbors()->neighbors(where a>5)

Explanation: Start from the vertex with id 1. Get its neighbors, then get those neighbors' neighbors where property a > 5.

5- Return the whole set of documents resulting from the traversal to the client.
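The step-wise execution in point 4 can be sketched as a list of step objects applied to a shrinking or growing frontier. This is an illustration under assumptions, not how a plugin would actually be written: the graph lives in hypothetical in-memory dicts (`VERTICES`, `ADJACENCY`), `Neighbors` is a made-up step type, and the function returns the vertex ids reached by the last step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set

# Hypothetical graph data: vertex properties and directed adjacency lists.
VERTICES: Dict[int, Dict[str, int]] = {
    1: {"a": 0}, 2: {"a": 3}, 3: {"a": 7}, 4: {"a": 9}, 5: {"a": 1}, 6: {"a": 6},
}
ADJACENCY: Dict[int, List[int]] = {1: [2, 3], 2: [4], 3: [4, 5], 4: [6]}


@dataclass
class Neighbors:
    """One traversal step: expand the frontier, optionally filtering targets."""
    where: Optional[Callable[[Dict[str, int]], bool]] = None

    def apply(self, frontier: Set[int]) -> Set[int]:
        out: Set[int] = set()
        for v in frontier:
            for n in ADJACENCY.get(v, []):
                if self.where is None or self.where(VERTICES[n]):
                    out.add(n)
        return out


def run_traversal(start: int, steps: List[Neighbors]) -> Set[int]:
    """Apply the steps in order, starting from a single vertex."""
    frontier = {start}
    for step in steps:
        frontier = step.apply(frontier)
        if not frontier:  # nothing left to expand; stop early
            break
    return frontier


# vertex(1)->neighbors()->neighbors(where a>5)
result = run_traversal(1, [Neighbors(), Neighbors(where=lambda p: p["a"] > 5)])
print(result)  # {4}
```

Keeping the steps as data is what would let the plugin inspect and reorder them (point 3) before executing anything.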

The aims of this are:

1- Minimize roundtrips.
2- Optimize search within Elasticsearch rather than on the client side.

We use the following structure in the storage:

Every graph is an Index
Every index has two types (vertices and edges)
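With this layout, one traversal hop maps onto ordinary searches against the two types. The sketch below only builds query bodies and makes no client calls; the field names `source`, `id` and `properties.a` are assumptions about the document shape, while `bool`/`filter`, `terms` and `range` are standard query DSL. In Elasticsearch 5.x such bodies would be sent to paths like `/<graph>/edges/_search` and `/<graph>/vertices/_search`.

```python
from typing import Any, Dict, Iterable

# Hypothetical document shapes for one graph index; field names are assumptions.
VERTEX_DOC = {"id": 4, "properties": {"a": 9}}
EDGE_DOC = {"source": 2, "target": 4}


def edges_from_query(frontier: Iterable[int], size: int = 10000) -> Dict[str, Any]:
    """Search body fetching every edge whose source is in the frontier.

    The terms query sits in filter context, so no scoring is done.
    """
    return {
        "size": size,
        "query": {"bool": {"filter": {"terms": {"source": sorted(frontier)}}}},
    }


def vertices_query(ids: Iterable[int], min_a: int) -> Dict[str, Any]:
    """Search body fetching the given vertices whose property a > min_a."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"terms": {"id": sorted(ids)}},
                    {"range": {"properties.a": {"gt": min_a}}},
                ]
            }
        }
    }
```

For `vertex(1)->neighbors()->neighbors(where a>5)` this means one edges search per hop plus a vertices search for the filtered step, each a separate request when issued from the client.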

Any suggestions on how to approach this in a performant way? Also, is it possible to invoke a plugin endpoint from the TransportClient?

Thanks in advance for your help, we really appreciate it.

P.S. We have been all over the latest documentation, but we were unable to find answers to these questions.

nik9000 · May 4, 2017, 7:24pm

We've been slowly replacing the TransportClient with a REST-based client, because it causes a ton of coupling with the internals of Elasticsearch.

Scripts have to be synchronous, and the network communication for the join wouldn't be. Plugins can do what they like, though the Lucene API for queries is synchronous, so you wouldn't want to use that without some major reworking.

mastayoda · May 4, 2017, 7:31pm

I understand. Why not also provide a WebSocket (RFC 6455) interface? REST calls are extremely slow and have a lot of overhead. WebSockets are available for pretty much every language and platform. We started with REST, but moving to the TransportClient gave us a 10X speedup.

I see. Technically we could use threads within the plugin to access the index in parallel, right? Isn't there an interface that lets Elasticsearch do the work, such as an internal interface that accepts queries? I'm afraid we would otherwise need to get into shard routing, etc.
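The threaded approach asked about here amounts to a per-hop fan-out: send the frontier to every shard that owns some of it, in parallel, then merge. This sketch is not an Elasticsearch internal API; `search_shard`, `route` and the in-memory `SHARDS` are hypothetical, and plain modulo stands in for Elasticsearch's real routing, which hashes the routing value.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Set, Tuple

# Hypothetical setup: edges partitioned across shards by source vertex id.
NUM_SHARDS = 3
EDGES = [(1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 6)]
SHARDS: Dict[int, List[Tuple[int, int]]] = {i: [] for i in range(NUM_SHARDS)}
for src, dst in EDGES:
    SHARDS[src % NUM_SHARDS].append((src, dst))


def route(vertex_id: int) -> int:
    """Hypothetical routing: the shard holding edges that leave vertex_id."""
    return vertex_id % NUM_SHARDS


def search_shard(shard_id: int, frontier: Set[int]) -> Set[int]:
    """Stand-in for a shard-local search executed inside a plugin."""
    return {t for s, t in SHARDS[shard_id] if s in frontier}


def parallel_hop(frontier: Set[int], pool: ThreadPoolExecutor) -> Set[int]:
    """Query only the shards that own frontier vertices, concurrently."""
    targets = {route(v) for v in frontier}
    futures = [pool.submit(search_shard, sid, frontier) for sid in targets]
    merged: Set[int] = set()
    for fut in futures:
        merged |= fut.result()  # blocks until that shard has answered
    return merged


def parallel_traversal(start: int, hops: int) -> Set[int]:
    """Run `hops` fan-out rounds starting from a single vertex."""
    frontier = {start}
    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        for _ in range(hops):
            if not frontier:
                break
            frontier = parallel_hop(frontier, pool)
    return frontier
```

Each hop still waits for every queried shard before the next hop can start, so the hops themselves remain sequential even though the shards within a hop are searched in parallel.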

system · June 1, 2017, 7:44pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
