We're using elasticsearch for a graph-like workload, and it works excellent – we benchmarked it against 5 major alternatives on our workload on all queries except one. We need a very limited form of join, which we currently do on the app side. We basically have index consisting of docs like these:
A_1
A_2
..
A_n
B_1
B_2
...
B_n
...
We need to find all such letters where A_x satisfies X and A_y satisfies Y with x < y and X and Y are some terms or prefix queries. Doing this on application side takes tens of second with elasticsearch, vs. sub-second in a DB with joins (but ES is much faster on all other queries we have).
Are there any better ways to run our query faster, preferably per-shard in Elasticsearch?
Things we considered:
-
We could have A_1...A_n as nested or child documents of A. We can find A's that match X and Y, but can't enforce x < y.
-
We can route documents based on letter and sort the index based on letter and number, then have aggregation that starts with A_x and goes through remaining A's looking for a document satisfying Y, say using aggregation script. Unfortunately there is no way to specify a query to match "all A_x matching X and subsequent documents until B_1".
-
This is a substantially big problem for us to invest time into writing a Java plugin for Elasticsearch. We're not sure whether the type of things we need to do are possible though plugins, or where to start with it.
Any ideas on solving this problem would be very welcome!