Problem with "joining" data

I have about 100 indices, each holding anywhere from a few thousand to several million documents.
Each of these documents can be "linked" to one or more containers (just a grouping name, for simplicity), and I need to be able to filter documents in any of these indices by a selected container.

The obvious answer would be to carry the container name/ID on each document, which works perfectly, except that documents are linked to and unlinked from containers frequently, and the required update_by_query takes several hours.
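For illustration, here is a minimal sketch of what "carrying the container ID on each document" could look like as request bodies. The field name `containers` and the helper names are assumptions made for this example, not anything from my actual mappings; the dicts are meant to be sent to `_update_by_query` and `_search` respectively.

```python
def link_request(container_id, doc_ids):
    """Build an _update_by_query body that adds container_id to the given docs.

    Assumes a hypothetical keyword array field named "containers".
    """
    return {
        # Select the documents to (re)link by their _id values.
        "query": {"ids": {"values": [str(i) for i in doc_ids]}},
        "script": {
            "lang": "painless",
            "source": (
                "if (ctx._source.containers == null) { ctx._source.containers = []; } "
                "if (!ctx._source.containers.contains(params.container)) "
                "{ ctx._source.containers.add(params.container); } "
                # Skip the write entirely when the link already exists.
                "else { ctx.op = 'noop'; }"
            ),
            "params": {"container": container_id},
        },
    }


def container_filter(container_id):
    """Build a _search body that filters on the denormalized container field."""
    return {"query": {"bool": {"filter": [{"term": {"containers": container_id}}]}}}
```

The search side is a single cheap term filter; all of the cost sits in the update, since every link or unlink rewrites each affected document.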

I can't use parent-child links, because parent and child have to live in the same index (and a child can have only one parent, which must be on the same shard).

I tried indexing a separate document per container, holding an array with the IDs of every doc in that container, and then using a terms lookup query. That also works great, but only for relatively small numbers of docs per container (I raised the max terms limit, index.max_terms_count, to 10 million to test).
I tried the same thing with several smaller arrays, combining all the subsets in a bool/should query, but that takes even longer.
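As a rough sketch of the two variants just described, the query bodies might be built like this. The index name `containers`, the path `doc_ids`, and the filtered field `doc_id` are placeholders chosen for the example.

```python
def terms_lookup_filter(field, lookup_index, lookup_id, path):
    """Terms lookup: fetch the term list from a field of another document."""
    return {"terms": {field: {"index": lookup_index, "id": lookup_id, "path": path}}}


def chunked_lookup_filter(field, lookup_index, lookup_ids, path):
    """Variant with several smaller lookup documents combined via bool/should."""
    return {
        "bool": {
            "should": [
                terms_lookup_filter(field, lookup_index, lookup_id, path)
                for lookup_id in lookup_ids
            ],
            # A document matches if any one of the subsets contains it.
            "minimum_should_match": 1,
        }
    }


# Single lookup document holding all IDs of the container.
single = {"query": {"bool": {"filter": [
    terms_lookup_filter("doc_id", "containers", "container-1", "doc_ids")
]}}}

# Same container split across three smaller lookup documents.
chunked = {"query": {"bool": {"filter": [
    chunked_lookup_filter(
        "doc_id", "containers",
        ["container-1-part-0", "container-1-part-1", "container-1-part-2"],
        "doc_ids",
    )
]}}}
```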

Since I use sequential numbering for IDs, I tried storing each container's membership as a set of ID ranges, but the groups end up far too scattered for that approach (I ended up with 300,000 range queries).
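To make the range idea concrete, this is roughly how a list of numeric IDs collapses into inclusive ranges and then into range clauses. It is only a sketch; the field name `doc_id` is assumed to be a numeric field and is made up for the example.

```python
def to_ranges(ids):
    """Collapse integer IDs into sorted, inclusive (low, high) ranges."""
    ranges = []
    for i in sorted(set(ids)):
        if ranges and i == ranges[-1][1] + 1:
            # Consecutive ID: extend the current range.
            ranges[-1][1] = i
        else:
            # Gap in the numbering: start a new range.
            ranges.append([i, i])
    return [tuple(r) for r in ranges]


def range_filter(field, ranges):
    """Build a bool/should filter with one range clause per ID range."""
    return {
        "bool": {
            "should": [
                {"range": {field: {"gte": low, "lte": high}}} for low, high in ranges
            ],
            "minimum_should_match": 1,
        }
    }


print(to_ranges([1, 2, 3, 7, 9, 10]))  # [(1, 3), (7, 7), (9, 10)]
```

Every gap in the numbering adds another clause, so scattered membership produces nearly one range per document, which is how the clause count gets out of hand.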

A typical query against these indices returns in ~100 ms, but the fastest I've been able to get with the container filter (using the terms lookup) is 10 to 30 s, which makes the application unusably slow (I need sub-second responses).

Does anyone have ANY ideas for me? I'm on ES 7.5. I can reindex, change mappings, and go crazy with the data as long as I can get the performance.


I am not a hundred percent sure this fits your use case, but 7.5 introduced the enrich processor, which lets you enrich documents at index time with information from another index. Maybe this allows you to add the required fields to one of the indices?
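A minimal sketch of how that could be wired up, assuming a hypothetical source index `container-links` whose documents each carry a `doc_id` and a `container_id`. All index, policy, and field names here are invented for illustration.

```python
# Body for PUT /_enrich/policy/container-links-policy
enrich_policy = {
    "match": {
        "indices": "container-links",       # hypothetical index of (doc_id, container_id) pairs
        "match_field": "doc_id",            # field in the source index to match on
        "enrich_fields": ["container_id"],  # fields copied onto incoming documents
    }
}

# Body for PUT /_ingest/pipeline/add-containers
ingest_pipeline = {
    "description": "Attach container links at index time",
    "processors": [
        {
            "enrich": {
                "policy_name": "container-links-policy",
                "field": "doc_id",                 # field on the incoming document
                "target_field": "container_links",
                # With more than one match, the target field becomes an array.
                "max_matches": 128,
            }
        }
    ],
}
```

The policy has to be executed (POST /_enrich/policy/container-links-policy/_execute) before the pipeline can use it, and the lookup only runs when a document is indexed through the pipeline.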


Thanks for your reply.
The enrich processor looks interesting and I'll definitely keep it in mind going forward, but unfortunately I don't think it'll help with my current problem. If I do go the route of storing the container/group name/ID on the documents, the problem is not the initial indexing but continuously updating them afterwards to reflect any changes in linking.

I have a workaround for now that lets me keep the number of documents per container down to a more manageable amount, so I can use the terms lookup method.
