DISCLAIMER: I am relatively new to Elasticsearch, so I apologize in case my question is too "basic" or falls into the "everybody should know this" category.
Hi! I have a performance question. Let's say we have this denormalized data in an index:
[
  {
    "key_id": 1,
    "language": "en",
    "value": "<some long value here>"
  },
  {
    "key_id": 1,
    "language": "fr",
    "value": "<some long value here>"
  },
  {
    "key_id": 1,
    "language": "de",
    "value": "<some long value here>"
  },
  {
    "key_id": 2,
    "language": "en",
    "value": "<some long value here>"
  },
  {
    "key_id": 2,
    "language": "fr",
    "value": "<some long value here>"
  },
  {
    "key_id": 2,
    "language": "de",
    "value": "<some long value here>"
  }
]
The goal is to let the user search the values the way a text editor does. This means that a wildcard search must be used to allow partial word matching (please do not focus on the wildcard part; we know it's expensive).
So, each key_id has a set of languages and a value for each of them. The editor displays all languages for each key_id, so when we search the values we do not need every language value of a key_id to satisfy the search; a single matching value is enough. So basically a query would be something like this:
{
  "collapse": {
    "field": "key_id"
  },
  "query": {
    "bool": {
      "must": [
        {
          "wildcard": {
            "value": "*ello wor*"
          }
        }
      ]
    }
  }
}
As you can see, we only need to know whether a given key_id contains what we are searching for. However, it looks like Elasticsearch performs this wildcard search on each language item of the key_id. So let's say the wildcard search has already matched the "en" value: it will still run the wildcard search on the "fr" and "de" values of the same key_id, which is a bit wasteful if you ask me.
The actual data is a bit more complicated: each "key" can have a potentially unlimited number of languages assigned to it, and the length of each value is potentially unlimited as well. This means these "extra" searches add up very quickly. Maybe I just don't get it and this is not how it works.
So the question is: Is there a more efficient way to "collapse" the search result per key_id or make Elasticsearch not search values for key_ids that already matched the query?
Thanks in advance!
Maybe mention that you want this to happen only when sorting by doc ID or otherwise ignoring the scores - if you care about scores, then there's no way to avoid the extra work.
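For illustration, one way the query could state that scores are not needed is to move the wildcard clause into filter context and sort by `_doc`. This is only a sketch of that intent, reusing the field names from the example above; it is not assumed here to stop Elasticsearch from evaluating the wildcard on every language document of a key_id.

```json
{
  "collapse": {
    "field": "key_id"
  },
  "sort": ["_doc"],
  "query": {
    "bool": {
      "filter": [
        {
          "wildcard": {
            "value": "*ello wor*"
          }
        }
      ]
    }
  }
}
```

Compared with the original query, only the `must` clause became `filter` (no scoring) and a `sort` on `_doc` (index order) was added.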