Query execution drastically increased after upgrade elastic cloud hosted deployment from v8.18.4 to v9.0.4

We had a really long query (it could be up to 30k lines), and in v8 it took around 50 ms. After upgrading to v9, the query took around 400–500 ms.

When we profiled it, the results were essentially identical in profile.shards[0].searches. (profiled v8 and v9)

We reindexed everything, checked the mappings, templates, and cluster configuration, but we didn’t find any differences.

Then we refactored the query, which reduced the query size by about 75% (from 30k to 7k lines). The query performance improved by roughly 15%, but it’s still far from the performance we had in v8.

The original query, but simplified:

GET some-index/_search
{ "collapse": { "field": "relationships.master.data.id.raw" }, "from": 0, "profile": true, "size": 100, "query": { "bool": { "must": [ { "bool": { "filter": { "bool": { "must": [ { "term": { "id": "some-id" } } ] } } } }, { "bool": { "should": [ { "bool": { "filter": [ { "term": { "meta.locale": "ar-INT" } } ] } }, { "bool": { "filter": [ { "term": { "meta.locale": "az-AZ" } } ], "must_not": [ { "term": { "translations": "ar-INT" } } ] } }, { "bool": { "filter": [ { "term": { "meta.locale": "az-INT" } } ], "must_not": [ { "term": { "translations": "ar-INT" } }, { "term": { "translations": "az-AZ" } } ] } }, { "bool": { "filter": [ { "term": { "meta.locale": "bs-BA" } } ], "must_not": [ { "term": { "translations": "ar-INT" } }, { "term": { "translations": "az-AZ" } }, { "term": { "translations": "az-INT" } } ] } } ] } } ] } }, "_source": false }

The optimized query, but simplified:

GET some-index/_search
{ "collapse": { "field": "relationships.master.data.id.raw" }, "from": 0, "profile": true, "size": 100, "query": { "bool": { "must": [ { "bool": { "filter": { "bool": { "must": [ { "term": { "id": "some-id" } } ] } } } }, { "bool": { "should": [ { "bool": { "filter": [ { "term": { "meta.locale": "ar-INT" } } ] } }, { "bool": { "filter": [ { "term": { "meta.locale": "az-AZ" } } ], "must_not": [ { "terms": { "translations": [ "ar-INT" ] } } ] } }, { "bool": { "filter": [ { "term": { "meta.locale": "az-INT" } } ], "must_not": [ { "terms": { "translations": [ "ar-INT", "az-AZ" ] } } ] } }, { "bool": { "filter": [ { "term": { "meta.locale": "bs-BA" } } ], "must_not": [ { "terms": { "translations": [ "ar-INT", "az-AZ", "az-INT" ] } } ] } } ] } } ] } }, "_source": false }

However, after the optimization, we’re seeing different profile metrics. The profile.shards[0].searches[0].query still has identically matching entries, but a lot of duplication appears.

In the optimized query, I’ve observed the following duplication behavior:

In v8, the profile.shards[0].searches[0].query array contains 2 elements. The first element is a BooleanQuery, and the second is a TermQuery with the description translations:az-AZ.

In v9, the profile.shards[0].searches[0].query array contains 26 elements. The first two queries are exactly the same (BooleanQuery and TermQuery), but the next 24 elements are duplicates of the TermQuery with the description translations:az-AZ.

As the query grows, these duplicates appear more and more often in profile.shards[0].searches[0].query.

Is this a bug?
Could this be causing the performance difference between v8 and v9?
Where else could the performance bottleneck originate?
And most importantly: how can we fix it?

If you run the query without collapse part, does it still have degraded performance?

Would it be possible for you to share full profiler output?
It is difficult to judge based only this information.


Profile output itself and repeated output in it should not be a cause of degraded performance of the query with profile disabled

Hi Mayya,
If we remove the collapse, it does not have any influence to the performance.

Since the unoptimized query had exactly the same profile, I will not attach it.

But I can attach the optimized query profiles. Also from v8 and v9. (Removed some sensitive informations, but nothing related to the execution)

Here is the v8 profile results: profile-v8
Here is the v9 profile results: profile-v9

Also there is a huge gap in the lines of profile.
v8: 12k
v9 50k (The same query can have different line of profile results. Some of them 50k, some of them 60k or even 70k)

I concede this is not what you asked, but can you describe, in normal English words, what this query (the one that is profiled in attached files profiled) actually does?

I ask because:

```

+(ConstantScore(id:f4c9c13b-9d9a-4ec1-8074-bb26cacbcc38))^0.0
+(meta.locale:ar-LB
   (+meta.locale:ar-MA -translations:(ar-LB))
   (+meta.locale:ar-MEA -translations:(ar-LB ar-MA))
   (+meta.locale:ar-OM -translations:(ar-LB ar-MA ar-MEA))
   (+meta.locale:ar-QA -translations:(ar-LB ar-MA ar-MEA ar-OM))
   (+meta.locale:ar-SA -translations:(ar-LB ar-MA ar-MEA ar-OM ar-QA))
   (+meta.locale:ar-TN -translations:(ar-LB ar-MA ar-MEA ar-OM ar-QA ar-SA))
   (+meta.locale:az-AZ -translations:(ar-LB ar-MA ar-MEA ar-OM ar-QA ar-SA ar-TN))
   ...
   ever expanding lists, +1 each time, of +meta.locale and -translations, lists expanding in alphabetical order
   ...
   +(meta.published:T meta.published:F)"

It looks a weird construction to me, and sort of thing that could possibly be done with simpler query / an aggregation.

But, FWIW, this seems a bug to me, as the query is simply evaluated significantly differently


$ cat profile-v8.txt | jq -Scr '.profile.shards[].searches[].query[]' | wc
     11   24195  384902
$ cat profile-v9.txt | jq -Scr '.profile.shards[].searches[].query[]' | wc
    159   25452 1100521
$ sdiff --expand-tabs -w 130 <(jq . profile-v8.txt| gron | egrep -o '.type = .*;$' | sort | uniq -c | sort -k4) <(jq . profile-v9.txt| gron | egrep -o '.type = .*;$' | sort | uniq -c | sort -k4)
    122 .type = "BooleanQuery";                                 |      262 .type = "BooleanQuery";
      1 .type = "BoostQuery";                                            1 .type = "BoostQuery";
      1 .type = "fetch";                                                 1 .type = "fetch";
      1 .type = "FetchDocValuesPhase";                                   1 .type = "FetchDocValuesPhase";
      1 .type = "FetchFieldsPhase";                                      1 .type = "FetchFieldsPhase";
    109 .type = "MultiTermQueryConstantScoreBlendedWrapper";           109 .type = "MultiTermQueryConstantScoreBlendedWrapper";
      1 .type = "StoredFieldsPhase";                                     1 .type = "StoredFieldsPhase";
    214 .type = "TermQuery";                                    |     1471 .type = "TermQuery";

you will see a whole bunch more BooleanQuery and TermQuery in v9 than in v8.

1, It also seems like a bug to us — not just because of the different appearance between BooleanQuery and TermQuery, but also because, for the same query executed multiple times (in v9), we received different lines in the profiling results.

Have tested with the latest 9.2.0, and still the same.

The question is whats next? (from our side / from elastic side)
Should we report it somewhere? (GH?)

2, Well, I don't know about the exact business use case, or idea.
I am also just see the data structure and logical steps:

const example = {
  // ...
  id: 'uuid',
  meta: { locale: 'de-INT' },
  translations: ['de-AT', 'de-DE', 'de-INT']
  // ...
}

And from this list: ar-LB,ar-MA,ar-MEA,ar-OM,ar-QA,ar-SA,ar-TN,az-AZ,..., we have to follow the logic above.

1 Like

I leave others, maybe from Elastic, to answer on first point.

Again, I acknowledge this is not at all what you asked !!

Obviously it depends on your documents.

Is the queries specific list of locales/translations the complete set of all available values?

Do you have documents with that have non-matching locale and translations for each id, maybe even null for translation when locale has a value?

Because, just starting from the top:

So we matched all docs matching this locale, ar-LB, irrespective of translations having a value or what that value is.

Now we added all matches of locale ar-MA, excluding those with translations ar-LB. Does that exclusion actually exclude any docs, i.e. would (+meta.locale:ar-MA +translations:ar-LB) have matched any docs?

Now we added all matches of ar-MEA, excluding those with translations ar-LB or ar-MA. Does that actually exclude any docs? Would (+meta.locale:ar-MEA +translations:(ar-LB ar-MA))have exluded any docs?

And so on.

The locale are listed in alphabetical order, which is suggestive. if the locale’s were listed in some other order, say a completely random order, would the query always respond exactly the same? Because the alphabetical order is imposing an implicit priority on the locales.

My hunch is that if yes, and it would not matter which order the locales/translations were given the results would be the same anyways, then the query is likely too complex for what it is actually doing.

If no, different order for the locales/translations could give different result sets, then I’d make a guess that there is possibly a subtle logic bug hidden here.

I accept this is wild speculation. But you also mentioned:

There a lot hidden behind “we refactored the query”. Though very long your (shared in the profile) query is not particularly complex, in the logical sense. And speaking very generally, it’s very challenging to correctly re-factor things that are not well understood. Or rather the converse, for code that is not well understood, it is really easy to refactor to a logically different construction.

EDIT: Little thought experiment - If somehow some random + was changed to a - in your long query, or vice versa, effectively changing (breaking?) the logic, would anyone notice? How would anyone notice?

This isn’t what your shared (optimized) query is doing.

Here’s the bits that have a +meta.locale: de-, i.e. the German locales:

"description": "+meta.locale:de-INT -translations:(ar-LB ar-MA ar-MEA ar-OM ar-QA ar-SA ar-TN az-AZ bg-BG bs-BA cs-CZ da-DK)",
"description": "+meta.locale:de-AT -translations:(ar-LB ar-MA ar-MEA ar-OM ar-QA ar-SA ar-TN az-AZ bg-BG bs-BA cs-CZ da-DK de-INT)",
"description": "+meta.locale:de-CH -translations:(ar-LB ar-MA ar-MEA ar-OM ar-QA ar-SA ar-TN az-AZ bg-BG bs-BA cs-CZ da-DK de-AT de-INT)",
"description": "+meta.locale:de-DE -translations:(ar-LB ar-MA ar-MEA ar-OM ar-QA ar-SA ar-TN az-AZ bg-BG bs-BA cs-CZ da-DK de-AT de-CH de-INT)",

There’s not obvious rationale for why you might have docs with meta.locale:de-INT and some non-German translations like ar-LB, but ONLY for specific values for translations that happen to appear before de-INT alphabetically. So docs with meta.locale:de-INTand translations: el-CYwould actually match but docs with meta.locale:de-INTand translations: ar-LB would not match. I consider that unlikely to be what is intended.

I would not worry about profile output and over-analyze it.

Thanks for confirming that you still see degradation on 9.2.

Please create a github issue that outlines regression ( we would need specific version of 8.x where it worked well) and specific version of 9.x where it regressed.

Also would be valuable to provide minimum full query where the regression manifested:

Do you see regressed performance on this query:

{
    "bool": {
        "filter": [
            {
                "term": {
                    "meta.locale": "az-AZ"
                }
            }
        ],
        "must_not": [
            {
                "terms": {
                    "translations": [
                        "ar-LB",
                        "ar-MA",
                        "ar-MEA",
                        "ar-OM",
                        "ar-QA",
                        "ar-SA",
                        "ar-TN",
                        "az-AZ",
                        "bg-BG",
                        "bs-BA",
                        "cs-CZ",
                        "da-DK",
                        "de-AT",
                        "de-CH",
                        "de-DE",
                        "de-INT",
                        "el-CY",
                        "el-GR",
                        "en-AU",
                        "en-CA",
                        "en-CAR",
                        "en-EG",
                        "en-EU",
                        "en-GB",
                        "en-HK",
                        "en-IE",
                        "en-IN",
                        "en-INT",
                        "en-JM",
                        "en-KE",
                        "en-KW",
                        "en-LK",
                        "en-MEA",
                        "en-MK",
                        "en-MT",
                        "en-MV",
                        "en-MY",
                        "en-NA",
                        "en-NG",
                        "en-NZ",
                        "en-OM",
                        "en-PH",
                        "en-PK",
                        "en-SE",
                        "en-SG",
                        "en-US",
                        "en-ZA",
                        "es-AR",
                        "es-BO",
                        "es-CAR",
                        "es-CL",
                        "es-CO",
                        "es-CR",
                        "es-DO",
                        "es-EC",
                        "es-ES",
                        "es-INT",
                        "es-LAT",
                        "es-MX",
                        "es-PA",
                        "es-PE",
                        "et-EE",
                        "fi-FI",
                        "fr-BE",
                        "fr-CA",
                        "fr-CH",
                        "fr-DZ",
                        "fr-FR",
                        "fr-INT",
                        "fr-MU",
                        "fr-SN",
                        "he-IL",
                        "hi-IN",
                        "hr-HR",
                        "hu-HU",
                        "id-ID",
                        "is-IS",
                        "it-IT",
                        "ja-JP",
                        "ka-GE",
                        "kk-KZ",
                        "ko-KR",
                        "lt-LT",
                        "lv-LV",
                        "mk-MK",
                        "nl-BE",
                        "nl-NL",
                        "no-NO",
                        "pl-PL",
                        "pt-AO",
                        "pt-BR",
                        "pt-PT",
                        "ro-RO",
                        "ru-BY",
                        "ru-KZ",
                        "ru-RU",
                        "ru-UZ",
                        "sk-SK",
                        "sl-SI",
                        "sq-AL",
                        "sq-KV",
                        "sr-ME",
                        "sr-RS",
                        "sv-SE",
                        "th-TH",
                        "tr-TR",
                        "uk-UA",
                        "zh-CN",
                        "zh-HK"
                    ]
                }
            }
        ]
    }
}

So I have checked where it started to regress, and come out that, when I add the id into the filter the performance started to diverge between v8 and v9.
Seems like the id filter weighted different in that two versions. (github issue)

1 Like