Search performance loss in Elasticsearch 7

agutierrezrodriguez · September 16, 2020, 5:38pm

Hi!

We are upgrading our Elasticsearch 5.6 clusters to Elasticsearch 7.6.

We've found that search performance is worse in Elasticsearch 7.6 than in 5.6.

We are testing with empty clusters built with the same installation process (Ubuntu 18.04 + deb package with systemd) for both versions of Elasticsearch.

Each cluster has 6 data nodes and 1 dedicated master node (1 CPU with 2 cores and 8 GB RAM per node).

We populate each cluster with 3000 indices, each with one primary shard and at least one replica (about 6500 shards in total), holding 22M documents and 82 GB of data.

The only change is in our index definitions (mappings), where we replaced some deprecated names: ngram instead of nGram, edge_ngram instead of edge_nGram, and word_delimiter_graph instead of word_delimiter (due to a change in Lucene that breaks word_delimiter when combined with the synonym filter).
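The renames listed above can be applied mechanically to existing index settings before reindexing. A minimal Python sketch, assuming the settings are held as a plain dict with an `analysis` section; the `RENAMES` table simply mirrors this post and `rename_analysis_types` is a hypothetical helper. Note that word_delimiter_graph is a separate, graph-aware filter rather than a pure rename, so analyzer output should still be compared after the change.

```python
import copy

# Hypothetical rename table mirroring the changes described in this post.
RENAMES = {
    "nGram": "ngram",
    "edge_nGram": "edge_ngram",
    "word_delimiter": "word_delimiter_graph",
}


def rename_analysis_types(settings: dict) -> dict:
    """Return a copy of index settings with the listed component names replaced."""
    updated = copy.deepcopy(settings)
    analysis = updated.get("analysis", {})
    custom_filters = set(analysis.get("filter", {}))
    custom_tokenizers = set(analysis.get("tokenizer", {}))

    # Custom tokenizer/filter definitions name their implementation in "type".
    for section in ("tokenizer", "filter"):
        for definition in analysis.get(section, {}).values():
            if definition.get("type") in RENAMES:
                definition["type"] = RENAMES[definition["type"]]

    # Analyzers can also reference built-in components by name; references to
    # user-defined components are left alone, since those names are custom.
    for analyzer in analysis.get("analyzer", {}).values():
        tokenizer = analyzer.get("tokenizer")
        if tokenizer in RENAMES and tokenizer not in custom_tokenizers:
            analyzer["tokenizer"] = RENAMES[tokenizer]
        if "filter" in analyzer:
            analyzer["filter"] = [
                name if name in custom_filters else RENAMES.get(name, name)
                for name in analyzer["filter"]
            ]
    return updated
```

The original dict is deep-copied, so the 5.6 and 7.x settings can be kept side by side and diffed.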

With the same data, the same indices and the same search test at the same request rate, we see an average latency increase of 8-10 ms in Elasticsearch 7.

After looking more closely, we found that Elasticsearch 7 spends 1-2 ms more in the search (query) phase and the rest of the extra time in the fetch phase, and CPU usage is also higher in Elasticsearch 7.
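To see where the query-phase time goes per shard, the same search can be sent with `"profile": true`, which adds a `profile.shards[].searches[].query[]` tree with `time_in_nanos` values to the response. A minimal Python sketch that sums the top-level query timings per shard from such a response (already parsed into a dict); the helper name is hypothetical. As far as we know, the 7.6 profile output does not break down the fetch phase (where highlighting runs), so that part would need to be compared separately, for example with the search slow log's fetch thresholds.

```python
def query_time_ms_per_shard(response: dict) -> dict:
    """Sum top-level query timings (which already include children) per shard."""
    timings = {}
    for shard in response.get("profile", {}).get("shards", []):
        total_ns = sum(
            query["time_in_nanos"]
            for search in shard.get("searches", [])
            for query in search.get("query", [])
        )
        timings[shard["id"]] = total_ns / 1_000_000
    return timings
```

Running the same profiled request against both clusters and diffing these numbers would show whether the 1-2 ms gap sits in the query tree itself.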

We have experimented with different JVM heap sizes (2000 MB, 2500 MB, 3000 MB and 3800 MB), different Java versions and other changes to the cluster: with and without adaptive replica selection (ARS), some buffer changes, etc.

We don't understand why the same hardware, the same data and settings, and the same queries perform worse in Elasticsearch 7.

Could someone help us?

Many thanks!!

Regards. Antonio.

agutierrezrodriguez · September 16, 2020, 5:47pm

I forgot to mention that we also tried Elasticsearch 7.9, which performs slightly better than 7.6 but is still far from 5.6 overall.

dadoonet · September 16, 2020, 6:29pm

1000 shards per node, with less than 4 GB of heap?

I think you should reduce that pressure.

What is the output of:

GET /
GET /_cat/nodes?v
GET /_cat/health?v
GET /_cat/indices?v

If some outputs are too big, please share them on gist.github.com and link them here.
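The arithmetic behind the shards-per-node question, using the figures from the first post (about 6500 shards on 6 data nodes, heap of at most 3800 MB). The 20-shards-per-GB-of-heap figure is the rule of thumb from Elastic's sizing guidance for 7.x, used here as an assumption; the helper names are hypothetical.

```python
def shards_per_node(total_shards: int, data_nodes: int) -> float:
    """Average number of shard copies each data node has to hold."""
    return total_shards / data_nodes


def shard_budget_for_heap(heap_mb: float, shards_per_gb: float = 20) -> float:
    """Rough upper bound on shards per node under a shards-per-GB-of-heap rule."""
    return heap_mb / 1024 * shards_per_gb


print(round(shards_per_node(6500, 6)))  # 1083
print(shard_budget_for_heap(3800))      # 74.21875
```

Under that rule of thumb, even the largest heap tried covers only about 74 shards, roughly 15 times fewer than each node holds.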

Christian_Dahlqvist · September 16, 2020, 6:53pm

Why do you have such a large number of indices and shards for so little data? How many indices does a typical query target?
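For scale, the figures from the first post put the average shard at around 13 MB. A minimal sketch, assuming the 82 GB is spread over all ~6500 shard copies (if it counts primaries only, the average roughly doubles); Elastic's 7.x sizing docs suggest shards in the 10 to 50 GB range, mentioned here only as a point of comparison. The helper name is hypothetical.

```python
def average_shard_size_mb(total_data_gb: float, shard_count: int) -> float:
    """Average on-disk size of one shard copy, in MB."""
    return total_data_gb * 1024 / shard_count


print(f"{average_shard_size_mb(82, 6500):.1f} MB per shard")  # 12.9 MB per shard
```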

agutierrezrodriguez · September 17, 2020, 9:10am

Hi Christian,

Each index stores the data for one catalog.
Each catalog can have different fields, different term filters and a different number of documents.
Catalogs must be independent of each other because fields with the same name can have different mappings.

In our tests each query targets only one index (through an alias).

Each index has a single primary shard plus a replica (to guarantee cluster resiliency). Only indices with a high number of searches have more than one replica, to increase throughput.

Our use case is to index a whole catalog into an index (on warm instances) and then start searching it (after moving it to cold instances). Only a few indices receive new data (a few documents) while being searched.

Thanks.
Regards. Antonio.

Christian_Dahlqvist · September 17, 2020, 9:38am

What type of queries are you running? One thing that did change between these versions was the removal of the _all field, which could affect some types of queries.

agutierrezrodriguez · September 17, 2020, 10:05am

The most frequent are boolean 'and' queries, about 50% of them with aggregations, like this example:

{
  "aggs":{
    "categories":{
      "aggs":{
        "selected":{
          "terms":{
            "field":"categories.facet",
            "include":[]
          }
        },
        "terms":{
          "terms":{
            "field":"categories.facet",
            "size":20
          }
        },
        "total":{
          "value_count":{
            "field":"categories.facet"
          }
        }
      },
      "filter":{
        "bool":{
          "must":[]
        }
      }
    },
    "sale_price":{
      "aggs":{
        "range":{
          "aggs":{
            "stats":{
              "stats":{"field":"sale_price"}
            }
          },
          "range":{
            "field":"sale_price",
            "ranges":[{"from": 0}]
          }
        }
      },
      "filter":{
        "bool":{
          "must":[]
        }
      }
    }
  },
  "from":0,
  "highlight":{
    "fields":{
      "description":{
        "fragment_size":100,
        "number_of_fragments":3
      }
    },
    "require_field_match":false
  },
  "post_filter":{
    "bool":{
      "must":[]
    }
  },
  "query":{
    "function_score":{
      "boost_mode":"sum",
      "functions":[],
      "query":{
        "function_score":{
          "functions":[
            {
              "field_value_factor":{
                "field":"manual_boost",
                "missing":1
              }
            },
            {
              "field_value_factor":{
                "field":"auto_boost",
                "missing":1
              }
            }
          ],
          "query":{
            "bool":{
              "filter":{
                "bool":{
                  "must_not":[]
                }
              },
              "minimum_should_match":1,
              "must_not":[],
              "should":[
                {
                  "multi_match":{
                    "cutoff_frequency":0.1,
                    "fields":[
                      "indexed_text^3",
                      "title^3",
                      "title.autocomplete^1",
                      "categories^2",
                      "brand^2",
                      "mpn^2",
                      "gtin^1"
                    ],
                    "operator":"and",
                    "query":"Jawe",
                    "type":"best_fields"
                  }
                }
              ]
            }
          }
        }
      },
      "score_mode":"max"
    }
  },
  "size":10,
  "sort":[
    {
      "_score":"desc"
    }
  ],
  "track_scores":true
}

We don't use the _all field; we disabled it in the Elasticsearch 5.6 index settings.

Also, the query is the same in Elasticsearch 5.6 and Elasticsearch 7.

agutierrezrodriguez · September 17, 2020, 2:18pm

Hi David,
Thanks for your help.

Yes, with this configuration we can hold up to 1600 shards per node without frequent garbage collection.

Keep in mind that the configuration is the same in Elasticsearch 5.6 and 7.6, and each new Elasticsearch release has reduced heap usage, so 7.6 should work better than 5.6.

Here is the requested output:

GET /
{
  "name" : "ip-10-0-12-76",
  "cluster_name" : "earth",
  "cluster_uuid" : "wVpStkDoTbmu-FLfisJGpQ",
  "version" : {
    "number" : "7.6.2",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "ef48eb35cf30adf4db14086e8aabd07ef6fb113f",
    "build_date" : "2020-03-26T06:34:37.794943Z",
    "build_snapshot" : false,
    "lucene_version" : "8.4.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

GET /_cat/nodes?v
ip          heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.0.12.7             51          60   0    0.00    0.00     0.00 di        -      ip-10-0-12-7
10.0.12.125           36          98  16    0.16    0.31     0.17 dm        -      ip-10-0-12-125
10.0.12.122           56          98  17    0.50    0.48     0.24 d         -      ip-10-0-12-122
10.0.12.76            57          98  19    0.63    0.64     0.35 d         -      ip-10-0-12-76
10.0.12.180           34          98  14    0.11    0.25     0.14 d         -      ip-10-0-12-180
10.0.12.175           35          98  12    0.23    0.23     0.11 dm        -      ip-10-0-12-175
10.0.12.21            54          98  16    0.51    0.58     0.30 d         -      ip-10-0-12-21
10.0.12.50            22          62   2    0.28    0.12     0.04 dm        *      ip-10-0-12-50

GET /_cat/health?v
epoch      timestamp cluster status node.total node.data shards  pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1600349595 13:33:15  earth   green           8         8   6013 2985    0    0        0             0                  -                100.0%
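The `_cat` text above can be turned into records to derive per-node numbers. A minimal Python sketch, assuming whitespace-separated columns with no empty cells (the `_cat` APIs also accept `format=json`, which avoids parsing text altogether); `parse_cat` and `shards_per_data_node` are hypothetical helpers, and the sample is the health output from this post.

```python
HEALTH_SAMPLE = """\
epoch      timestamp cluster status node.total node.data shards  pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1600349595 13:33:15  earth   green           8         8   6013 2985    0    0        0             0                  -                100.0%
"""


def parse_cat(text: str) -> list:
    """Parse `_cat/...?v` output (header row plus data rows) into dicts."""
    rows = [line.split() for line in text.strip().splitlines()]
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


def shards_per_data_node(health: dict) -> float:
    """Average shard copies per data node, from one `_cat/health` row."""
    return int(health["shards"]) / int(health["node.data"])


health = parse_cat(HEALTH_SAMPLE)[0]
print(health["status"], shards_per_data_node(health))  # green 751.625
```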

GET /_cat/indices?v (the full output is in this gist; an excerpt follows):

https://gist.github.com/agutierrezrodriguez/54b3661dc47ebc4407526f281785e06b

health  status  index                                                             uuid                    pri  rep  docs.count  docs.deleted  store.size  pri.store.size
green   open    d62f0dabbbc0fd14559a7854c3397977_data_ts20200917080835684402      7MS0YphmT_iWm8Duf6IczQ  1    1    1267        0             5.4mb       2.7mb
green   open    4b5807db32ac6b9a77814c583d0e0b05_metadata_ts20200917083304811232  xiQGSnOmTJSWRO4HE4GaxA  1    1    1           0             9kb         4.5kb
green   open    3b4c5140a6f5a0b141fe8198cf1317f1_data_ts20200917082439322002      3h837NpbSDG7smLMjOrAFA  1    1    751         0             3.1mb       1.5mb
green   open    2cbc3b9e64a7becbda2103e38e17c5b6_data_ts20200917091018247218      EuyOchOASCCjknqLfWj_FA  1    1    1141        0             11.7mb      5.8mb
green   open    edf674b2d0003be1cc79dfc8f1b1f377_data_ts20200917081002439342      SLpZ_0YqSoeqhKMlXoHmTg  1    1    38          0             409kb       204.5kb
green   open    5e3cdb8a3a459fb07938d82da67a105c_data_ts20200917085559526908      6on9mhoPReWKJuBRnhnNaA  1    1    2245        0             11.5mb      5.7mb
green   open    d60ab907ce5def96d47c73e6abb93bce_data_ts20200917084442782396      Jp14dSs2S8e_DgZ41rmlHg  1    1    411         0             1.9mb       999.5kb
green   open    eca5ad907e7f33356b0bda7c45b37568_data_ts20200917081536508070      OkYN6LduTZ6e7in7VZ3PBw  1    1    451         0             2.7mb       1.3mb
green   open    a64a2b4f542c51ff56e10a2060601f0c_data_ts20200917090127326873      GzGuiv7eTqalQYKafoSa7w  1    1    26764       0             67.5mb      33.7mb

(Excerpt truncated; see the gist for the full listing.)

Thanks
Regards. Antonio.

agutierrezrodriguez · September 18, 2020, 7:33am

Here is a chart of the latency percentile distribution from the same test run against both clusters, which shows the problem.
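A latency percentile comparison like the one in that chart can be computed from the raw per-request timings of each run. A minimal Python sketch using the nearest-rank definition of a percentile; the helper names are hypothetical, and the inputs are whatever timings the load test records for each cluster.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile (0 < p <= 100) of a non-empty list of latencies."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def compare_percentiles(baseline, candidate, ps=(50, 90, 99)):
    """Latency difference (candidate minus baseline) at each percentile."""
    return {p: percentile(candidate, p) - percentile(baseline, p) for p in ps}


# Made-up timings in ms, purely to show the call; not measurements from this thread.
print(compare_percentiles([10] * 100, [18] * 100))  # {50: 8, 90: 8, 99: 8}
```

Comparing several percentiles rather than only the average shows whether the extra 8-10 ms is a uniform shift or concentrated in the tail.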
