Search performance loss in Elasticsearch 7

Hi!

We are upgrading our Elasticsearch 5.6 clusters to Elasticsearch 7.6.

We are seeing worse search performance in Elasticsearch 7.6 than in 5.6.

We are testing empty clusters built with the same installation process (Ubuntu 18.04 + deb package with systemd) for both versions of Elasticsearch.

The cluster has 6 data nodes and 1 dedicated master node (1 CPU, 2 cores and 8 GB RAM).

We are populating the clusters with 3000 indices, each with one shard and at least one replica (about 6500 shards in total), holding 22 M documents and 82 GB of data.

The only setting we modified is the mapping, where we replaced some deprecated token names: ngram instead of nGram, edge_ngram instead of edge_nGram, and word_delimiter_graph instead of word_delimiter (due to a change in Lucene that breaks word_delimiter after a synonyms filter).

With the same data, the same indices, and the same search test at the same frequency, we see an average performance loss of 8-10 ms in Elasticsearch 7.

After reviewing the process in more detail, we found that Elasticsearch 7 spends 1 or 2 ms more in the search phase and the rest of the extra time in the fetch phase. CPU usage is also higher in Elasticsearch 7.
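One way to check this kind of split is to compare the average query-phase and fetch-phase time per operation on each cluster. The following is only a minimal sketch: it assumes the `search` section of the index stats response (`GET /_stats/search`, under `_all.total.search`) has already been fetched and is passed in as a dict. The function name and the sample counters are made up for illustration and are not measurements from these clusters.

```python
def avg_phase_ms(search_stats):
    """Return (avg_query_ms, avg_fetch_ms) from a search stats section.

    `search_stats` is assumed to be the `search` object of the index stats
    response, which carries cumulative counters for each phase.
    """
    def avg(total_ms, count):
        # Avoid division by zero on an idle cluster.
        return total_ms / count if count else 0.0

    return (
        avg(search_stats["query_time_in_millis"], search_stats["query_total"]),
        avg(search_stats["fetch_time_in_millis"], search_stats["fetch_total"]),
    )


# Hypothetical counters, NOT measurements from the clusters in this thread.
es5 = {"query_total": 1000, "query_time_in_millis": 4000,
       "fetch_total": 1000, "fetch_time_in_millis": 2000}
es7 = {"query_total": 1000, "query_time_in_millis": 6000,
       "fetch_total": 1000, "fetch_time_in_millis": 9000}

q5, f5 = avg_phase_ms(es5)
q7, f7 = avg_phase_ms(es7)
print(f"query phase: +{q7 - q5:.1f} ms, fetch phase: +{f7 - f5:.1f} ms")
```

The counters are cumulative since node start, so for a single test run it is more meaningful to take a snapshot before and after the run and feed the differences into the function.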

We have tried different JVM heap sizes (2000 MB, 2500 MB, 3000 MB and 3800 MB), different Java versions, and various changes to the cluster: with and without ARS (adaptive replica selection), some buffer changes, etc.

We don't know why the same hardware, data and settings, with the same queries, give worse performance in Elasticsearch 7.

Could someone help us?

Many thanks!!

Regards. Antonio.

I forgot to mention that we also tried Elasticsearch 7.9, which shows a small improvement over Elasticsearch 7.6 but is still far from Elasticsearch 5.6 overall.

1000 shards per node, with less than 4 GB of heap?

I think you should reduce that pressure.
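As a rough back-of-the-envelope check, here is a small sketch using the figures from the first post (about 6500 shards, 6 data nodes, at most 3800 MB of heap). It assumes the rule of thumb Elastic published for these versions of roughly 20 shards per GB of heap, which is guidance rather than a hard limit. The function names are made up for illustration.

```python
def shards_per_node(total_shards, data_nodes):
    """Average number of shards each data node has to hold."""
    return total_shards / data_nodes


def guideline_max_shards(heap_mb, shards_per_gb=20):
    """Suggested shard ceiling for one node.

    20 shards per GB of heap is a rule of thumb, not a hard limit.
    """
    return heap_mb / 1024 * shards_per_gb


per_node = shards_per_node(6500, 6)   # figures from the first post
limit = guideline_max_shards(3800)    # largest heap size that was tried
print(f"{per_node:.0f} shards per node vs. about {limit:.0f} suggested")
```

Even with the largest heap that was tested, the shard count per node is more than an order of magnitude above that guideline.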

What is the output of:

GET /
GET /_cat/nodes?v
GET /_cat/health?v
GET /_cat/indices?v

If some outputs are too big, please share them on gist.github.com and link them here.


Why do you have such a large number of indices and shards for so little data? How many indices does a typical query target?


Hi Christian,

  • Each index stores the data of one catalog.
  • Each catalog can have different fields and different terms filters, and also a different number of documents.
  • Each catalog must be independent of the others, because fields with the same name can have different mappings.

In our tests, each query targets only one index (through an alias).

We have only one shard per index, with its replica (to guarantee cluster resiliency). We only use more than one replica on indices with a high number of searches, to increase throughput.

Our use case is to index a whole catalog into an index (on warm index instances) and then start searching it (after moving it to cold instances). Only a few indices are indexing data (a few documents) while being searched.

Thanks.
Regards. Antonio.

What type of queries are you running? One thing that did change between these versions was the removal of the _all field which could affect some types of queries.

The most frequent are boolean 'and' queries, about 50% of them with aggs, as in this example:

{
  "aggs":{
    "categories":{
      "aggs":{
        "selected":{
          "terms":{
            "field":"categories.facet",
            "include":[]
          }
        },
        "terms":{
          "terms":{
            "field":"categories.facet",
            "size":20
          }
        },
        "total":{
          "value_count":{
            "field":"categories.facet"
          }
        }
      },
      "filter":{
        "bool":{
          "must":[]
        }
      }
    },
    "sale_price":{
      "aggs":{
        "range":{
          "aggs":{
            "stats":{
              "stats":{"field":"sale_price"}
            }
          },
          "range":{
            "field":"sale_price",
            "ranges":[{"from": 0}]
          }
        }
      },
      "filter":{
        "bool":{
          "must":[]
        }
      }
    }
  },
  "from":0,
  "highlight":{
    "fields":{
      "description":{
        "fragment_size":100,
        "number_of_fragments":3
      }
    },
    "require_field_match":false
  },
  "post_filter":{
    "bool":{
      "must":[]
    }
  },
  "query":{
    "function_score":{
      "boost_mode":"sum",
      "functions":[],
      "query":{
        "function_score":{
          "functions":[
            {
              "field_value_factor":{
                "field":"manual_boost",
                "missing":1
              }
            },
            {
              "field_value_factor":{
                "field":"auto_boost",
                "missing":1
              }
            }
          ],
          "query":{
            "bool":{
              "filter":{
                "bool":{
                  "must_not":[]
                }
              },
              "minimum_should_match":1,
              "must_not":[],
              "should":[
                {
                  "multi_match":{
                    "cutoff_frequency":0.1,
                    "fields":[
                      "indexed_text^3",
                      "title^3",
                      "title.autocomplete^1",
                      "categories^2",
                      "brand^2",
                      "mpn^2",
                      "gtin^1"
                    ],
                    "operator":"and",
                    "query":"Jawe",
                    "type":"best_fields"
                  }
                }
              ]
            }
          }
        }
      },
      "score_mode":"max"
    }
  },
  "size":10,
  "sort":[
    {
      "_score":"desc"
    }
  ],
  "track_scores":true
}

We don't use the _all field. We disabled it in the Elasticsearch 5.6 index settings.

Also, the query is the same in Elasticsearch 5.6 and Elasticsearch 7.

Hi David,
Thanks for your help

Yes, with this configuration we can store up to 1600 shards per node without frequent garbage collection.

Keep in mind that the configuration is the same in Elasticsearch 5.6 and 7.6, and that heap usage is reduced with every new Elasticsearch release, so it should work better in Elasticsearch 7.6 than in 5.6.

Here is the requested data:

GET /
{
  "name" : "ip-10-0-12-76",
  "cluster_name" : "earth",
  "cluster_uuid" : "wVpStkDoTbmu-FLfisJGpQ",
  "version" : {
    "number" : "7.6.2",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "ef48eb35cf30adf4db14086e8aabd07ef6fb113f",
    "build_date" : "2020-03-26T06:34:37.794943Z",
    "build_snapshot" : false,
    "lucene_version" : "8.4.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
GET /_cat/nodes?v
ip          heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.0.12.7             51          60   0    0.00    0.00     0.00 di        -      ip-10-0-12-7
10.0.12.125           36          98  16    0.16    0.31     0.17 dm        -      ip-10-0-12-125
10.0.12.122           56          98  17    0.50    0.48     0.24 d         -      ip-10-0-12-122
10.0.12.76            57          98  19    0.63    0.64     0.35 d         -      ip-10-0-12-76
10.0.12.180           34          98  14    0.11    0.25     0.14 d         -      ip-10-0-12-180
10.0.12.175           35          98  12    0.23    0.23     0.11 dm        -      ip-10-0-12-175
10.0.12.21            54          98  16    0.51    0.58     0.30 d         -      ip-10-0-12-21
10.0.12.50            22          62   2    0.28    0.12     0.04 dm        *      ip-10-0-12-50
GET /_cat/health?v
epoch      timestamp cluster status node.total node.data shards  pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1600349595 13:33:15  earth   green           8         8   6013 2985    0    0        0             0                  -                100.0%

Thanks
Regards. Antonio.

Chart: latency percentile distribution showing the problem, from the same test run against the two clusters.
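For anyone who wants to reproduce this kind of comparison from raw timings, here is a minimal sketch of a nearest-rank percentile calculation over per-request latencies. The sample values are made up for illustration and are not the data behind the chart. The function and variable names are also hypothetical.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile (p between 0 and 100) of a list of latencies."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Nearest-rank method: smallest value with at least p% of samples at or below it.
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up latencies in milliseconds, NOT the measurements from this thread.
es5_ms = [12, 14, 15, 15, 16, 18, 20, 25, 30, 60]
es7_ms = [20, 22, 24, 25, 26, 27, 30, 36, 45, 90]

for p in (50, 90, 99):
    a, b = percentile(es5_ms, p), percentile(es7_ms, p)
    print(f"p{p}: 5.6={a} ms, 7.x={b} ms, diff={b - a} ms")
```

Comparing several percentiles rather than only the average helps show whether the regression affects every request or mostly the slow tail.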