Unexplainable slow queries on Elasticsearch 😔

silex-kbui · August 25, 2021, 6:07pm

Hi community,
I’m asking for help today because we’re experiencing huge latencies on some Elasticsearch queries, and we’re in a lack of ideas to explore to find the actual source of the trouble.

We are not looking for immediate solutions (even if we can dream), but clues we can explore to understand the issue and resolve it.

Issue:

When, most of the time, our search queries give results in less than a second, some of our queries can take more than 5 seconds, more than 10 seconds, more than 20 seconds without any particular reason.
The query is always the same, only the keywords change, and we did not identify specific keywords that could explain such huge responses times.

And here’s the most blocking thing, we are not able to reproduce the issue easily, because we are unable to identify scenarios invoking those latencies.

Search context:

We have a Spring Boot application that executes our search queries using Elasticsearch Java Client.
We use only one query (that you can find at the end of the post), and for now, the query is run only a few times per hour. So, this is definitely not load issues.

We also did manual testing using Kibana, and observed a similar behaviour.

Indexation context:

We have a partial indexation (~5% documents) once a day, then we have a full indexation once a week. It is mostly document updates, but we can also have a few creations/deletions.
Our indexations are made during not-business hours, to be sure that index requests do not impact search requests performance.

Benchmark context:

We benchmark with Locust.
We use a dictionary of ~2000 queries, built from actual queries run by our clients on the main application.
And here’s the most weird thing. In production, or when we do manual tests, we run a few queries, and we sometimes have results in 1~20 seconds..
But when we use benchmarking tools, and we stress Elasticsearch up to 50 queries per seconds, Elasticsearch responds in less than 300ms.

What we already tried, without success:

force_merge index, in case it was fragmentation issue
let Elastic generate its own document IDs (because we override document IDs as indexing phase), in case it was “advance” phase latency issue
preload data (index.store.preload: ["nvd", "dvd"]), in case it was warm-up issue
increase index.search.idle.after setting to 1 hour
tried to simplify our hosting setup (stop using docker, stop using Openstack volumes, and run directly on our Openstack VMs with local hard disks)

What we already tried, that seems to help:

we have a script that run random search query every minute (using benchmark data), to keep Elastic “warm” → it seems that keeping Elasticsearch warm reduce latencies around 2~5 seconds maximum

Specs:

This is the first time I post in this community, I’ll try to be as exhaustive as possible, don’t hesitate to point out missing details.

Version: Elasticsearch 7.6.2

Hosting:

OVH Public Cloud (Openstack)
Docker Swarm is our goal, but we tested and observed the same results without Docker. 
Debian 9.13
openjdk 16
Linux kernel 4.9.0

Sizing:

For the simplified hosting setup:
3 nodes OVH type b2-30
8 vcores per server
30GB RAM
SSD
500Mbps guaranteed/up to 2Gbps local network
These nodes were dedicated to ElasticSearch.

Index:

health status index                      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   our-index-v1.1 6yUxerMtTViMlAtOO498Jg   3   1   12830216      2976961     31.7gb           16gb

Index settings:

{
  "our-index-v1.1" : {
    "settings" : {
      "index" : {
        "refresh_interval" : "1s",
        "number_of_shards" : "3",
        "number_of_replicas" : "1",
        "store" : {
          "preload" : [
            "nvd",
            "dvd"
          ]
        },
        "uuid" : "6yUxerMtTViMlAtOO498Jg"
      }
    }
  }
}

Index mapping:

{
  "our-index-v1.1" : {
    "mappings" : {
      "properties" : {
        "matchingGroups" : {
          "properties" : {
            "address" : {
              "type" : "text"
            },
            "naming" : {
              "type" : "text"
            },
            "onlineData" : {
              "type" : "text",
              "analyzer" : "french"
            },
            "openData" : {
              "type" : "text",
              "analyzer" : "french"
            },
            "internalData" : {
              "type" : "text",
              "analyzer" : "french"
            }
          }
        },

        /* 
         * Following: Fields that are copied (copy_to) to matchingGroups up there.
         */
        "example1": {
          "type" : "text",
          "index" : false,
          "copy_to" : [
            "matchingGroups.naming"
          ]
        },
        "example2": {
          "type" : "keyword",
          "index" : false,
          "copy_to" : [
            "matchingGroups.naming"
          ]
        }
        /* 
         * ... And more fields ... 
         */
      }
    }
  }
}

Search query:

GET /our-index-v1.1/_search/template
{
  "params": {
    "value_to_search": "spécialiste en poterie"
  },
  "source": {
    "from": 0,
    "size": 100,
    "query": {
      "bool": {
        "must": [
          {
            "bool": {
              "should": [
                {
                  "match_phrase": {
                    "matchingGroups.naming": {
                      "query": "{{value_to_search}}",
                      "slop": 0,
                      "boost": 20
                    }
                  }
                },
                {
                  "match": {
                    "matchingGroups.naming": {
                      "query": "{{value_to_search}}",
                      "boost": 6
                    }
                  }
                },
                {
                  "match_phrase": {
                    "matchingGroups.openData": {
                      "query": "{{value_to_search}}",
                      "slop": 0,
                      "boost": 20
                    }
                  }
                },
                {
                  "match": {
                    "matchingGroups.openData": {
                      "query": "{{value_to_search}}",
                      "boost": 6
                    }
                  }
                },
                {
                  "match_phrase": {
                    "matchingGroups.onlineData": {
                      "query": "{{value_to_search}}",
                      "slop": 0,
                      "boost": 20
                    }
                  }
                },
                {
                  "match": {
                    "matchingGroups.onlineData": {
                      "query": "{{value_to_search}}",
                      "boost": 6
                    }
                  }
                },
                {
                  "match_phrase": {
                    "matchingGroups.internalData": {
                      "query": "{{value_to_search}}",
                      "slop": 0,
                      "boost": 20
                    }
                  }
                },
                {
                  "match": {
                    "matchingGroups.internalData": {
                      "query": "{{value_to_search}}",
                      "boost": 6
                    }
                  }
                }
              ]
            }
          }
        ],
        "must_not": [
          {
            "term": {
              "customFilter.code": {
                "value": "1000"
              }
            }
          },
          {
            "term": {
              "customFilter.code": {
                "value": "5485"
              }
            }
          },
          {
            "term": {
              "customFilter.code": {
                "value": "6540"
              }
            }
          },
          {
            "term": {
              "customFilter.code": {
                "value": "6589"
              }
            }
          },
          {
            "term": {
              "customFilter.code": {
                "value": "6599"
              }
            }
          },
          {
            "term": {
              "customFilter.code": {
                "value": "9110"
              }
            }
          }
        ]
      }
    }
  }
}

Christian_Dahlqvist · August 27, 2021, 5:50am

Do these happen during the day? If so, is there any correlation between these and slow queries?

silex-kbui · August 27, 2021, 8:49am

No, we're doing updates only during hours when our Elastic is not used.
To be sure performances are not reduced due to those creations/deletions.

silex-kbui · August 30, 2021, 8:59am

Up

We really are in a dead-end, looking for new ideas

system · September 27, 2021, 9:00am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Elasticsearch unknown perfomance issue Elasticsearch	2	355	July 6, 2017
Intermittent slowness in queries Elasticsearch docker	1	295	December 9, 2020
Slow first request on an index after a short amount of time Elasticsearch	6	9903	March 13, 2020
Queries get slow while indexing documents Elasticsearch	9	1794	November 5, 2020
Elasticsearch Search queries are slow for the first few attempts Elasticsearch	1	353	December 3, 2019