Performance issue with the very first query

The first time a query (a classic full-text query with a filter and a sort) is run, it takes about 15-20 s to return the result. Subsequent runs take only a few milliseconds. Could anybody explain the reason? And how can I solve this and get a decent response time on the first run?

Here is our configuration:
ES version 6.2.3
5 nodes (all 5 are data nodes)
1 index with 2 primary shards / 2 replicas - 900 000 docs (35GB)
1 index with 1 primary shard / 2 replicas - 13 000 000 docs (10GB)
2 CPUs and 8GB RAM per node

Thanks

Hi @Anto89.

Welcome.

Could you please share the Elasticsearch query? Which query are you running?

Thanks
HadoopHelp

Hi, and thanks for your reply!

The kind of request we have to run is shown below. I know the number of results I'm asking for is pretty big, but for now I'm stuck with this behaviour. Basically, what I need is to retrieve the ids of all matching documents.

Previously we had a different design where documents were split across many indices (several thousand). We had issues when indexing, but search was very fast.

POST index-name/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "NODE": "idNode"
          }
        }
      ],
      "should": [
        {
          "simple_query_string": {
            "query": "*",
            "fields": [
              "TITLE",
              "CONDITIONS",
              "CONTENT",
              "REFERENCE^5",
              "COMMENTS",
              "SUMMARY"
            ]
          }
        },
        {
          "match": {
            "NUMBER": {
              "operator": "or",
              "query": "*",
              "boost": 5
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  },
  "post_filter": {
    "bool": {
      "must": []
    }
  },
  "sort": [],
  "from": 0,
  "size": 10000,
  "_source": false
}

Hi @Anto89.

Please try combining must and should with term conditions, as below. It may give you a faster result.

"query": {
  "bool": {
    "should": [
      {
        "bool": {
          "must": [
            { "term": { "lastname.keyword": { "value": "Lastname2491", "boost": 1.0 } } },
            { "term": { "firstname.keyword": { "value": "ram", "boost": 1.0 } } }
          ],
          "adjust_pure_negative": true,
          "boost": 1.0
        }
      },
      { "term": { "drugname.keyword": { "value": "alpha", "boost": 1.0 } } }
    ],
    "adjust_pure_negative": true,
    "boost": 1.0
  }
}

Unfortunately, I'm not sure it makes a difference...
Enabling the slow logs suggests that the fetch phase is the slow part:

[2020-02-17T14:35:35,840][WARN ][index.search.slowlog.fetch] [es-node-02] [index][1] took[5.8s], took_millis[5843], types[type], stats, search_type[QUERY_THEN_FETCH], total_shards[2], source[...]
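
For reference, search slow logging is configured per index through the index settings API. The request below is only a sketch: the index name is the placeholder used earlier in this thread, and the threshold values are arbitrary assumptions, not the ones that produced the log line above.

```
PUT index-name/_settings
{
  "index.search.slowlog.threshold.query.warn": "1s",
  "index.search.slowlog.threshold.fetch.warn": "1s"
}
```

With separate thresholds for the query and fetch phases, the slow log shows which of the two is taking the time.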

OK, after reading posts about the same kind of issue, it seems that adding

"stored_fields": "_none_",
"docvalue_fields": ["_id"]

to the request does the trick. More information here: https://github.com/elastic/elasticsearch/issues/17159
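
For anyone landing here later, this is roughly what the request looks like with that change applied. It is a sketch only: the query is abbreviated to the filter plus one should clause from the original request, and index-name and idNode are the same placeholders as before.

```
POST index-name/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "NODE": "idNode" } }
      ],
      "should": [
        {
          "simple_query_string": {
            "query": "*",
            "fields": ["TITLE", "CONTENT", "REFERENCE^5"]
          }
        }
      ],
      "minimum_should_match": 1
    }
  },
  "from": 0,
  "size": 10000,
  "_source": false,
  "stored_fields": "_none_",
  "docvalue_fields": ["_id"]
}
```

Values requested through docvalue_fields come back under each hit's "fields" key, so the client code that collects the ids has to read them from there.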