Reducing heap size increases query speed

sbruinsje · August 10, 2022, 2:58pm

I have recently upgraded from ES 6.8 to ES 8.3. One of our queries with heavy aggregations takes significantly longer now than before. After some searching I found that the new version sets the heap size automatically (to 1.8 GB in my case), instead of using the fixed default of 1 GB from ES 6.8. It turns out that when I decrease the heap size back to 1 GB, the queries are faster again. I confirmed this by switching the heap size back to automatic sizing, after which the queries were significantly slower again.

The cluster runs on a single node with 4 GB of RAM, and the index has 5 shards and a doc count of 18 million. Could it be that ES takes up so much heap space that there isn't enough memory left for the filesystem cache? Or is there another explanation for why increasing the heap size makes the queries slower?

For completeness here's the query we use:

POST article_set_index/_search
{
  "size": 0,
  "query": {
    "nested": {
      "path": "versions",
      "score_mode": "max",
      "inner_hits": {
        "size": 100,
        "highlight": {
          "pre_tags": [
            "<mark>"
          ],
          "post_tags": [
            "</mark>"
          ],
          "fields": {
            "versions.title": {
              "type": "fvh",
              "number_of_fragments": 0
            },
            "versions.body": {
              "type": "fvh",
              "number_of_fragments": 0
            }
          }
        },
        "sort": {
          "versions.published": {
            "order": "asc"
          }
        }
      },
      "query": {
        "bool": {
          "must": [
            {
              "simple_query_string": {
                "query": "the",
                "default_operator": "and",
                "fields": [
                  "versions.title",
                  "versions.body"
                ]
              }
            }
          ],
          "filter": [
            {
              "terms": {
                "versions.language": [
                  "nl"
                ]
              }
            },
            {
              "range": {
                "versions.created": {
                  "gte": 1644382800629
                }
              }
            },
            {
              "range": {
                "versions.publicationDate": {
                  "gte": "2022-02-09"
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "totalAuthorCount": {
      "cardinality": {
        "field": "authorIds",
        "precision_threshold": 100
      }
    },
    "authors": {
      "terms": {
        "field": "authorIds",
        "size": 121,
        "shard_size": 400
      },
      "aggs": {
        "articles": {
          "top_hits": {
            "size": 5
          }
        }
      }
    }
  },
  "timeout": "60000ms"
}
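The filesystem-cache question above can be checked roughly from the node's own numbers. A `GET _nodes/stats/jvm,os` response reports each node's heap ceiling in `jvm.mem.heap_max_in_bytes` and its RAM in `os.mem.total_in_bytes`. The Python sketch below assumes that response has already been parsed into a dict; the helper name and the `os_overhead_bytes` allowance (for the OS and off-heap JVM memory) are hypothetical, so treat the result as a rough upper bound rather than a measurement.

```python
from typing import Any, Dict

GIB = 1024 ** 3


def page_cache_budget(nodes_stats: Dict[str, Any],
                      os_overhead_bytes: int = GIB // 2) -> Dict[str, int]:
    """Estimate, per node, how many bytes are left for the filesystem cache.

    `nodes_stats` is a parsed `GET _nodes/stats/jvm,os` response.
    `os_overhead_bytes` is a hypothetical allowance for the OS and for
    off-heap JVM memory; adjust it for your own machine.
    """
    budgets: Dict[str, int] = {}
    for node_id, node in nodes_stats["nodes"].items():
        total_ram = node["os"]["mem"]["total_in_bytes"]
        heap_max = node["jvm"]["mem"]["heap_max_in_bytes"]
        # Memory not claimed by the heap or the OS can be used by the page cache.
        budgets[node.get("name", node_id)] = max(0, total_ram - heap_max - os_overhead_bytes)
    return budgets


# Hypothetical single-node response, roughly matching a 4 GB machine with a 1.8 GB heap.
example = {
    "nodes": {
        "node-id-1": {
            "name": "node-1",
            "os": {"mem": {"total_in_bytes": 4 * GIB}},
            "jvm": {"mem": {"heap_max_in_bytes": int(1.8 * GIB)}},
        }
    }
}

for name, free in page_cache_budget(example).items():
    print(f"{name}: ~{free / GIB:.1f} GiB left for the filesystem cache")
```

With the same assumptions and a 1 GB heap, the estimate rises to about 2.5 GiB, which is the kind of difference discussed in the rest of this thread.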

Christian_Dahlqvist · August 10, 2022, 3:10pm

That would be my guess. What is the size of the index?

sbruinsje · August 10, 2022, 3:18pm

About 38 GB:

GET /article_set_index/_stats?pretty
{
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "_all": {
    "primaries": {
      "docs": {
        "count": 18367658,
        "deleted": 583
      },
      "shard_stats": {
        "total_count": 5
      },
      "store": {
        "size_in_bytes": 38360064185,
        "total_data_set_size_in_bytes": 38360064185,
        "reserved_in_bytes": 0
      },
      ...
    }
  ...
  }
}

Side question: the docs count includes nested documents, not just root level documents, correct?

Christian_Dahlqvist · August 10, 2022, 3:19pm

Yes, if you have nested mappings it does.
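To see how much of that count is nested documents, one option is to compare it with the `_count` API (for example `GET article_set_index/_count`), which counts only top-level documents. The sketch below assumes both responses have been parsed into dicts; the helper name is hypothetical, and the result is approximate because the two requests are not taken at exactly the same moment.

```python
from typing import Any, Dict


def nested_breakdown(index_stats: Dict[str, Any],
                     count_response: Dict[str, Any]) -> Dict[str, float]:
    """Split the Lucene-level doc count into root and nested documents.

    `index_stats` is a parsed `GET <index>/_stats` response and
    `count_response` a parsed `GET <index>/_count` response.
    """
    primaries = index_stats["_all"]["primaries"]
    lucene_docs = primaries["docs"]["count"]      # root + nested documents
    root_docs = count_response["count"]           # top-level documents only
    store_bytes = primaries["store"]["size_in_bytes"]
    return {
        "root_docs": root_docs,
        "nested_docs": lucene_docs - root_docs,
        # Average on-disk size per top-level document, nested versions included.
        "bytes_per_root_doc": store_bytes / root_docs if root_docs else 0.0,
    }


# Stats values from this thread; the root count of 3,000,000 is hypothetical.
stats = {"_all": {"primaries": {"docs": {"count": 18367658},
                                "store": {"size_in_bytes": 38360064185}}}}
print(nested_breakdown(stats, {"count": 3_000_000}))
```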

As you are running aggregations and not returning specific documents, I suspect only a small set of files needs to be cached, in which case a small change in heap size could make a big difference.

sbruinsje · August 10, 2022, 3:35pm

Hmm, I don't think I follow. In my case, decreasing the heap size makes the query with aggregations finish faster. If only a small set of files is being cached, why would decreasing the heap size improve the speed?

Christian_Dahlqvist · August 10, 2022, 3:56pm

I meant small compared to the full size of your index. If the files that need to be cached add up to a few GB or so, decreasing the heap size would make a difference, because the memory taken away from the heap becomes available to the filesystem cache.
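A what-if version of that reasoning, with hypothetical numbers: if the part of the index this query actually reads (its working set) were about 2.2 GB, then on a 4 GB machine the 0.8 GB between a 1 GB and a 1.8 GB heap could decide whether it stays in the filesystem cache. The 2.2 GB working set and the 0.5 GB OS allowance are assumptions for illustration, not measurements from this cluster.

```python
def fits_in_page_cache(ram_gb: float, heap_gb: float, working_set_gb: float,
                       os_overhead_gb: float = 0.5) -> bool:
    """Return True if the working set fits in the RAM left after the heap and the OS."""
    return ram_gb - heap_gb - os_overhead_gb >= working_set_gb


# Hypothetical working set of 2.2 GB on a 4 GB node.
for heap_gb in (1.0, 1.8):
    fits = fits_in_page_cache(4.0, heap_gb, 2.2)
    print(f"heap {heap_gb} GB: working set fits in cache = {fits}")
```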

sbruinsje · August 12, 2022, 8:43am

It turns out the difference in speed was not related to the heap size. I probably made some mistakes when measuring query speed, because caching was involved.

Thanks for your help, though.
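Since caching probably skewed the original measurements, a fairer way to time such queries may be worth outlining. Elasticsearch's shard request cache can serve repeated `size: 0` searches, and the node query cache and the OS page cache also make later runs faster. A minimal sketch, assuming you supply a hypothetical `run_query` callable that sends the search (for example with the `request_cache=false` URL parameter) and returns the response's `took` value in milliseconds: discard a few warm-up runs, then compare medians for each heap setting.

```python
import statistics
from typing import Callable, List


def median_took(run_query: Callable[[], int], warmup: int = 3, runs: int = 10) -> float:
    """Time a search by its `took` value, ignoring warm-up runs.

    `run_query` is a hypothetical callable that executes the search and
    returns `took` (ms). The first `warmup` runs are discarded so that all
    measured runs see similarly warm caches; the median of the rest is returned.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        run_query()
    samples: List[int] = [run_query() for _ in range(runs)]
    return statistics.median(samples)


# Usage with canned timings standing in for real searches (hypothetical values).
canned = iter([900, 850, 700, 120, 110, 130, 115, 125])
print(median_took(lambda: next(canned), warmup=3, runs=5))
```

Running the same procedure for each heap setting, with the same warm-up, makes it more likely that any difference reflects the heap change rather than the state of the caches.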

system · September 9, 2022, 8:43am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
