Query performance problems upgrading from ES 6.8.6. to ES 7.6.2, with exact term filtering + range

Hi!

We are trying to upgrade our deployment from an Elastic Heroku-addon (Premium 3) to a new deployment on elastic.co, using the I/O optimized general purpose setup.

Old cluster: 8GB RAM + 192GB storage * 1 node * 2 zones
New cluster: 8GB RAM + 240GB storage * 1 node * 2 zones

The old deployment is running 6.8.6, and the new one is running 7.6.2.

Apart from the version upgrade, we’ve made two changes:

  1. We’ve updated our mapping from
{
  event: {
    properties: {
      action: { type: :keyword },
      actor_id: { type: :long },
      actor_type: { type: :keyword },
      authenticated_by_id: { type: :long },
      authenticated_by_type: { type: :keyword },
      code: { type: :keyword },
      created_at: { type: :date },
      id: { type: :long },
      message: { type: :text, index: false },
      object_id: { type: :long },
      object_type: { type: :keyword },
      owner_id: { type: :long },
      owner_type: { type: :keyword },
      partition: { type: :long },
      previous_event_id: { type: :long },
      reference_id: { type: :long },
      reference_type: { type: :keyword },
      success: { type: :boolean }
    }
  }
}

to

{
  dynamic: "strict",
  properties: {
    action: { type: :keyword },
    actor_id: { type: :keyword },
    actor_type: { type: :keyword },
    authenticated_by_id: { type: :keyword },
    authenticated_by_type: { type: :keyword },
    code: { type: :keyword },
    created_at: { type: :date },
    details: { type: :object, enabled: false },
    id: { type: :keyword },
    message: { type: :text, index: false },
    object_id: { type: :keyword },
    object_type: { type: :keyword },
    reference_id: { type: :keyword },
    reference_type: { type: :keyword },
    root_id: { type: :keyword },
    root_type: { type: :keyword },
    sequence: { type: :keyword },
    success: { type: :boolean }
  }
}

So we added dynamic: strict, plus we renamed a few of the fields (owner -> root, partition -> sequence). I'm assuming the only potentially interesting thing here is that we changed some long properties to keyword. We did that based on the recommendation in https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html. We're only interested in doing exact matches on these properties, so term queries.

  1. We applied a sort on the index settings:
index: {
  "sort.field": ["created_at", "id"],
  "sort.order": ["desc", "desc"]
}

We did this because of the advice in https://www.elastic.co/guide/en/elasticsearch/reference/master/index-modules-index-sorting.html. Our queries always sort on these properties, and our data is a time series.

All in all, we have approximately 600 million documents in the index in the old deployment, which we re-index to the new deployment.

Switching our production application to query this new cluster, with the exact same query as before, apart from the renamed owner_id and owner_type, left us with quite a big drop in performance. Hoping someone here can shed some light on what we're doing wrong.

Here's two screenshots from Kibana from the old vs new cluster. Now, I don't know that much about ES, but one thing that stands out to me is Query Cache is much bigger in the old cluster. Is it possible we need to run the new cluster for a significant amount of time to build it up?


The query

First let me explain what we want to do. We want to query the documents, allowing for exact match filtering on:

  • action
  • actor_id
  • actor_type
  • object_id
  • object_type
  • root_id
  • root_type
  • reference_id
  • reference_type

In addition, we want to allow filtering with a range on created_at.

Typically, only these are used:

  • root_id
  • root_type
  • reference_id
  • reference_type

The current query we have that does this looks something like this:

{
  "query": {
    "bool": {
      "filter": [
        { "bool": { "must": { "terms": { "root_id": [ "123456" ] } } } },
        { "bool": { "must": { "term": { "root_type": "Place" } } } },
        { "bool": { "must": { "term": { "reference_type": "Place" } } } },
        { "bool": { "must": { "terms": { "reference_id": [ "123456" ] } } } }
      ]
    }
  },
  "sort": [
    { "created_at": "desc" },
    { "id": "desc" }
  ]
}

And here is one with a range query:

{
  "query": {
    "bool": {
      "filter": [
        { "bool": { "must": { "terms": { "root_id": [ "123456" ] } } } },
        { "bool": { "must": { "term": { "root_type": "Place" } } } },
        { "bool": { "must": { "term": { "reference_type": "Place" } } } },
        { "bool": { "must": { "terms": { "reference_id": [ "123456" ] } } } },
        { "bool": { "must": { "range": { "created_at": { "gte": "2020-04-01", "lte": "2020-05-01" } } } } }
      ]
    }
  },
  "sort": [
    { "created_at": "desc" },
    { "id": "desc" }
  ]
}

Both queries perform better in the old cluster.

  1. Does anything here look like it would perform slower in ES 7.6 compared to 6.8?
  2. Is there a better query for achieving what we want that would perform better in ES 7.6?

Thanks!