Query on timeseries data very slow

Hi, this follows previous questions. I have merged all my indices into 1 ilm based index, with a added field that contains the previous "index" name...

Generally this is fine, with some performance hit. Queries are generally a but slower, but OK. I've done this for scalability as I had to many indices. However I am finding a particular query to be much slower...

I am trying to determine the min and max date of a range of documents. On the same test node (which has the new merged index and all the old indices) using the previous many index model, the query returns in 260ms but the same query on the newly merged index it takes 2500ms. What is going on?

I run two queries and sort by date in both direction, taking the top document.

{
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "dataseriesId": dataseriesId
              }
            },
            {
              "range": {
                "date": {
                  "gte": from.format(),
                  "lte": to.format()
                }
              }
            }
          ]
        }
      },
      "sort": {
        "date": sort
      },
      "size": 1
    }

which gives me the document with the min and max of the date field for docs with the given dataseriesId.

from and to are my chosen date range.

index template mapping is

mappings: {
          _source: {
            enabled: true
          },
          _routing: {
            required: false
          },
          properties: {
            type: { type: 'keyword', index: true },
            datasetId: { type: 'keyword', index: true },
            dataseriesId: { type: 'keyword', index: true },
            date: { type: 'date', format: 'date_time_no_millis' },
            floatValue: { type: 'float', index: false, ignore_malformed: true, coerce: false },
            stringValue: { type: 'keyword', index: false },
            dateValue: { type: 'date', index: false, format: 'date_time_no_millis' }
          }
        },

I've managed to massively improve this query (now 40ms instead of 2500ms)... by using aggreagations... although it would be good to understand why it is so much slower/faster just because of merging the indices..

My new query is

"query": {
        "bool": {
          "must": [
            {
              "term": {
                "dataseriesId": dataseriesId
              }
            },
            {
              "range": {
                "date": {
                  "gte": from.format(),
                  "lte": to.format()
                }
              }
            }
          ]
        }
      },
      "aggs": {
        "min": { "min": { "field": "date" } },
        "max": { "max": { "field": "date" } }
      },
      "size": 0
1 Like

How many indices and shards did you originally have?

How many do you have now?

Do the query target all indices in both scenarios?

Hi Christian, thanks for the reply.

Originally there where 275 indices and now there is 1. This is a staging server, so not at full load. I am hosting both the 275 indices and the merged 1 index on the same ES node for testing purposes and comparing the same query going through the 275 indices vs the 1 index.

In both versions of the query (275 vs 1 index) the actual query would only target 1 index each. That is because it can choose the index based on an id, which has become a field in the merged index.

Thanks,

Querying a single small index should be faster than querying a single larger index, so that is not surprising. The significant different in timings does surprise me though, but would depend on how caching is applied. Do the latency stay at 2500ms even if you run a few queries in sequence?

When deciding how to shard data there is always a tradeoff. Indexing into a single index can be a lot more efficient than spreading the data over a lot of small indices. There is always a potentially significant difference in overhead.

Yes indeed. I was expecting a slight hit, but not that much. Yes the latency stays at 2500ms in fact it starts at 3000ms then comes down after a few requests...

I did also find it interesting that then using the aggregation option on the query which would have been going through the same documents on the merged index was so much faster... from 2500ms down to 40ms? I don't understand what is happening behind the scene, that could cause that difference on a similar query.... That said it does take it from two queries down to one... but 1250ms to 50ms is still a big change...

An aggregation does not need to score the documents that match, so that saves some time. It also does not need to retrieve any document from disk. What is the specification of your cluster? How much data do you have in it?