Instant Aggregations


(Grégoire Leroy) #1

Hello,

I recently upgraded to Elasticsearch 5.1.2 and I was eager to see the instant aggregations in action. I have a Kibana dashboard I refresh every 10s.

I expected the first request to take between 1 and 2s, and the following ones a few dozen ms, thanks to the request_cache as described in https://www.elastic.co/blog/instant-aggregations-rewriting-queries-for-fun-and-profit

However, that is not what happens: all queries take between 1 and 2s.

According to the documentation, the request_cache is enabled by default for the index, and I don't override it in the request.

I can see that the request_cache is populated, but I also see a lot of misses:

curl -XGET 'localhost:9201/logstash_myindice-v1-2017.02.01/_stats/request_cache?pretty'
{
  "_shards" : {
    "total" : 12,
    "successful" : 12,
    "failed" : 0
  },
  "_all" : {
    "primaries" : {
      "request_cache" : {
        "memory_size_in_bytes" : 77417,
        "evictions" : 0,
        "hit_count" : 254,
        "miss_count" : 710
      }
    },
    "total" : {
      "request_cache" : {
        "memory_size_in_bytes" : 413087,
        "evictions" : 0,
        "hit_count" : 507,
        "miss_count" : 1557
      }
    }
  },
  "indices" : {
    "logstash_myindice_v1-2017.02.01" : {
      "primaries" : {
        "request_cache" : {
          "memory_size_in_bytes" : 77417,
          "evictions" : 0,
          "hit_count" : 254,
          "miss_count" : 710
        }
      },
      "total" : {
        "request_cache" : {
          "memory_size_in_bytes" : 413087,
          "evictions" : 0,
          "hit_count" : 507,
          "miss_count" : 1557
        }
      }
    }
  }
}
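
To put a number on the misses, here is a minimal Python sketch that computes the hit ratio from the counters in the response above. The function name `request_cache_hit_ratio` is hypothetical; only the field names (`_all`, `primaries`, `total`, `request_cache`, `hit_count`, `miss_count`) come from the response itself.

```python
def request_cache_hit_ratio(stats: dict, scope: str = "total") -> float:
    """Return hits / (hits + misses) for the `_all` section of a
    `_stats/request_cache` response. `scope` is "primaries" or "total"."""
    cache = stats["_all"][scope]["request_cache"]
    lookups = cache["hit_count"] + cache["miss_count"]
    # Guard against a freshly created index with no lookups yet.
    return cache["hit_count"] / lookups if lookups else 0.0


# Counters copied from the response above (other fields omitted).
stats = {
    "_all": {
        "primaries": {"request_cache": {"hit_count": 254, "miss_count": 710}},
        "total": {"request_cache": {"hit_count": 507, "miss_count": 1557}},
    }
}

print(f"primaries: {request_cache_hit_ratio(stats, 'primaries'):.1%}")
print(f"total:     {request_cache_hit_ratio(stats, 'total'):.1%}")
```

With these counters, roughly one lookup in four is served from the cache.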

Did I miss something?

An example of a Kibana-generated request:

{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "*",
            "analyze_wildcard": true
          }
        },
        {
          "query_string": {
            "analyze_wildcard": true,
            "query": "netflow.direction:0"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": 1485928995494,
              "lte": 1485943395494,
              "format": "epoch_millis"
            }
          }
        }
      ],
      "must_not": []
    }
  },
  "size": 0,
  "_source": {
    "excludes": []
  },
  "aggs": {
    "2": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "5m",
        "time_zone": "Europe/Berlin",
        "min_doc_count": 1
      },
      "aggs": {
        "1": {
          "sum": {
            "field": "netflow.in_bytes"
          }
        }
      }
    }
  }
}

Is there an issue with the size of the cache, or is it rather that the request cannot take advantage of instant aggregations?

Regards,
Grégoire


(Christian Dahlqvist) #2

The rewriting of queries in order to better utilise the cache applies to indices that are entirely within the time period and have not been updated (see the colourful images at the end of the blog post you linked to). The indices at the edges of the interval, which are only partially covered by the time range, will not be able to use the cache in this way. How many indices does your query cover? How many of these are entirely within the time interval and are not being updated/modified?
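
As a rough way to answer the second question, the sketch below checks whether a daily index lies entirely inside a query's time range. It rests on assumptions: the index name ends in a `YYYY.MM.dd` date, that date is a UTC day, and the index only holds documents from that day. The helpers `index_day_bounds` and `fully_covered` are hypothetical names for illustration. As I understand the blog post, Elasticsearch itself compares the range against the actual minimum and maximum field values on each shard, not the index name, so this is only an approximation.

```python
from datetime import datetime, timedelta, timezone


def index_day_bounds(index_name: str) -> tuple:
    """Return (start_ms, end_ms) of the UTC day encoded in the last
    10 characters of a daily index name (assumed format: YYYY.MM.dd)."""
    day = datetime.strptime(index_name[-10:], "%Y.%m.%d").replace(tzinfo=timezone.utc)
    start_ms = int(day.timestamp() * 1000)
    end_ms = int((day + timedelta(days=1)).timestamp() * 1000) - 1
    return start_ms, end_ms


def fully_covered(index_name: str, gte_ms: int, lte_ms: int) -> bool:
    """True if the whole (assumed) day of the index lies inside [gte_ms, lte_ms]."""
    start_ms, end_ms = index_day_bounds(index_name)
    return gte_ms <= start_ms and end_ms <= lte_ms


index = "logstash_myindice-v1-2017.02.01"

# Range taken from the Kibana request in the first post:
# a window of a few hours inside 2017-02-01.
print(fully_covered(index, 1485928995494, 1485943395494))

# A hypothetical wider range, 2017-01-31 00:00 UTC to 2017-02-03 00:00 UTC.
print(fully_covered(index, 1485820800000, 1486080000000))
```

For the request in the first post, the check is false: the range covers only part of the single daily index, so under the assumptions above the range query cannot be rewritten for that index. If the dashboard uses a relative time window, the timestamps would also change on every refresh.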

