Query, aggregation (date_histogram) and timezone performance ES 5.6.2

Hi!

We are running ES 5.6.2 on an 8-core, 64 GB machine (30 GB heap). We are using a SAN (yes, I know this is not recommended, but let me continue). During the tests we are performing at the moment we use a single data node plus a separate master.

Yesterday I started running some basic ingest performance tests with the settings below (tuning some of the parameters a little).

"index.number_of_replicas": 0,    
"index.number_of_shards" : 1,    
"index.refresh_interval" : "-1",    
"index.translog.durability" : "async",    
"index.translog.flush_threshold_size": "1gb"
 
PUT _cluster/settings { "transient" : { "indices.store.throttle.type" : "none"    }  } (reverted to merge before search)
PUT _cluster/settings { "persistent" : { "indices.store.throttle.max_bytes_per_sec" : "200mb"   } }

This is what we are indexing:

 "_source": {
    "@version": "1",
    "@timestamp": "2017-10-18T15:18:24.595Z",
    "message": "contents of an access log file",
    "kafka": {
      "topic": "perf_test",
      "consumer_group": "perf_test",
      "partition": 1,
      "offset": 2165328553,
      "key": null
    }
  }

Mapping:

 "mappings" : {
    "_default_" : {
      "_all" : {"enabled" : false, "norms" : false},
      "dynamic_templates" : [ {
        "message_field" : {
          "path_match" : "message",
          "match_mapping_type" : "string",
          "mapping" : {
            "type" : "text",
            "norms" : false
          }
        }
      }, {
        "string_fields" : {
          "match" : "*",
          "match_mapping_type" : "string",
          "mapping" : {
            "type" : "text", "norms" : false
          }
        }
      } ],
      "properties" : {
        "@timestamp": { "type": "date", "include_in_all": false },
        "@version": { "type": "keyword", "include_in_all": false },
        "geoip"  : {
          "dynamic": true,
          "properties" : {
            "ip": { "type": "ip" },
            "location" : { "type" : "geo_point" },
            "latitude" : { "type" : "half_float" },
            "longitude" : { "type" : "half_float" }
          }
        }
      }
    }
  }

We had an average throughput of 55,000 docs per second, which I think is quite OK. The index now contains 364 million records across two hours. I was not able to track down anything weird on the I/O side during indexing, so the SAN seems to work OK for now.
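
As a side note, indexing and merge activity can also be sanity-checked via the index stats API (again using perf_test as a placeholder index name):

GET perf_test/_stats/indexing,merge,store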

The problem: After this I went to Kibana just to have a look at the result, but I kept getting timeouts until I increased the timeout a lot. The default Kibana view (Discover page, last 24 hours, query=*) takes over 130 seconds, and one CPU core goes to 100%. During this time we have no other queries and no indexing is being performed. If I start a second query in parallel, CPU goes to 200%, but the response times ("took": time_ms) stay the same.

This is the query that Kibana issues:

{
  "version": true,
  "size": 500,
  "sort": [
    {
      "@timestamp": {
        "order": "desc",
        "unmapped_type": "boolean"
      }
    }
  ],
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "*",
            "analyze_wildcard": true
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": 1508316014801,
              "lte": 1508402414801,
              "format": "epoch_millis"
            }
          }
        }
      ],
      "must_not": []
    }
  },
  "_source": {
    "excludes": []
  },
  "aggs": {
    "2": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "30m",
        "time_zone": "Europe/Berlin",
        "min_doc_count": 1
      }
    }
  },
  "stored_fields": [
    "*"
  ],
  "script_fields": {},
  "docvalue_fields": [
    "@timestamp"
  ],
  "highlight": {
    "pre_tags": [
      "@kibana-highlighted-field@"
    ],
    "post_tags": [
      "@/kibana-highlighted-field@"
    ],
    "fields": {
      "*": {
        "highlight_query": {
          "bool": {
            "must": [
              {
                "query_string": {
                  "query": "*",
                  "analyze_wildcard": true,
                  "all_fields": true
                }
              },
              {
                "range": {
                  "@timestamp": {
                    "gte": 1508316014801,
                    "lte": 1508402414801,
                    "format": "epoch_millis"
                  }
                }
              }
            ],
            "must_not": []
          }
        }
      }
    },
    "fragment_size": 2147483647
  }
}

Remove aggregations: If I remove the "aggs" section (the "2" date_histogram) from the same query, it takes approx. 7 seconds, which is of course much faster but still a fairly long time.
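
The complement is easy to test too: running only the aggregation with "size": 0, so that no hits are fetched or highlighted. A sketch of such a request, reusing the time range from above (perf_test again being a placeholder index name):

GET perf_test/_search
{
  "size": 0,
  "query": {
    "range": {
      "@timestamp": {
        "gte": 1508316014801,
        "lte": 1508402414801,
        "format": "epoch_millis"
      }
    }
  },
  "aggs": {
    "2": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "30m",
        "time_zone": "Europe/Berlin",
        "min_doc_count": 1
      }
    }
  }
}

Adding "profile": true to the request body should also break down where the time is spent.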

Remove timezone: If I keep the aggregation but remove the "time_zone" parameter, the query goes from 130 seconds down to 26 seconds. I still think this is way too long, but I am not sure what to expect here.
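
For clarity, this is the aggregation with the time_zone parameter removed, everything else in the query unchanged:

  "aggs": {
    "2": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "30m",
        "min_doc_count": 1
      }
    }
  }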

The question: Is this expected behaviour? I was expecting this to be a bit faster and not to consume this amount of CPU. System load is low and I cannot find any I/O issues during search either. Or are 364 million docs simply too many to work with, even with this simple query?

Any suggestions are highly appreciated.

Regards /Johan
