Query duration too high for ~200G logs a day

Hi,

We're running an Elasticsearch cluster (version 5.4.1) with 4 data nodes, 3 master nodes, and one client node (for Kibana).

The data nodes are r4.2xlarge AWS instances (61 GB memory, 8 vCPUs) with 30 GB allocated to the Elasticsearch JVM heap.

We write around 200 GB of logs every day and keep them for the last 14 days.

There is one big daily index of 160-170 GB (6 shards, 1 replica) and other smaller daily indices of 1-3 GB each (2 shards, 1 replica).
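In index-template terms, that layout amounts to roughly the following (a sketch; the template and index-pattern names are illustrative, only the shard and replica counts are our actual settings):

  PUT _template/big-logs
  {
    "template": "big-logs-*",
    "settings": {
      "index.number_of_shards": 6,
      "index.number_of_replicas": 1
    }
  }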

We're struggling with search latency, and I'm looking for recommendations to improve the cluster's performance, especially search performance (query duration in Kibana).

For example, searching over the last 6 days on the big index takes:

Query Duration 51498ms
Request Duration 52706ms

More data nodes? More client nodes? Bigger nodes? More replicas? Maybe improving query duration at the expense of write speed? Anything that can improve the performance is an option.
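For example, as far as I understand, force-merging the previous day's index down to a single segment spends write-side I/O once to make every later search on that index cheaper (a sketch; the index name is illustrative, and a force merge should only be run on indices that are no longer being written to):

  POST big-logs-2017.08.20/_forcemerge?max_num_segments=1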

Is anyone running something close to this design or load? I'd be glad to hear about other designs, loads, and query stats.

Thanks,
M

What does the query look like?

It's the default Kibana query, "*":

  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "analyze_wildcard": true,
            "query": "*"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": 1502994436234,
              "lte": 1503512836234,
              "format": "epoch_millis"
            }
          }
        }
      ],
      "must_not": []
    }
  }

I suspect you are hitting this bug: https://github.com/elastic/kibana/pull/13047, whose fix will soon be released. You can work around it by setting *:* as the query on the Kibana side or by configuring a default search field.
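As I understand it, the difference is that Lucene's query parser rewrites *:* into a cheap match_all, while a bare * is analyzed and executed as a wildcard query. Configuring a default search field would look roughly like this (a sketch; the template name and the message field are assumptions, substitute the main text field of your logs):

  PUT _template/default-search-field
  {
    "template": "big-logs-*",
    "settings": {
      "index.query.default_field": "message"
    }
  }

Daily indices created after the template is in place would then resolve unqualified query terms against that field rather than the much larger _all field.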

I'm not sure that's it; I'm getting a "Discover: Gateway Timeout" error when I search for *:* in Kibana:

Error: Gateway Timeout
    at respond (https://kibana.prod.caazz.com/bundles/kibana.bundle.js?v=15104:12:2730)
    at checkRespForFailure (https://kibana.prod.caazz.com/bundles/kibana.bundle.js?v=15104:12:1959)
    at https://kibana.prod.caazz.com/bundles/kibana.bundle.js?v=15104:1:9200
    at processQueue (https://kibana.prod.caazz.com/bundles/commons.bundle.js?v=15104:38:23621)
    at https://kibana.prod.caazz.com/bundles/commons.bundle.js?v=15104:38:23888
    at Scope.$eval (https://kibana.prod.caazz.com/bundles/commons.bundle.js?v=15104:39:4619)
    at Scope.$digest (https://kibana.prod.caazz.com/bundles/commons.bundle.js?v=15104:39:2359)
    at Scope.$apply (https://kibana.prod.caazz.com/bundles/commons.bundle.js?v=15104:39:5037)
    at done (https://kibana.prod.caazz.com/bundles/commons.bundle.js?v=15104:37:25027)
    at completeRequest (https://kibana.prod.caazz.com/bundles/commons.bundle.js?v=15104:37:28702)

Maybe more information will shed some light on it. These are the parameters I'm using in elasticsearch.yml on the data nodes:

cluster.name: prod
node.name: data1.us-west-2a.prod
network.host: ["127.0.0.1", "******"]
discovery.zen.ping.unicast.hosts: ["*****", "*****", "*****"]
discovery.zen.minimum_master_nodes: 2
cloud.aws.s3.access_key: ****
cloud.aws.s3.secret_key: *****

# dedicated data node (not master-eligible)
node.master: false
node.data: true
node.attr.zone: us-west-2a
path.data: [ /es-data-1 ]
bootstrap.system_call_filter: false
# lock the heap in RAM so it cannot be swapped out
bootstrap.memory_lock: true

Anyone?
