Our Elasticsearch query performance is very slow

I have a 5-node ES cluster (each node has a single core and 4 GB of RAM) that receives data from Metricbeat and Winlogbeat via Logstash. The data amounts to roughly 175 GB in total and is stored in a per-day index.

Even when I search for just an hour's worth of data, our queries take a very long time.

Below is our config:

cluster.name: clustername
node.name: ${HOSTNAME}
path.data: /apps/elasticsearch-5.2.2/data,/data1,/data2
bootstrap.memory_lock: true
node.data: true
node.master: true
node.ingest: true
discovery.zen.ping.unicast.hosts: ["node1", "node2","node3", "node4", "node5"]
discovery.zen.ping_timeout: 30s
discovery.zen.minimum_master_nodes: 3
thread_pool.bulk.queue_size: 1000
xpack.security.enabled: false
indices.memory.index_buffer_size: 30%
indices.memory.min_index_buffer_size: 512mb

Am I doing something wrong?

Do I need to customise my mapping? Do I need to store a week/month's data per index?
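(For context, switching to a weekly index would just be a naming change in the Logstash elasticsearch output. A sketch, where the hosts and the index prefix are placeholders rather than your actual settings:

```
output {
  elasticsearch {
    hosts => ["node1:9200"]
    # %{+xxxx.ww} expands to ISO year and week number,
    # so all events from one week land in the same index
    index => "metricbeat-%{+xxxx.ww}"
  }
}
```

Fewer, larger indices mean fewer shards to search, but each shard gets bigger, so it cuts both ways.)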


4 GB of RAM? So something like 2 GB of Java heap?
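(For reference, heap is set in config/jvm.options, and the usual guidance is to give the JVM at most about half of physical RAM so the rest is left for the filesystem cache. A sketch for a 4 GB node; the 2 GB figure is an assumption, not your actual setting:

```
# config/jvm.options -- at most ~50% of RAM for the heap;
# min and max set to the same value to avoid resize pauses
-Xms2g
-Xmx2g
```

)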

I believe you are seeing a few things in your logs, aren't you?

Indexing as well as querying can be CPU intensive, so I suspect you may be limited by the amount of CPU available.

Except for an occasional error about a node not being able to reach any of the masters, I don't see any errors. I would have expected to see a lot of OOMs, but there is no trace of such errors.

My heap size is 3 GB.

I am experimenting with a 64 GB, 64-core node (planning a JVM heap of 31 GB). Although my indexing rate went up to 20k docs/s, my search (some 10 queries running in parallel from a visualisation) is taking more than 30 seconds.

But nothing related to GC in logs?

BTW, what does a typical query look like?

I was expecting the logs to be full of GCs given the slowness we are seeing, but GC is not happening that often.
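(A quick way to double-check, assuming the defaults from the config above — Elasticsearch logs to logs/<cluster.name>.log under its home directory, so the path below is an inference, not a confirmed one:

```shell
# look for GC pauses reported by Elasticsearch itself
grep -i "\[gc\]" /apps/elasticsearch-5.2.2/logs/clustername.log | tail
```

If long young- or old-gen pauses were the problem, they would show up here as [gc] overhead warnings.)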


Query-Type2 :
{
  "query": {
    "bool": {
      "must": [
        { "query_string": { "query": "tags: hadoop", "analyze_wildcard": true } },
        { "query_string": { "analyze_wildcard": true, "query": "*" } },
        { "range": { "@timestamp": { "gte": 1491676200000, "lte": 1492280999999, "format": "epoch_millis" } } }
      ],
      "must_not": []
    }
  },
  "size": 0,
  "_source": { "excludes": [] },
  "aggs": {
    "1": { "percentiles": { "field": "score", "percents": [50], "keyed": false } },
    "2": { "min": { "field": "score" } },
    "3": { "max": { "field": "score" } }
  }
}

We have 9 of these queries being fired from a dashboard in parallel, and that dashboard is taking some 45 seconds to load.

Do you have the same response time when running the same query outside Kibana?
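(One way to check, assuming the query above is saved to a file; node1 and the metricbeat-* index pattern are placeholders for your cluster:

```shell
# time the dashboard query directly against Elasticsearch, bypassing Kibana;
# "took" in the response is the ES-side time in milliseconds
time curl -s 'http://node1:9200/metricbeat-*/_search' \
  -H 'Content-Type: application/json' \
  -d @query.json
```

If the "took" value is small but wall-clock time is large, the bottleneck is outside Elasticsearch.)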

For sure you don't have enough memory for the filesystem cache, so you are probably always reading data from disk. Are you using SSD drives?

Querying and indexing share and compete for the same resources, so I would recommend monitoring CPU, disk I/O and GC while you are indexing and querying simultaneously. Rather than running indexing at full speed, increase indexing throughput gradually, e.g. by altering the number of indexing threads, and see how each increase affects query latency. Start without any indexing at all so you have a baseline for your query performance.
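(As a starting point, the _cat APIs expose most of this; node1 below is a placeholder, and these would be run while the load test is going:

```shell
# search/bulk queue depth and rejections per node
curl -s 'http://node1:9200/_cat/thread_pool/search,bulk?v&h=node_name,name,active,queue,rejected'
# CPU, load and heap pressure per node
curl -s 'http://node1:9200/_cat/nodes?v&h=name,cpu,load_1m,heap.percent'
# disk utilisation, refreshed every 5 seconds, run on each node
iostat -x 5
```

Rising queue/rejected counts under combined load would point at CPU starvation rather than memory.)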

Each query takes only 80 milliseconds (the first run takes 4 seconds though). But even when I run the queries in parallel, the ensemble of queries takes around 55 seconds (the same as when we run them serially). Initially I thought this must be related to the queue sizes, but it turned out not to be.

No, we are not using SSDs.

It can't be fast with so little RAM and spinning disks, IMO. The dataset is 175 GB! So I think you are reading everything from disk here.


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.