Elasticsearch high query/fetch time


(micsnare) #1

Hello everyone,

this is my first post, and I'm slowly finding my way into the Elastic world. I've just seen an unusually high peak in my Elasticsearch cluster.
According to ElasticHQ, my Elasticsearch cluster has rather high search-query and search-fetch times:

Search - Query: ~237.34 ms
Search - Fetch: ~36.23 ms (!!)
Especially the latter is, according to the ElasticHQ plugin, rather critical.

My Elasticsearch node, however, is not really busy (low load and low memory consumption).

Any idea where to look, or what the root cause could be, if the system otherwise appears healthy?

Also, last week the Elasticsearch cluster was unavailable because it crashed with an out-of-memory error (the Java heap was exhausted and the JVM triggered HeapDumpOnOutOfMemoryError).

The Elasticsearch nodes have the following hardware specs:
10 vcores
32GB memory

of which 12GB are now reserved for the Java heap (prior to the crash it was only 6GB!!).

I'm currently investigating why this happened so that I can prevent it in the future; more or less predictive maintenance. I have currently aggregated 1.4TB of log data.

The elasticsearch configuration is:

cluster.name: graylog
path.data: /opt/elasticsearch/
network.host: 0.0.0.0
transport.tcp.port: 9300
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["1.2.3.4:9300"]
index:
  refresh_interval: 1s

Many thanks in advance,

cheers,
theresa


(Peter Dyson) #2

Hi,

A general recommendation for the heap setting is 50% of the host's RAM, but not exceeding the compressed OOPs limit (somewhere around 30.5GB, though it can vary depending on the JVM and platform).
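For reference, on Elasticsearch 2.x the heap is usually set via the ES_HEAP_SIZE environment variable rather than jvm.options (which only arrived in 5.0); a minimal sketch, assuming a package install where one of the file paths below applies:

```
# /etc/sysconfig/elasticsearch (RPM) or /etc/default/elasticsearch (DEB)
# 50% of the 32GB host RAM, well under the compressed OOPs cutoff
ES_HEAP_SIZE=16g
```

Setting min and max heap to the same value this way avoids resize pauses; the exact file location depends on how Elasticsearch was installed.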

A great article on how to determine the maximum heap for your particular combination of architecture and JVM is:

If you have a very large retention of data, many months' worth, then running aggregations across that data may now require far more resources than before, which could be why you hit an OOM.

There's also the possibility that having too many shards per node is causing shard overhead, where the cluster spends more time managing shards than doing "work". Some ways to reduce this are to lower the number of shards per index (while keeping shard size under about 30-50GB) or to add more resources/nodes.
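As a rough sanity check on shard overhead, later Elastic sizing guidance suggests roughly 20 shards per GB of heap as an upper bound; a minimal sketch using that heuristic with the numbers from this thread (the figure is a rule of thumb, not a hard limit):

```python
# Rough shards-vs-heap sanity check for a single data node.
# The "~20 shards per GB of heap" figure is a rule of thumb from later
# Elastic sizing guidance, used here purely as an illustrative heuristic.
def max_recommended_shards(heap_gb: float, shards_per_gb: int = 20) -> int:
    """Heuristic ceiling on the shard count a node can comfortably manage."""
    return int(heap_gb * shards_per_gb)

heap_gb = 12          # heap size on the node discussed in this thread
current_shards = 176  # reported shard count

ceiling = max_recommended_shards(heap_gb)
print(f"current shards: {current_shards}, heuristic ceiling: {ceiling}")
# prints: current shards: 176, heuristic ceiling: 240
```

With these numbers the node sits under the heuristic ceiling (176 < 240), so raw shard count alone may not explain the OOM; per-shard size and aggregation load matter too.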

Check out the Reindex API and the Shrink API (if you're on a recent version) to help reduce shard counts.

How many indices and how many shards per index do you currently have and what is your typical index size?


(micsnare) #3

Hi geekpete,

thanks a lot for your thorough explanation. This makes a lot of sense to me.
I currently have only one Elasticsearch data node (and also only one Elasticsearch master node) because the customer doesn't want to spend more money on resources/hardware.
We are aware that this is not an HA setup, so if the server has a problem, the Elasticsearch environment is down (works as designed).
Thus, the Elasticsearch data node has to do all the heavy lifting; it currently has 176 shards across 44 indices.
The retention strategy is to delete all data older than 90 days, and the index is being rotated on a daily basis.
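With daily rotation and a 90-day retention, the steady-state shard count can be projected from the shards-per-index ratio reported above (176/44 = 4); a minimal sketch, assuming one index per day and a constant shard count per index:

```python
# Project the steady-state shard count for time-based index rotation:
# total = retention_days * indices_per_day * shards_per_index.
def projected_shards(retention_days: int, indices_per_day: int,
                     shards_per_index: int) -> int:
    return retention_days * indices_per_day * shards_per_index

shards_per_index = 176 // 44  # 4, from the counts reported above
total = projected_shards(90, 1, shards_per_index)
print(total)  # 360 shards once the 90-day retention window fills
```

In other words, the 176 shards seen today would keep growing toward roughly 360 as the retention window fills, which is worth factoring into any sizing decision.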

Are 176 shards too much for a single-node cluster?? (with 10 vCPUs and 32GB RAM, of which 12GB is the heap size)

I'm currently on elasticsearch-2.4.5-1

cheers,
theresa


(micsnare) #4

Hi,

any idea?
Are 196 shards across 49 indices too much for a single-node cluster to handle?? (with 10 vCPUs and 32GB RAM, of which 12GB is the heap size)
The goal is to have a retention time of 90 days worth of logs with the index being rotated daily.

cheers,
theresa


(Peter Dyson) #5

What are your shard sizes like?
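Shard sizes can be listed with the cat shards API; a sketch of the request (the `v` and `h` column parameters are supported on 2.x, and this assumes a cluster reachable from your console):

```
GET _cat/shards?v&h=index,shard,prirep,state,store
```

The `store` column shows the on-disk size per shard, which is what you'd compare against the ~30-50GB-per-shard guideline above.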


(system) #6

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.