Kibana Timeouts/Shard failed errors

jamesp220291 · July 5, 2019, 7:04am

Hi

Elasticsearch cluster details:

3x master-only nodes
3x data nodes - 8GB RAM, 8 CPUs
1x coordinating node - 8GB RAM, 6GB heap

Heap size - 20GB
Documents - 401,859,738
Indices - 249
Primary shards - 745 (3 per index)

Daily indices for logs; the average index size is 500MB-1GB (around 20 are 10GB+).

I am having an issue when searching far back (around 6 months or more): I get timeouts/shard failed errors. There is not really anything useful in the Elasticsearch logs except this (occasionally, not every time):

org.elasticsearch.transport.RemoteTransportException: [sys-elastic-data-1][172.16.10.122:9300][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<agg [1]>] would be [4031321656/3.7gb], which is larger than the limit of [4013975142/3.7gb], real usage: [4031316536/3.7gb], new bytes reserved: [5120/5kb]
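
The byte counts in that message can be checked directly. A minimal Python sketch follows; the regular expressions are an assumption about the layout of this one message, not a general Elasticsearch log parser. It pulls out the four raw values and shows how far over the parent breaker limit the request was:

```python
import re

# The CircuitBreakingException text from the log line above, copied verbatim.
MESSAGE = (
    "[parent] Data too large, data for [<agg [1]>] would be [4031321656/3.7gb], "
    "which is larger than the limit of [4013975142/3.7gb], "
    "real usage: [4031316536/3.7gb], new bytes reserved: [5120/5kb]"
)

def parse_breaker_message(message: str) -> dict:
    """Extract the four raw byte counts from a parent-breaker message."""
    patterns = {
        "would_be": r"would be \[(\d+)/",
        "limit": r"limit of \[(\d+)/",
        "real_usage": r"real usage: \[(\d+)/",
        "new_bytes": r"new bytes reserved: \[(\d+)/",
    }
    return {name: int(re.search(rx, message).group(1)) for name, rx in patterns.items()}

stats = parse_breaker_message(MESSAGE)
overshoot = stats["would_be"] - stats["limit"]

# "would be" is simply current usage plus the 5kb the aggregation asked for.
print(stats["real_usage"] + stats["new_bytes"] == stats["would_be"])  # True
print(f"over the limit by {overshoot} bytes ({overshoot / 2**20:.1f} MiB)")
```

The aggregation itself only asked for 5kb; the node was already sitting at the limit before the request arrived. Circuit breakers are evaluated per node, so the 3.7gb limit describes sys-elastic-data-1 on its own rather than the cluster as a whole.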

Sometimes when I get the timeouts, if I then click refresh the data loads straight away, so it is almost as if it had nearly finished loading/caching the data and then shows it.

I am just looking for some advice on improving performance.

What I am thinking at the moment is to force merge all indices. Will this help?

Will increasing my nodes help? Since all my indices have 3 shards, I don't see how scaling up my cluster could help unless I reindex and change the number of shards to match the node count.

Christian_Dahlqvist · July 5, 2019, 7:09am

If you have 1 replica configured you have around 1500 shards in the cluster, which sounds like a lot for the amount of heap you have. If you have a long retention period, try to switch to monthly or weekly indices so you get to an average shard size over 10GB.
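
The arithmetic behind that estimate, as a small Python sketch using only the figures quoted in the thread (the one-replica setting is the assumption stated above, and GB is treated as 1024 MB for the per-shard estimate):

```python
# Figures quoted earlier in the thread.
primary_shards = 745
replicas_per_primary = 1   # assumption: one replica, as in the reply above
data_nodes = 3

total_shards = primary_shards * (1 + replicas_per_primary)
shards_per_data_node = total_shards / data_nodes

print(total_shards)                 # 1490
print(round(shards_per_data_node))  # 497

# A typical 500MB-1GB daily index split over 3 primaries gives very small shards.
for index_gb in (0.5, 1.0):
    print(f"{index_gb} GB index / 3 primaries = {index_gb / 3 * 1024:.0f} MB per shard")
```

Shards of a few hundred MB are well below the 10GB average suggested above, which is the motivation for moving to weekly or monthly indices.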

If you are suffering from heap pressure, forcemerging older indices down to a single segment should help.
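
As a rough illustration of what that could look like, here is a Python sketch that prints force merge requests for daily indices older than a cutoff. The `logs-YYYY.MM.DD` naming pattern, the `forcemerge_requests` helper and the two-day cutoff are all hypothetical; only the `_forcemerge` endpoint and its `max_num_segments` parameter are part of Elasticsearch.

```python
from datetime import date, timedelta

def forcemerge_requests(index_names, today, min_age_days=2, prefix="logs-"):
    """Return console-style force merge requests for daily indices older than min_age_days.

    Index names are assumed to look like 'logs-2019.07.05'; that pattern is
    hypothetical and should be adapted to the real naming scheme.
    """
    cutoff = today - timedelta(days=min_age_days)
    requests = []
    for name in sorted(index_names):
        if not name.startswith(prefix):
            continue
        year, month, day = (int(part) for part in name[len(prefix):].split("."))
        # Only indices that are no longer being written to should be merged.
        if date(year, month, day) <= cutoff:
            requests.append(f"POST /{name}/_forcemerge?max_num_segments=1")
    return requests

example = ["logs-2019.07.05", "logs-2019.07.01", "logs-2019.06.30"]
for line in forcemerge_requests(example, today=date(2019, 7, 5)):
    print(line)
```

The current day's index is deliberately left out, since force merge is intended for indices that have finished receiving writes.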

jamesp220291 · July 5, 2019, 7:17am

Hi, thanks for your reply.

When I look in the Monitoring section in Kibana, the heap usage never really goes over 50%.
At the moment it is sitting at 9.0 GB / 20.7 GB.

We don't really search this ES cluster a lot; it is basically just for statistics etc. (it's not being searched every minute), probably only a few times a day.

Regarding force merge, do I need to make the index read-only before doing this? Currently they are all open; however, only the latest day's index is written to.

Would increasing the number of nodes I have help? Or would this not be useful unless I reindexed everything to have, say, 6 shards (for 6 nodes)? I read somewhere that your sharding policy should be around 1:1 with your number of data nodes.

Christian_Dahlqvist · July 5, 2019, 8:03am

Try to change so you have fewer, larger shards, as I described. Querying lots of small shards can be slow.

jamesp220291 · July 5, 2019, 8:10am

How do you find out the shard size?

Is each shard a full copy of the index, or do you divide the index size by the number of shards?

Some of the daily indices can be as large as 20-30GB (depending on log traffic on that day).

Christian_Dahlqvist · July 5, 2019, 8:15am

Look at the _cat/shards API. If you have indices that generate over 10GB per day, keep these as daily indices.
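
A minimal Python sketch of reading that output, assuming it was requested as `GET _cat/shards?h=index,shard,prirep,store&bytes=b` so that each line has four whitespace-separated columns. The index names and sizes in `SAMPLE` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical output of: GET _cat/shards?h=index,shard,prirep,store&bytes=b
# (index names and sizes are invented for illustration).
SAMPLE = """\
cdn-2019.07.04 0 p 7000000000
cdn-2019.07.04 1 p 7000000000
cdn-2019.07.04 2 p 7000000000
cdn-2019.07.04 0 r 7000000000
cdn-2019.07.05 0 p 350000000
cdn-2019.07.05 1 p 350000000
cdn-2019.07.05 2 p 350000000
"""

def primary_shard_stats(cat_output):
    """Return {index: (primary_count, total_bytes, average_bytes)} for primaries only."""
    sizes = defaultdict(list)
    for line in cat_output.splitlines():
        fields = line.split()
        # Skip blank lines, replicas, and unassigned shards (which have no store size).
        if len(fields) != 4 or fields[2] != "p":
            continue
        sizes[fields[0]].append(int(fields[3]))
    return {
        index: (len(values), sum(values), sum(values) / len(values))
        for index, values in sizes.items()
    }

for index, (count, total, average) in primary_shard_stats(SAMPLE).items():
    print(f"{index}: {count} primaries, {total / 1e9:.2f} GB total, {average / 1e9:.2f} GB each")
```

Each primary shard holds a slice of the index's documents, and a replica is a copy of its primary, so the primary data of an index is divided across its primaries rather than duplicated in each one.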

jamesp220291 · July 5, 2019, 8:35am

The problem is, I don't know beforehand. These are my download CDN logs, and some days they are 1GB, some days they are 20GB, but it's the same index, just different days. So in terms of creating them, there is no way for Logstash to know in advance.

Christian_Dahlqvist · July 5, 2019, 10:59am

Then look into using the rollover API together with ILM.

jamesp220291 · July 5, 2019, 1:18pm

ILM won't really work; that is going off the assumption that data from, e.g., a month ago won't be accessed regularly and so is cold etc.

When we load our visualisations we usually do it for the whole time period (1 year+), as we like to see any long-term trends.

This is when we get the problems; if I use a smaller time scale, it works fine.

This is why I am asking whether force merge will make it quicker to search that far back or not.

Christian_Dahlqvist · July 5, 2019, 1:21pm

ILM allows you to define different zones, but does not require it. I therefore think it would work fine.

jamesp220291 · July 8, 2019, 6:19am

So what you are saying is to set up an ILM policy to roll over the index after x amount of time/size?

Something like this:

PUT _ilm/policy/datastream_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50GB",
            "max_age": "30d"
          }
        }
      }
    }
  }
}

PUT _template/datastream_template
{
  "index_patterns": ["datastream-*"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "index.lifecycle.name": "datastream_policy",
    "index.lifecycle.rollover_alias": "datastream"
  }
}
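
To get a feel for how those two rollover conditions would behave with uneven daily volumes, here is a simplified Python simulation. It is a sketch only: `simulate_rollover` is a hypothetical helper, the daily volumes are made up, and real ILM checks its conditions periodically, so an index can grow somewhat past `max_size` before it actually rolls over.

```python
def simulate_rollover(daily_gb, max_size_gb=50, max_age_days=30):
    """Group consecutive daily volumes into indices, starting a new index once
    the current one has reached max_size_gb or max_age_days.

    Simplified model: conditions are checked once per day, after that day's data.
    """
    indices = []
    size, age = 0.0, 0
    for gb in daily_gb:
        size += gb
        age += 1
        if size >= max_size_gb or age >= max_age_days:
            indices.append({"days": age, "size_gb": size})
            size, age = 0.0, 0
    if age:
        # The index still being written to when the data runs out.
        indices.append({"days": age, "size_gb": size})
    return indices

quiet_month = [1] * 30   # thirty days at 1GB/day
busy_week = [20] * 7     # seven days at 20GB/day
for index in simulate_rollover(quiet_month + busy_week):
    print(index)
```

In this model the quiet month ends up in a single 30GB index (rolled over by age), while the busy week is split every three days (rolled over by size), so nothing has to be known about the daily volume in advance.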

Since I have already indexed all my daily indices, I am assuming I would have to reindex them all somehow?
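
If reindexing does turn out to be the route, one possible shape is one `_reindex` request per month, each reading that month's daily indices through a wildcard. The Python sketch below only builds the request bodies; the `logs-` and `logs-monthly-` prefixes and the `monthly_reindex_bodies` helper are hypothetical, and nothing here has been run against this cluster.

```python
import json

def monthly_reindex_bodies(daily_indices, prefix="logs-", dest_prefix="logs-monthly-"):
    """Build one _reindex body per month from names like 'logs-2019.01.15'."""
    months = sorted(
        {name[len(prefix):len(prefix) + 7] for name in daily_indices if name.startswith(prefix)}
    )
    return [
        {
            "source": {"index": f"{prefix}{month}.*"},   # all daily indices of that month
            "dest": {"index": f"{dest_prefix}{month}"},  # one consolidated monthly index
        }
        for month in months
    ]

daily = ["logs-2019.01.30", "logs-2019.01.31", "logs-2019.02.01"]
for body in monthly_reindex_bodies(daily):
    print("POST _reindex")
    print(json.dumps(body, indent=2))
```

Reindexing copies documents, so the daily indices would stay in place until they are deleted separately.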

jamesp220291 · July 9, 2019, 7:01am

Hi

Just a follow-up: I keep hearing people talk about shard size.

How do I know how big my shards are?

If I have an index which is 20GB and it has 4 shards, does that mean each shard is 5GB, or is each shard 20GB?

system · August 6, 2019, 7:01am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
