New install - shard failure with 1 client

Puk · August 1, 2017, 2:59am

So after testing the ELK Stack in a test lab with some VMs and a bunch of clients running Winlogbeat, Filebeat and Metricbeat, I decided today to go for it and put this into my live environment.

I configured the stack exactly the same as I did in my lab, using the same notes that let me stand it up 3 times before and everything seemed to work. I put 1 client in with Winlogbeat and it looks fine.

BUT, after an hour I tried some searches and got shard errors. The only difference in my config is that this new ELK server (all in one) is higher spec: 4 cores, 8 GB RAM, and I split the data/logs onto a separate mount point. Any ideas why?

Index: winlogbeat-2016.02.20 Shard: 1 Reason: {"type":"es_rejected_execution_exception","reason":"rejected execution of org.elasticsearch.transport.TransportService$7@4d65a7e6 on EsThreadPoolExecutor[search, queue capacity = 1000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@4750cd47[Running, pool size = 7, active threads = 5, queued tasks = 995, completed tasks = 31245]]"}

warkolm · August 1, 2017, 3:52am

That indicates that your cluster is overloaded and the internal queues that it holds (threadpools) are full and cannot deal with any more.
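To make the numbers in that error easier to read, here is a minimal Python sketch that pulls the thread pool name and counters out of the rejection reason quoted in the first post. The helper name `parse_rejection` is made up for illustration, and the regex only assumes the message layout seen in this thread; other Elasticsearch versions may format it differently.

```python
import re

# Reason string copied from the error quoted in the first post.
REASON = (
    "rejected execution of org.elasticsearch.transport.TransportService$7@4d65a7e6 "
    "on EsThreadPoolExecutor[search, queue capacity = 1000, "
    "org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@4750cd47"
    "[Running, pool size = 7, active threads = 5, queued tasks = 995, "
    "completed tasks = 31245]]"
)

COUNTERS = r"(queue capacity|pool size|active threads|queued tasks|completed tasks) = (\d+)"


def parse_rejection(reason):
    """Hypothetical helper: extract the pool name and its counters as a dict."""
    # The pool name is the first item inside "EsThreadPoolExecutor[...".
    pool = re.search(r"EsThreadPoolExecutor\[(\w+),", reason)
    info = {"pool": pool.group(1) if pool else None}
    for key, value in re.findall(COUNTERS, reason):
        info[key] = int(value)
    return info


info = parse_rejection(REASON)
print(info["pool"], info["queued tasks"], "of", info["queue capacity"], "queued")
```

For the error above, the pool is `search` and the queue was holding 995 tasks against a capacity of 1000 at that moment, which is why further search requests were being rejected.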

Puk · August 1, 2017, 3:54am

Thanks, yeah I couldn't understand why this would happen with a single client, but I may have just found it.

Even though I only had 1 client reporting in, I realized I forgot to set the winlogbeat.yml file to only read the last 24 hours, so it pulled in over 2 years of logs in one go!!! So I stopped that, changed it to 24 hours, removed the resume file and flushed the Elasticsearch data with curl -XDELETE 'http://localhost:9200/*'

It's now only got 24 hours of data and it's not throwing errors. I guess this was just because it was too much info to ingest in one go?

Are there any docs that give recommended specs for this sort of thing? The place I work has around 250 servers that I want to ingest Wintel event logs and Linux secure/messages logs from, plus maybe VMware hosts and Cisco switches.

warkolm · August 1, 2017, 3:58am

The error refers to the search threadpool, so it's unlikely that ingestion is what's impacting this.

What are the specs for the cluster now? How many nodes etc?
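One way to watch the search threadpool while running queries is the `_cat/thread_pool` API. The sketch below is only an illustration: it assumes Elasticsearch is listening on `localhost:9200` as in the curl command earlier in the thread, that the version in use supports the `h=` column names shown, and that node names contain no spaces. The helper names and the `SAMPLE` text are made up for the example and are not output from this cluster.

```python
from urllib.request import urlopen

# Assumed endpoint and column names; adjust for your host and ES version.
CAT_URL = (
    "http://localhost:9200/_cat/thread_pool/search"
    "?v&h=node_name,name,active,queue,rejected"
)


def parse_cat_table(text):
    """Turn whitespace-separated _cat output (with a header row) into dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def fetch_search_pool(url=CAT_URL):
    """Query the cluster and return one dict per node (not called here)."""
    with urlopen(url, timeout=10) as resp:
        return parse_cat_table(resp.read().decode("utf-8"))


# Made-up sample in the assumed output format, for illustration only.
SAMPLE = """node_name name   active queue rejected
elk01     search      5   995       12
"""

rows = parse_cat_table(SAMPLE)
for row in rows:
    print(row["node_name"], "queue:", row["queue"], "rejected:", row["rejected"])
```

Calling `fetch_search_pool()` against a live node while a slow dashboard or search is running shows whether the `queue` and `rejected` columns climb, which would point at search load rather than ingestion.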

