Forgive me, for I am a n00b when it comes to Unix and even Logstash/Elasticsearch. I had several localized clusters online for almost a year when I began upgrading them to 5.x.
My environment is such that each location operates as an island, so I built each site's Logstash instance to index into the local Elasticsearch install as well as pipe its output to my two data centers. Each data center cluster is made up of 8 servers: a Kibana client node, 3 master nodes, and 4 data nodes. Both data center clusters were completely wiped and rebuilt, not upgraded from ELK 2.x. Since the upgrade, things have run much more smoothly than they did before. But at my local sites, where the ELK 5.x stack is running, I am now getting error messages such as this:
<13>Jan 16 14:49:22.111396 ELKSTACK [2017-01-16T14:49:22,039][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 429 ({"type"=>"es_rejected_execution_exception", "reason"=>"rejected execution of org.elasticsearch.transport.TransportService$6@2e62fc7f on EsThreadPoolExecutor[bulk, queue capacity = 50, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@5e766343[Running, pool size = 4, active threads = 4, queued tasks = 50, completed tasks = 14778073]]"})
I have not been able to work out exactly where in my configuration files to increase the bulk queue capacity. As it stands right now, roughly 7 million logs are being stored in the cluster per hour (about 2,000 events per second).
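The closest candidate I have come across is the thread_pool.bulk.queue_size setting, which I believe is a static node-level setting that goes in elasticsearch.yml on each data node, something like this (the value of 200 is just a guess on my part, not from any documentation):

thread_pool.bulk.queue_size: 200

But I am not sure whether that is the right setting for 5.x, or whether simply raising the queue is the correct fix rather than batching or throttling on the Logstash side.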
I have also removed the output configuration options on the local site ELK stacks that piped to their local Elasticsearch instance. Now Logstash just parses the logs and ships them to my two data center sites, along the lines of the sketch below.
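For reference, a simplified sketch of what my output section now looks like (hostnames are placeholders, not my real config):

output {
  elasticsearch {
    hosts => ["dc1-client:9200"]
  }
  elasticsearch {
    hosts => ["dc2-client:9200"]
  }
}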
Looking through the logs in the data centers, I do not see any error messages or anything suggesting it is an Elasticsearch problem.
My cluster health shows this:
user@CLUSTER-MASTER-01:/var/log/elasticsearch$ curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
"cluster_name" : "CLUSTER",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 8,
"number_of_data_nodes" : 4,
"active_primary_shards" : 9306,
"active_shards" : 18611,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
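If it is useful, I believe the per-node bulk queue and rejection counts can also be checked with the _cat thread pool API, along these lines:

curl -XGET 'http://localhost:9200/_cat/thread_pool/bulk?v&h=node_name,name,active,queue,rejected'

A steadily growing rejected count on the data nodes would, I assume, line up with the 429s that Logstash is reporting.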
Can someone please advise which file I need to modify to increase the bulk queue capacity, and whether that is even the right approach here?