Elasticsearch is still initializing the kibana index. Too many shards maybe?

Hi all!

I have inherited a simple Elasticsearch + Kibana setup, and I have no prior experience with either.

I am getting the error in the title.

In my Elasticsearch log I am seeing lots of this:

[2019-09-02T01:46:35,262][DEBUG][o.e.a.a.i.m.p.TransportPutMappingAction] [0OS_DK6] failed to put mappings on indices [[[production-2019.08.18/CoiTYb_tSGSjit3_coDbzw]]], type [syslog]
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (put-mapping) within 30s
    at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.lambda$null$0(ClusterService.java:255) ~[elasticsearch-5.6.14.jar:5.6.14]
    at java.util.ArrayList.forEach(ArrayList.java:1257) ~[?:1.8.0_222]
    at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.lambda$onTimeout$1(ClusterService.java:254) ~[elasticsearch-5.6.14.jar:5.6.14]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:576) [elasticsearch-5.6.14.jar:5.6.14]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_222]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_222]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
[2019-09-02T01:46:35,263][DEBUG][o.e.a.a.i.m.p.TransportPutMappingAction] [0OS_DK6] failed to put mappings on indices [[[production-2019.08.18/CoiTYb_tSGSjit3_coDbzw]]], type [syslog]
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (put-mapping) within 30s
    at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.lambda$null$0(ClusterService.java:255) ~[elasticsearch-5.6.14.jar:5.6.14]
    at java.util.ArrayList.forEach(ArrayList.java:1257) ~[?:1.8.0_222]
    at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.lambda$onTimeout$1(ClusterService.java:254) ~[elasticsearch-5.6.14.jar:5.6.14]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:576) [elasticsearch-5.6.14.jar:5.6.14]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_222]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_222]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
[2019-09-02T01:46:35,270][DEBUG][o.e.a.b.TransportShardBulkAction] [0OS_DK6] [production-2019.08.18][3] failed to execute bulk item (index) BulkShardRequest [[production-2019.08.18][3]] containing [3] requests
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (put-mapping) within 30s
    at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.lambda$null$0(ClusterService.java:255) ~[elasticsearch-5.6.14.jar:5.6.14]
    at java.util.ArrayList.forEach(ArrayList.java:1257) ~[?:1.8.0_222]
    at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.lambda$onTimeout$1(ClusterService.java:254) ~[elasticsearch-5.6.14.jar:5.6.14]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:576) ~[elasticsearch-5.6.14.jar:5.6.14]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_222]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_222]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]

If I run:

curl "http://localhost:9200/_cluster/health?pretty"

I get:

{
    "cluster_name" : "elasticsearch",
    "status" : "red",
    "timed_out" : false,
    "number_of_nodes" : 1,
    "number_of_data_nodes" : 1,
    "active_primary_shards" : 8472,
    "active_shards" : 8472,
    "relocating_shards" : 0,
    "initializing_shards" : 4,
    "unassigned_shards" : 8896,
    "delayed_unassigned_shards" : 0,
    "number_of_pending_tasks" : 7,
    "number_of_in_flight_fetch" : 0,
    "task_max_waiting_in_queue_millis" : 3406893,
    "active_shards_percent_as_number" : 48.76813262721621
}

Running:

curl -s http://localhost:9200/_settings

Gives me a lot of output, so here's a small part:

{"testing-2018.07.18":{"settings":{"index":{"creation_date":"1531872013861","number_of_shards":"5","number_of_replicas":"1","uuid":"CDlfimO2RlWHDFe26LkFMw","version":{"created":"5060999","upgraded":"5061499"},"provided_name":"testing-2018.07.18"}}},"testing-2019.07.07":{"settings":{"index":{"creation_date":"1562457610057","number_of_shards":"5","number_of_replicas":"1","uuid":"VZpPPsSfRh-mYXAtFBqDWw","version":{"created":"5061499"},"provided_name":"testing-2019.07.07"}}},"testing-2019.06.09":{"settings":{"index":{"creation_date":"1560038403742","number_of_shards":"5","number_of_replicas":"1","uuid":"kSCfg7r5QrOgCzdvth06OQ","version":{"created":"5061499"},"provided_name":"testing-2019.06.09"}}},"staging-2018.09.01":{"settings":{"index":{"creation_date":"1535760006543","number_of_shards":"5","number_of_replicas":"1","uuid":"d4H1HRopTqyWgVwTTPN4pA","version":{"created":"5060999","upgraded":"5061499"},"provided_name":"staging-2018.09.01"}}},"infra-2018.10.29":{"settings":{"index":{"creation_date":"1540771208498","number_of_shards":"5","number_of_replicas":"1","uuid":"vLMqkuuwR_uAgxZeABAQTQ","version":{"created":"5060999","upgraded":"5061499"},"provided_name":"infra-2018.10.29"}}},"infra-2019.07.20":{"settings":{"index":

And I tried:

curl -s "http://localhost:9200/_cat/shards?v"

But that took a long time and didn't print any output.
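From what I've read, _cat/shards can also be limited to a single index, which I gather should return faster. I haven't tried this yet; the index name here is just one of ours taken from the log above:

curl -s "http://localhost:9200/_cat/shards/production-2019.08.18?v"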

After a bunch of web searching, I get the feeling that 17,000+ shards on a single node is far too many (to put it lightly?). So my current thinking is to follow the instructions for the shrink index API to reduce the number of shards, roughly as sketched below. But I don't want to rush into a solution given this is all new to me, and I am also unsure how many shards to shrink each index down to.
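For reference, the shrink flow I have pieced together from the docs looks something like the following. I have not actually run any of this yet, and the index name is again just one of ours from the log above. First the index has to be made read-only:

curl -X PUT "localhost:9200/production-2019.08.18/_settings?pretty" -H 'Content-Type: application/json' -d'
{
    "index.blocks.write" : true
}
'

Then it can be shrunk into a new index with a single primary shard:

curl -X POST "localhost:9200/production-2019.08.18/_shrink/production-2019.08.18-shrunk?pretty" -H 'Content-Type: application/json' -d'
{
    "settings" : {
        "index.number_of_shards" : 1,
        "index.number_of_replicas" : 0
    }
}
'

As I understand it, the target shard count has to be a factor of the source's (ours are all 5), and the old index still has to be deleted afterwards.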

Am I on the right track? Would that fix it?

Info (as best as I can describe):
We have Elasticsearch and Kibana running as a single node on a single server, a t3.large AWS EC2 instance. As far as I know, there are 3 log sources (3 servers sending logs).

Let me know if I can provide any more info.

Yes, that is indeed far too many shards for a cluster that size. Have a look at the "How many shards should I have in my Elasticsearch cluster?" blog post on the Elastic website for some guidance.

As you only have one node, you can safely disable replicas, as these will never be allocated anyway; this will also get rid of the unassigned replica copies:

curl -X PUT "localhost:9200/*/_settings?pretty" -H 'Content-Type: application/json' -d'
{
    "index" : {
        "number_of_replicas" : 0
    }
}
'
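If you want to verify the change took effect, something like this should show a rep value of 0 for every index (the h parameter just selects which columns to print):

curl -s "localhost:9200/_cat/indices?v&h=index,pri,rep"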

Shrinking the indices will help, but will generally only reduce the shard count by a factor of 5 if you have been using the defaults. That would still leave you with over 1,700 shards, which is still too many. I would instead recommend reindexing into monthly indices to reduce the index and shard count far more drastically.
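As a rough sketch of that approach (the index names are just examples based on your naming scheme): first create a monthly index with a single primary shard and no replicas, then reindex the daily indices into it. The wildcard in the source should work, but you can also list the daily indices explicitly:

curl -X PUT "localhost:9200/production-2019.08?pretty" -H 'Content-Type: application/json' -d'
{
    "settings" : {
        "index.number_of_shards" : 1,
        "index.number_of_replicas" : 0
    }
}
'

curl -X POST "localhost:9200/_reindex?pretty" -H 'Content-Type: application/json' -d'
{
    "source" : {
        "index" : "production-2019.08.*"
    },
    "dest" : {
        "index" : "production-2019.08"
    }
}
'

Once you have verified the document counts match, delete the daily indices. With your current layout of 5 primary shards per daily index, a full month collapses from around 150 primary shards into 1.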

In my opinion, using burstable instance types like t2/t3 to run Elasticsearch nodes holding data is generally not a great idea. Indexing, querying and GC can be CPU intensive, which means these instances can get throttled at the worst possible times. Using m4/m5 instance types is generally much better.

Thanks for the reply, Christian.

We've actually decided it's not worth keeping our Elastic Stack running at all, so I apologise for the wasted time. I'll still mark your post as the solution, since it was quite helpful and informative.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.