Errors with ElasticSearch Setup


#1

This is my first time working with Elasticsearch, and we are seeing a multitude of errors; I am not sure where to start in pinpointing the problem. We have a Linux server running six Elasticsearch containers (via Docker) along with Kibana, Logstash, and Curator. None of these services are functioning; Kibana, for instance, keeps displaying "Elasticsearch plugin is red".

I have attached logs for each of the services:

  • ElasticSearch container (Host, Main):
[2018-09-26T15:53:35,467][WARN ][o.e.g.DanglingIndicesState] [elasticsearch] [[logstash-printerlogs-2016.03.26/NhTPw446TgOrtQwccd4tKQ]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
[2018-09-26T15:53:35,467][WARN ][o.e.g.DanglingIndicesState] [elasticsearch] [[logstash-printerlogs-2017.06.01/Kfi1l-hWR0mh5ZWHJyHIfg]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
[2018-09-26T15:53:36,218][WARN ][o.e.c.a.s.ShardStateAction] [elasticsearch] [logstash-mitelcall-2018.09.26][3] received shard failed for shard id [[logstash-mitelcall-2018.09.26][3]], allocation id [cNnVZpWoQ_GfTmPWQvYuyA], primary term [2], message [mark copy as stale]
[2018-09-26T15:53:37,730][DEBUG][o.e.a.a.i.m.p.TransportPutMappingAction] [elasticsearch] failed to put mappings on indices [[[winlogbeat-2018.09.18/zD0J53zoQx2hdtbFymLWGg]]], type [doc]
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (put-mapping) within 30s
	at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.lambda$null$0(ClusterService.java:255) ~[elasticsearch-5.5.0.jar:5.5.0]
	at java.util.ArrayList.forEach(ArrayList.java:1249) ~[?:1.8.0_131]
	at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.lambda$onTimeout$1(ClusterService.java:254) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.5.0.jar:5.5.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
[2018-09-26T15:53:38,482][WARN ][o.e.c.a.s.ShardStateAction] [elasticsearch] [logstash-mitelcall-2018.09.26][4] received shard failed for shard id [[logstash-mitelcall-2018.09.26][4]], allocation id [rmrZ38sVRu68T0etBwFbfQ], primary term [2], message [mark copy as stale]
[2018-09-26T15:53:38,663][WARN ][o.e.c.a.s.ShardStateAction] [elasticsearch] [logstash-nagios][4] received shard failed for shard id [[logstash-nagios][4]], allocation id [rQQjk0yYQuSz1N4reGF2DA], primary term [0], message [master {elasticsearch}{CgpEifTvQzCQWNYJtLhwqw}{e9aaCpnbRlSerje_tO-ocg}{172.18.0.10}{172.18.0.10:9300}{ml.enabled=true} has not removed previously failed shard. resending shard failure]
  • ElasticSearch Container (One of the nodes):
[2018-09-26T16:05:29,671][WARN ][o.e.g.DanglingIndicesState] [elasticsearch1] [[logstash-test123-2016.05.25/TdsA38fHSr2jrFL9u_goSA]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
[2018-09-26T16:05:29,671][WARN ][o.e.g.DanglingIndicesState] [elasticsearch1] [[logstash-test123-2016.09.14/dxmirl33SiCUHsdIEqlhGQ]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
[2018-09-26T16:05:29,671][WARN ][o.e.g.DanglingIndicesState] [elasticsearch1] [[logstash-test123456-2017.07.23/AREbqw3WSsKEM-kASSWDlg]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
[2018-09-26T16:05:29,671][WARN ][o.e.g.DanglingIndicesState] [elasticsearch1] [[logstash-test1234-2017.07.31/Onj_5dnyRBKE0m3x89Oj-g]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
[2018-09-26T16:05:29,671][WARN ][o.e.g.DanglingIndicesState] [elasticsearch1] [[logstash-test123-2016.01.30/ZZhz7aJzSlyKIOt5AWAwmg]] can not be imported as a dangling index, as index with same name already exists in cluster metadata

Any help and pointers on where to go to solve this problem would be appreciated!
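One way to start narrowing down logs like these is to tally entries by level and logger, so the dominant complaint stands out. The following is only a sketch: it assumes the `[timestamp][LEVEL][logger] [node] message` layout seen in the excerpts above, the function name `tally_log_lines` is hypothetical, and the sample lines are copied from the first log in this post.

```python
import re
from collections import Counter

# Matches lines shaped like: [timestamp][LEVEL][logger] [node] message
# Stack-trace lines do not start with "[" and are skipped.
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>\w+)\s*\]\[(?P<logger>[^\]]+)\]"
    r"\s*\[(?P<node>[^\]]+)\]\s*(?P<message>.*)$"
)

# Sample lines copied from the log excerpt above.
SAMPLE_LOG = """\
[2018-09-26T15:53:35,467][WARN ][o.e.g.DanglingIndicesState] [elasticsearch] [[logstash-printerlogs-2016.03.26/NhTPw446TgOrtQwccd4tKQ]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
[2018-09-26T15:53:36,218][WARN ][o.e.c.a.s.ShardStateAction] [elasticsearch] [logstash-mitelcall-2018.09.26][3] received shard failed for shard id [[logstash-mitelcall-2018.09.26][3]], allocation id [cNnVZpWoQ_GfTmPWQvYuyA], primary term [2], message [mark copy as stale]
[2018-09-26T15:53:37,730][DEBUG][o.e.a.a.i.m.p.TransportPutMappingAction] [elasticsearch] failed to put mappings on indices [[[winlogbeat-2018.09.18/zD0J53zoQx2hdtbFymLWGg]]], type [doc]
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (put-mapping) within 30s
"""


def tally_log_lines(text):
    """Count log entries per (level, logger); ignore lines that do not match."""
    counts = Counter()
    for line in text.splitlines():
        match = LOG_LINE.match(line)
        if match:
            counts[(match.group("level"), match.group("logger"))] += 1
    return counts


counts = tally_log_lines(SAMPLE_LOG)
for (level, logger), n in counts.most_common():
    print(f"{n:5d}  {level:<5}  {logger}")
```

Run against a full log file rather than this short sample, the same tally would show whether shard failures, dangling indices, or cluster-event timeouts account for most of the noise.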


#2

More error logs (unable to submit in original post due to character size limit):

  • LogStash Container
16:08:11.786 [[main]>worker8] WARN  logstash.outputs.elasticsearch - Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://elasticsearch:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://elasticsearch:9200/, :error_message=>"Elasticsearch Unreachable: [http://elasticsearch:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
16:08:11.786 [[main]>worker8] ERROR logstash.outputs.elasticsearch - Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://elasticsearch:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}
16:08:13.598 [Ruby-0-Thread-18: /usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.3.1-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:224] WARN  logstash.outputs.elasticsearch - Restored connection to ES instance {:url=>#<URI::HTTP:0x217248a7 URL:http://elasticsearch:9200/>}
16:08:16.504 [Ruby-0-Thread-13: /usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.3.1-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:224] INFO  logstash.outputs.elasticsearch - Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://elasticsearch:9200/, :path=>"/"}
16:08:16.508 [Ruby-0-Thread-13: /usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.3.1-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:224] WARN  logstash.outputs.elasticsearch - Restored connection to ES instance {:url=>#<URI::HTTP:0x31672d0a URL:http://elasticsearch:9200/>}
  • Kibana Container
{"type":"log","@timestamp":"2018-09-26T15:32:34Z","tags":["warning","elasticsearch","admin"],"pid":1,"message":"Unable to revive connection: http://elasticsearch:9200/"}
{"type":"log","@timestamp":"2018-09-26T15:32:34Z","tags":["warning","elasticsearch","admin"],"pid":1,"message":"No living connections"}
{"type":"log","@timestamp":"2018-09-26T15:32:37Z","tags":["warning","elasticsearch","admin"],"pid":1,"message":"Unable to revive connection: http://elasticsearch:9200/"}
{"type":"log","@timestamp":"2018-09-26T15:32:37Z","tags":["warning","elasticsearch","admin"],"pid":1,"message":"No living connections"}
{"type":"log","@timestamp":"2018-09-26T15:32:39Z","tags":["warning","elasticsearch","admin"],"pid":1,"message":"Unable to revive connection: http://elasticsearch:9200/"}
{"type":"log","@timestamp":"2018-09-26T15:32:39Z","tags":["warning","elasticsearch","admin"],"pid":1,"message":"No living connections"}
{"type":"log","@timestamp":"2018-09-26T15:32:42Z","tags":["warning","elasticsearch","admin"],"pid":1,"message":"Unable to revive connection: http://elasticsearch:9200/"}
{"type":"log","@timestamp":"2018-09-26T15:32:42Z","tags":["warning","elasticsearch","admin"],"pid":1,"message":"No living connections"}
  • Curator Container
2018-09-26 15:31:10,738 INFO      Preparing Action ID: 1, "delete_indices"
Traceback (most recent call last):
  File "/usr/bin/curator", line 9, in <module>
    load_entry_point('elasticsearch-curator==5.2.0', 'console_scripts', 'curator')()
  File "/usr/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/curator/cli.py", line 211, in cli
    run(config, action_file, dry_run)
  File "/usr/lib/python2.7/site-packages/curator/cli.py", line 158, in run
    client = get_client(**client_args)
  File "/usr/lib/python2.7/site-packages/curator/utils.py", line 800, in get_client
    'Error: {0}'.format(e)
elasticsearch.exceptions.ElasticsearchException: Unable to create client connection to Elasticsearch.  Error: ConnectionError(<urllib3.connection.HTTPConnection object at 0x7f1ce160fa90>: Failed to establish a new connection: [Errno 111] Connection refused) caused by: NewConnectionError(<urllib3.connection.HTTPConnection object at 0x7f1ce160fa90>: Failed to establish a new connection: [Errno 111] Connection refused)
2018-09-26 16:10:44,308 INFO      Preparing Action ID: 1, "delete_indices"
2018-09-26 16:10:49,041 INFO      Trying Action ID: 1, "delete_indices": Delete indices older than 30 days (based on index name), for logstash-eddata

#3

Just wanted to provide an update: after running curl -XGET 'http://elasticsearch:9200/_cluster/health?pretty', I get the following output:

{
  "cluster_name" : "docker-cluster",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 7,
  "number_of_data_nodes" : 4,
  "active_primary_shards" : 11258,
  "active_shards" : 11258,
  "relocating_shards" : 0,
  "initializing_shards" : 16,
  "unassigned_shards" : 18227,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 11658,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 6410121,
  "active_shards_percent_as_number" : 38.161418257008236
}

I assume the very high number of unassigned shards, along with the pending tasks, may be the cause of all of these problems?
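To put those numbers in perspective, the health response can be reduced to a few ratios. This is a minimal sketch using only fields from the output above; the helper name `summarize_health` is hypothetical, and the total is taken as active plus initializing plus unassigned shards, which is consistent with the reported `active_shards_percent_as_number`.

```python
import json

# Fields copied from the _cluster/health response quoted above.
HEALTH_JSON = """
{
  "status": "red",
  "number_of_data_nodes": 4,
  "active_shards": 11258,
  "initializing_shards": 16,
  "unassigned_shards": 18227,
  "number_of_pending_tasks": 11658,
  "task_max_waiting_in_queue_millis": 6410121
}
"""


def summarize_health(health):
    """Derive a few headline ratios from a _cluster/health response dict."""
    total_shards = (
        health["active_shards"]
        + health["initializing_shards"]
        + health["unassigned_shards"]
    )
    return {
        "total_shards": total_shards,
        "shards_per_data_node": total_shards / health["number_of_data_nodes"],
        "active_percent": 100.0 * health["active_shards"] / total_shards,
        "oldest_task_wait_minutes": health["task_max_waiting_in_queue_millis"] / 60000.0,
    }


summary = summarize_health(json.loads(HEALTH_JSON))
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With these figures the cluster carries 29,501 shards, roughly 7,375 per data node, and the oldest pending task has been queued for about 107 minutes.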


(Christian Dahlqvist) #4

That is far too many shards for a cluster that size. Please read this blog post on shards and sharding practices for some practical guidelines.
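As a rough illustration of "far too many": a commonly cited rule of thumb in Elastic's sharding guidance is to stay below about 20 shards per GB of JVM heap on each node. The sketch below applies that rule to the shard counts from the health output earlier in the thread. The heap size is an assumption, since the thread never states it, and `max_recommended_shards` is a hypothetical helper, not an Elasticsearch API.

```python
def max_recommended_shards(heap_gb_per_node, data_nodes, shards_per_gb_heap=20):
    """Rule-of-thumb shard budget for a cluster (guideline, not a hard limit)."""
    return heap_gb_per_node * data_nodes * shards_per_gb_heap


# Shard counts from the cluster health output: active + initializing + unassigned.
TOTAL_SHARDS = 11258 + 16 + 18227
DATA_NODES = 4

# ASSUMPTION: 8 GB of heap per data node; the thread does not give the real value.
ASSUMED_HEAP_GB = 8

budget = max_recommended_shards(ASSUMED_HEAP_GB, DATA_NODES)
over_budget_factor = TOTAL_SHARDS / budget

print(f"shard budget: {budget}")
print(f"actual shards: {TOTAL_SHARDS}")
print(f"over budget by a factor of {over_budget_factor:.1f}")
```

Under that assumed heap size the budget is 640 shards, so the cluster holds roughly 46 times the suggested maximum. A different heap size changes the factor, but not the overall conclusion.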


#5

Thank you for the response! I will make sure to go through that guide, as I am taking over the Elasticsearch side of things and this is my first time working with it. Would setting up several new nodes (we are running this all through Docker) likely solve the issues?

Another question is whether running Elasticsearch in a Docker environment should remain our practice, given the high number of shards we may have. Docker may not be the issue here, of course, but I wasn't sure whether it is common practice, as I couldn't find much documentation on it.


(Christian Dahlqvist) #6

Docker is not the issue here. That is far too many shards for an Elasticsearch cluster of that size, no matter what it is running on.


#7

Hello, quick question. It is obvious that we need to add more nodes to our cluster due to the number of shards; I was thinking of adding perhaps 3-4 more nodes. However, since this is all in a Docker environment, would simply editing our docker-compose file to add the new nodes, configured to point at the existing nodes, get the job done? Or would each node need additional configuration?


(Christian Dahlqvist) #8

I think you need to rethink how you approach sharding and reduce the shard count dramatically. Adding a few nodes may not make a massive difference.
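To see why the sharding scheme matters more than node count, it helps to estimate how many shards a time-based index pattern produces. The sketch below is illustrative only: the number of index series and the retention period are hypothetical, and the 5 primaries with 1 replica are the Elasticsearch 5.x defaults, which the thread does not confirm are in use here.

```python
def shard_count(index_series, periods_retained, primaries, replicas):
    """Total shards = series x retained indices per series x primaries x copies."""
    return index_series * periods_retained * primaries * (1 + replicas)


# ASSUMPTIONS (hypothetical, for illustration only):
INDEX_SERIES = 10      # e.g. one series per log source
RETENTION_YEARS = 2

# Daily indices with the 5.x defaults of 5 primaries and 1 replica.
daily = shard_count(INDEX_SERIES, RETENTION_YEARS * 365, primaries=5, replicas=1)

# Monthly indices with a single primary and 1 replica.
monthly = shard_count(INDEX_SERIES, RETENTION_YEARS * 12, primaries=1, replicas=1)

print(f"daily indices, 5 primaries:  {daily} shards")
print(f"monthly indices, 1 primary:  {monthly} shards")
```

Under these assumptions the daily scheme yields 73,000 shards and the monthly scheme 480, which is why adding 3-4 nodes changes the picture far less than consolidating indices does.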


(system) #9

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.