ELASTICSEARCH_17 - Could not index '862' records: java.io.IOException: listener timeout after waiting for [30000] ms

Hi,

We are having multiple issues with our Elasticsearch cluster.

  1. ELASTICSEARCH_17 - Could not index '862' records: java.io.IOException: listener timeout after waiting for [30000] ms
    We are continuously getting a lot of the above errors (see the check sketched just after this list).
  2. Sometimes the cluster gets stuck allocating primary shards and never turns green.
  3. Shards are not getting balanced evenly across the nodes.
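
From what I can tell, the "listener timeout after waiting for [30000] ms" message comes from the Java low-level REST client (30 seconds is its default maxRetryTimeout), which would mean the bulk requests are taking too long on the cluster side. One check I can run, sketched here assuming the default HTTP port 9200 on the client node, is to look for bulk thread pool queueing and rejections:

  # hypothetical host/port; list bulk thread pool activity and rejections per node
  curl -s 'http://localhost:9200/_cat/thread_pool/bulk?v&h=node_name,active,queue,rejected'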

Here are my cluster settings:

4 data nodes and 1 client node.
discovery.zen.minimum_master_nodes: 3
thread_pool.bulk.queue_size: 500
http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-headers: Authorization
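
For what it's worth, with 4 master-eligible nodes the quorum formula (n / 2) + 1 gives 3, so the minimum_master_nodes value above should be correct. For the stuck primary shard allocation and the uneven shard distribution, I can also collect more detail; a minimal sketch, again assuming the default port on the client node:

  # ask the cluster why a shard remains unassigned (with no body, it picks the first unassigned shard)
  curl -s 'http://localhost:9200/_cluster/allocation/explain?pretty'

  # list every shard and the node it is on, to see the distribution
  curl -s 'http://localhost:9200/_cat/shards?v'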

Thank you.

Regards,
Rakesh

What type of storage are you using in your cluster? What version are you on?

Version: 5.0.2
Type of storage: AWS EC2 file system (disk-based).

Are you using EBS?

Yes, that's correct. EBS has been attached as an additional volume. Should we use the root volume only?

Also, to add: the primary shards were placed on nodes 1 and 3 of the 4 data nodes, so the shards are not evenly distributed. I am not sure whether this is a problem or not. I really need some help here.

Exactly what type of EBS volume are you using? What does iostat look like?
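
For iostat, something like the following on each data node should be enough to get a representative sample (assuming the sysstat package is installed):

  # extended per-device statistics, sampled every 5 seconds, 3 reports
  iostat -x 5 3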

Volume Type: gp2
IOPS: 3000

That sounds good. Not sure what is going on. Can you share the exact log messages and what is around them? Can you also share the full output from the cluster stats API?
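
If it helps, the full stats can be pulled with something like this, assuming you run it against the client node on the default port:

  # dump cluster-wide stats, pretty-printed
  curl -s 'http://localhost:9200/_cluster/stats?pretty'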

Hi, the IPs have been masked. Here is the output of the cluster stats API and the error logs.

Cluster Stats API

{"_nodes":{"total":5,"successful":5,"failed":0},"cluster_name":"esawstest","timestamp":1508646359193,"status":"yellow","indices":{"count":2128,"shards":{"total":21259,"primaries":10629,"replication":1.0000940822278672,"index":{"shards":{"min":2,"max":10,"avg":9.990131578947368},"primaries":{"min":1,"max":5,"avg":4.994830827067669},"replication":{"min":0.4,"max":2.0,"avg":1.0003054511278195}}},"docs":{"count":819350982,"deleted":8661},"store":{"size_in_bytes":443402059078,"throttle_time_in_millis":0},"fielddata":{"memory_size_in_bytes":64544,"evictions":0},"query_cache":{"memory_size_in_bytes":621496,"total_count":11102357,"hit_count":11094177,"miss_count":8180,"cache_size":2507,"cache_count":2507,"evictions":0},"completion":{"size_in_bytes":0},"segments":{"count":108644,"memory_in_bytes":2904752619,"terms_memory_in_bytes":2323258512,"stored_fields_memory_in_bytes":165932392,"term_vectors_memory_in_bytes":0,"norms_memory_in_bytes":120012224,"points_memory_in_bytes":129830619,"doc_values_memory_in_bytes":165718872,"index_writer_memory_in_bytes":51113132,"version_map_memory_in_bytes":239984,"fixed_bit_set_memory_in_bytes":0,"max_unsafe_auto_id_timestamp":9223372036854775807,"file_sizes":{}}},"nodes":{"count":{"total":5,"data":4,"coordinating_only":0,"master":4,"ingest":5},"versions":["5.0.2"],"os":{"available_processors":40,"allocated_processors":40,"names":[{"name":"Linux","count":5}],"mem":{"total_in_bytes":167569305600,"free_in_bytes":13970579456,"used_in_bytes":153598726144,"free_percent":8,"used_percent":92}},"process":{"cpu":{"percent":34},"open_file_descriptors":{"min":381,"max":20990,"avg":10838}},"jvm":{"max_uptime_in_millis":517091067,"versions":[{"version":"1.8.0_144","vm_name":"OpenJDK 64-Bit Server VM","vm_version":"25.144-b01","vm_vendor":"Oracle Corporation","count":1},{"version":"1.8.0_60","vm_name":"Java HotSpot(TM) 64-Bit Server VM","vm_version":"25.60-b23","vm_vendor":"Oracle Corporation","count":1},{"version":"1.8.0_131","vm_name":"Java HotSpot(TM) 64-Bit Server VM","vm_version":"25.131-b11","vm_vendor":"Oracle Corporation","count":3}],"mem":{"heap_used_in_bytes":37643995048,"heap_max_in_bytes":80181985280},"threads":545},"fs":{"total_in_bytes":4789270749184,"free_in_bytes":4268123291648,"available_in_bytes":4133855657984},"plugins":[],"network_types":{"transport_types":{"netty4":5},"http_types":{"netty4":5}}}}

Error Logs:

[2017-10-22T00:44:24,578][DEBUG][o.e.a.b.TransportShardBulkAction] [es-dev-data-02] [health-logs-2017-10-22][2] failed to execute bulk item (index) index {[health-logs-2017-10-22][logs][AV9CZWzzgaAYExRHkFSf], source[{"@timestamp":"2017-10-13T18:12:23.750Z","@version":1,"logger_name":"com.tbs.healthLogger","thread_name":"http-nio-8080-exec-3","level":"INFO","level_value":20000,"HOSTNAME":"ivr-rest","LOG_LEVEL_PATTERN":"%5p","application_id":"devivr3-aws-ivr-rest","host":".*.***.***","port":31523,"status":"UP","metrics":{"status":"UP","mem":344859,"mem_free":119655,"processors":4,"instance_uptime":32477781,"uptime":32726069,"systemload_average":0.07,"heap_committed":250368,"heap_init":65536,"heap_used":130712,"heap":466432,"nonheap_committed":96512,"nonheap_init":2496,"nonheap_used":94491,"nonheap":0,"threads_peak":27,"threads_daemon":22,"threads_totalStarted":853,"threads":25,"classes":10281,"classes_loaded":10364,"classes_unloaded":83,"gc_ps_scavenge_count":49,"gc_ps_scavenge_time":712,"gc_ps_marksweep_count":3,"gc_ps_marksweep_time":602},"diskSpace":{"status":"UP","total":10434699264,"free":10061524992,"threshold":10485760},"db":{"status":"UP","database":"Oracle","hello":"Hello"},"refreshScope":{"status":"UP"}}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [db.hello]
at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:297) ~[elasticsearch-5.0.2.jar:5.0.2]
at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:438) ~

[2017-10-22T00:30:57,033][WARN ][o.e.a.a.c.n.i.TransportNodesInfoAction] [es-dev-client-01] not accumulating exceptions, excluding exception from response

org.elasticsearch.action.FailedNodeException: Failed node [U_kWZ-DhQdOWeFCKnbjzTQ]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onFailure(TransportNodesAction.java:247) ~[elasticsearch-5.0.2.jar:5.0.2]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$300(TransportNodesAction.java:160) ~[elasticsearch-5.0.2.jar:5.0.2]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleException(TransportNodesAction.java:219) ~[elasticsearch-5.0.2.jar:5.0.2]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:957) ~[elasticsearch-5.0.2.jar:5.0.2]
at org.elasticsearch.transport.TransportService$5.doRun(TransportService.java:525) ~[elasticsearch-5.0.2.jar:5.0.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:527) ~[elasticsearch-5.0.2.jar:5.0.2]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-5.0.2.jar:5.0.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]

[2017-10-21T02:02:38,912][INFO ][o.e.c.s.ClusterService ] [es-dev-data-01] added {{es-dev-data-02}{U_kWZ-DhQdOWeFCKnbjzTQ}{293Bk1aBSxqbBQiHP640cw}{.*.***.*}{.***.***.*:9300},}, reason: zen-disco-receive(from master [master {es-dev-data-04}{I3b0900GRm-K_Ix7r9ge8g}{o6sm18Y7QFCv-7ZcnioDVw}{.***.***.*}{.***.***.*:9300} committed version [3052091]])
[2017-10-21T04:11:45,449][INFO ][o.e.c.s.ClusterService ] [es-dev-data-01] removed {{es-dev-data-02}{U_kWZ-DhQdOWeFCKnbjzTQ}{293Bk1aBSxqbBQiHP640cw}{.***.***.*}{.***.***.*:9300},}, reason: zen-disco-receive(from master [master {es-dev-data-04}{I3b0900GRm-K_Ix7r9ge8g}{o6sm18Y7QFCv-7ZcnioDVw}{.***.***.*}{.***.***.*:9300} committed version [3052114]])
[2017-10-21T04:13:32,860][INFO ][o.e.c.s.ClusterService ] [es-dev-data-01] added {{es-dev-data-02}{U_kWZ-DhQdOWeFCKnbjzTQ}{293Bk1aBSxqbBQiHP640cw}{.***.***.*}{.***.***.*:9300},}, reason: zen-disco-receive(from master [master {es-dev-data-04}{I3b0900GRm-K_Ix7r9ge8g}{o6sm18Y7QFCv-7ZcnioDVw}{.***.***.*}{.***.***.***:9300} committed version [3052138]])

Although it may not be directly related to the error you are seeing, you seem to have an excessively large number of small indices and shards in your cluster. Please read this blog post on shards and sharding for further information.
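
To see how the shards and disk usage are currently spread across your data nodes, which also speaks to your balancing concern, you can run something like this against any node (assuming the default port):

  # shard count, disk usage and free space per node
  curl -s 'http://localhost:9200/_cat/allocation?v'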

Can you provide a larger block of logs containing the error you are seeing, e.g. through a gist?

I would also recommend upgrading to the latest version, as 5.0.2 is getting a bit old now and a lot of issues have been fixed since then.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.