ELASTICSEARCH_17 - Could not index '862' records: java.io.IOException: listener timeout after waiting for [30000] ms

Hi,

We are having multiple issues with our Elasticsearch cluster.

  1. ELASTICSEARCH_17 - Could not index '862' records: java.io.IOException: listener timeout after waiting for [30000] ms
    We are continuously getting a lot of the above errors (see the check sketched just after this list).
  2. Sometimes the cluster gets stuck allocating primary shards and never turns green.
  3. Shards are not getting balanced evenly across the nodes.
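
From what I can tell, the "listener timeout after waiting for [30000] ms" message comes from the Java low-level REST client (30 seconds is its default maxRetryTimeout), which would mean the bulk requests are taking too long on the cluster side. One check I can run, sketched here assuming the default HTTP port 9200 on the client node, is to look for bulk thread pool queueing and rejections:

  # hypothetical host/port; list bulk thread pool activity and rejections per node
  curl -s 'http://localhost:9200/_cat/thread_pool/bulk?v&h=node_name,active,queue,rejected'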

Here are my cluster settings:

4 data nodes and 1 client node.
discovery.zen.minimum_master_nodes: 3
thread_pool.bulk.queue_size: 500
http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-headers: Authorization
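
For what it's worth, with 4 master-eligible nodes the quorum formula (n / 2) + 1 gives 3, so the minimum_master_nodes value above should be correct. For the stuck primary shard allocation and the uneven shard distribution, I can also collect more detail; a minimal sketch, again assuming the default port on the client node:

  # ask the cluster why a shard remains unassigned (with no body, it picks the first unassigned shard)
  curl -s 'http://localhost:9200/_cluster/allocation/explain?pretty'

  # list every shard and the node it is on, to see the distribution
  curl -s 'http://localhost:9200/_cat/shards?v'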

Thank you.

Regards,
Rakesh

What type of storage are you using in your cluster? What version are you on?

Version: 5.0.2
Type of storage: AWS EC2 file system (disk-based).

Are you using EBS?

Yes, that's correct. EBS has been attached as an additional volume. Should we use the root volume only?

Also, to add: the primary shards were placed on nodes 1 and 3 of the 4 data nodes, so the shards are not evenly distributed. I am not sure whether this is a problem or not. I really need some help here.

Exactly what type of EBS volume are you using? What does iostat look like?
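
For iostat, something like the following on each data node should be enough to get a representative sample (assuming the sysstat package is installed):

  # extended per-device statistics, sampled every 5 seconds, 3 reports
  iostat -x 5 3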

Volume Type: gp2
IOPS: 3000

That sounds good. Not sure what is going on. Can you share the exact log messages and what is around them? Can you also share the full output from the cluster stats API?
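
If it helps, the full stats can be pulled with something like this, assuming you run it against the client node on the default port:

  # dump cluster-wide stats, pretty-printed
  curl -s 'http://localhost:9200/_cluster/stats?pretty'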

Hi, the IPs have been masked. Here is the output of the cluster stats API and the error logs.

Cluster Stats API

{"_nodes":{"total":5,"successful":5,"failed":0},"cluster_name":"esawstest","timestamp":1508646359193,"status":"yellow","indices":{"count":2128,"shards":{"total":21259,"primaries":10629,"replication":1.0000940822278672,"index":{"shards":{"min":2,"max":10,"avg":9.990131578947368},"primaries":{"min":1,"max":5,"avg":4.994830827067669},"replication":{"min":0.4,"max":2.0,"avg":1.0003054511278195}}},"docs":{"count":819350982,"deleted":8661},"store":{"size_in_bytes":443402059078,"throttle_time_in_millis":0},"fielddata":{"memory_size_in_bytes":64544,"evictions":0},"query_cache":{"memory_size_in_bytes":621496,"total_count":11102357,"hit_count":11094177,"miss_count":8180,"cache_size":2507,"cache_count":2507,"evictions":0},"completion":{"size_in_bytes":0},"segments":{"count":108644,"memory_in_bytes":2904752619,"terms_memory_in_bytes":2323258512,"stored_fields_memory_in_bytes":165932392,"term_vectors_memory_in_bytes":0,"norms_memory_in_bytes":120012224,"points_memory_in_bytes":129830619,"doc_values_memory_in_bytes":165718872,"index_writer_memory_in_bytes":51113132,"version_map_memory_in_bytes":239984,"fixed_bit_set_memory_in_bytes":0,"max_unsafe_auto_id_timestamp":9223372036854775807,"file_sizes":{}}},"nodes":{"count":{"total":5,"data":4,"coordinating_only":0,"master":4,"ingest":5},"versions":["5.0.2"],"os":{"available_processors":40,"allocated_processors":40,"names":[{"name":"Linux","count":5}],"mem":{"total_in_bytes":167569305600,"free_in_bytes":13970579456,"used_in_bytes":153598726144,"free_percent":8,"used_percent":92}},"process":{"cpu":{"percent":34},"open_file_descriptors":{"min":381,"max":20990,"avg":10838}},"jvm":{"max_uptime_in_millis":517091067,"versions":[{"version":"1.8.0_144","vm_name":"OpenJDK 64-Bit Server VM","vm_version":"25.144-b01","vm_vendor":"Oracle Corporation","count":1},{"version":"1.8.0_60","vm_name":"Java HotSpot(TM) 64-Bit Server VM","vm_version":"25.60-b23","vm_vendor":"Oracle Corporation","count":1},{"version":"1.8.0_131","vm_name":"Java HotSpot(TM) 64-Bit Server VM","vm_version":"25.131-b11","vm_vendor":"Oracle Corporation","count":3}],"mem":{"heap_used_in_bytes":37643995048,"heap_max_in_bytes":80181985280},"threads":545},"fs":{"total_in_bytes":4789270749184,"free_in_bytes":4268123291648,"available_in_bytes":4133855657984},"plugins":[],"network_types":{"transport_types":{"netty4":5},"http_types":{"netty4":5}}}}

Error Logs:

[2017-10-22T00:44:24,578][DEBUG][o.e.a.b.TransportShardBulkAction] [es-dev-data-02] [health-logs-2017-10-22][2] failed to execute bulk item (index) index {[health-logs-2017-10-22][logs][AV9CZWzzgaAYExRHkFSf], source[{"@timestamp":"2017-10-13T18:12:23.750Z","@version":1,"logger_name":"com.tbs.healthLogger","thread_name":"http-nio-8080-exec-3","level":"INFO","level_value":20000,"HOSTNAME":"ivr-rest","LOG_LEVEL_PATTERN":"%5p","application_id":"devivr3-aws-ivr-rest","host":".*.***.***","port":31523,"status":"UP","metrics":{"status":"UP","mem":344859,"mem_free":119655,"processors":4,"instance_uptime":32477781,"uptime":32726069,"systemload_average":0.07,"heap_committed":250368,"heap_init":65536,"heap_used":130712,"heap":466432,"nonheap_committed":96512,"nonheap_init":2496,"nonheap_used":94491,"nonheap":0,"threads_peak":27,"threads_daemon":22,"threads_totalStarted":853,"threads":25,"classes":10281,"classes_loaded":10364,"classes_unloaded":83,"gc_ps_scavenge_count":49,"gc_ps_scavenge_time":712,"gc_ps_marksweep_count":3,"gc_ps_marksweep_time":602},"diskSpace":{"status":"UP","total":10434699264,"free":10061524992,"threshold":10485760},"db":{"status":"UP","database":"Oracle","hello":"Hello"},"refreshScope":{"status":"UP"}}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [db.hello]
at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:297) ~[elasticsearch-5.0.2.jar:5.0.2]
at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:438) ~

[2017-10-22T00:30:57,033][WARN ][o.e.a.a.c.n.i.TransportNodesInfoAction] [es-dev-client-01] not accumulating exceptions, excluding exception from response

org.elasticsearch.action.FailedNodeException: Failed node [U_kWZ-DhQdOWeFCKnbjzTQ]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onFailure(TransportNodesAction.java:247) ~[elasticsearch-5.0.2.jar:5.0.2]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$300(TransportNodesAction.java:160) ~[elasticsearch-5.0.2.jar:5.0.2]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleException(TransportNodesAction.java:219) ~[elasticsearch-5.0.2.jar:5.0.2]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:957) ~[elasticsearch-5.0.2.jar:5.0.2]
at org.elasticsearch.transport.TransportService$5.doRun(TransportService.java:525) ~[elasticsearch-5.0.2.jar:5.0.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:527) ~[elasticsearch-5.0.2.jar:5.0.2]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-5.0.2.jar:5.0.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]

[2017-10-21T02:02:38,912][INFO ][o.e.c.s.ClusterService ] [es-dev-data-01] added {{es-dev-data-02}{U_kWZ-DhQdOWeFCKnbjzTQ}{293Bk1aBSxqbBQiHP640cw}{.*.***.*}{.***.***.*:9300},}, reason: zen-disco-receive(from master [master {es-dev-data-04}{I3b0900GRm-K_Ix7r9ge8g}{o6sm18Y7QFCv-7ZcnioDVw}{.***.***.*}{.***.***.*:9300} committed version [3052091]])
[2017-10-21T04:11:45,449][INFO ][o.e.c.s.ClusterService ] [es-dev-data-01] removed {{es-dev-data-02}{U_kWZ-DhQdOWeFCKnbjzTQ}{293Bk1aBSxqbBQiHP640cw}{.***.***.*}{.***.***.*:9300},}, reason: zen-disco-receive(from master [master {es-dev-data-04}{I3b0900GRm-K_Ix7r9ge8g}{o6sm18Y7QFCv-7ZcnioDVw}{.***.***.*}{.***.***.*:9300} committed version [3052114]])
[2017-10-21T04:13:32,860][INFO ][o.e.c.s.ClusterService ] [es-dev-data-01] added {{es-dev-data-02}{U_kWZ-DhQdOWeFCKnbjzTQ}{293Bk1aBSxqbBQiHP640cw}{.***.***.*}{.***.***.*:9300},}, reason: zen-disco-receive(from master [master {es-dev-data-04}{I3b0900GRm-K_Ix7r9ge8g}{o6sm18Y7QFCv-7ZcnioDVw}{.***.***.*}{.***.***.***:9300} committed version [3052138]])

Although it may not be directly related to the error you are seeing, you seem to have an excessively large number of small indices and shards in your cluster. Please read this blog post on shards and sharding for further information.
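
To see how the shards and disk usage are currently spread across your data nodes, which also speaks to your balancing concern, you can run something like this against any node (assuming the default port):

  # shard count, disk usage and free space per node
  curl -s 'http://localhost:9200/_cat/allocation?v'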

Can you provide a larger block of logs containing the error you are seeing, e.g. through a gist?

I would also recommend upgrading to the latest version, as 5.0.2 is getting a bit old now and a lot of issues have been fixed since then.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.