ELASTICSEARCH_17 - Could not index '862' records: java.io.IOException: listener timeout after waiting for [30000] ms


(Rakesh Dudi) #1

Hi,

We are having multiple issues with our Elasticsearch cluster.

  1. ELASTICSEARCH_17 - Could not index '862' records: java.io.IOException: listener timeout after waiting for [30000] ms
    We are continuously getting a lot of these errors (a client-side sketch follows this list).
  2. Sometimes ES gets stuck allocating primary shards and the cluster never turns green.
  3. Shards are not being balanced across the nodes.
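
For reference, the 30000 ms in the listener timeout matches the default max retry timeout of the Java low-level REST client, so the indexing application can widen that window while the root cause is investigated. A minimal sketch, assuming the indexer uses RestClient (the host name and timeout values are placeholders, not our actual client code):

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

public class ClientTimeoutSketch {
    public static void main(String[] args) throws Exception {
        // Low-level REST client pointed at the client node (placeholder host).
        RestClient client = RestClient.builder(new HttpHost("es-dev-client-01", 9200, "http"))
                // Raise the listener/max-retry timeout from the 30000 ms default.
                .setMaxRetryTimeoutMillis(90_000)
                // Raise the per-request socket timeout so slow bulk requests do not trip it first.
                .setRequestConfigCallback(rc -> rc.setSocketTimeout(90_000))
                .build();
        try {
            // ... send bulk requests as usual ...
        } finally {
            client.close();
        }
    }
}

This only widens the client-side window; it does not fix whatever is making the bulk requests slow in the first place.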

Here are my cluster settings:

4 data nodes and 1 client node.
discovery.zen.minimum_master_nodes: 3
thread_pool.bulk.queue_size: 500
http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-headers: Authorization

Thank you.

Regards,
Rakesh


(Christian Dahlqvist) #2

What type of storage are you using in your cluster? What version are you on?


(Rakesh Dudi) #3

Version: 5.0.2
Type of storage: AWS EC2 file system (disk based).


(Christian Dahlqvist) #4

Are you using EBS?


(Rakesh Dudi) #5

Yes, that's correct. Should we use the root volume only?
The EBS volume has been attached as an additional volume.


(Rakesh Dudi) #6

Also, to add: the primary shards were placed on nodes 1 and 3 out of the 4 nodes, so somehow the shards are not evenly distributed. I am not sure whether this is a problem or not. I really need some help here.
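
A minimal sketch of how the placement can be inspected with the low-level REST client (host name is a placeholder): _cat/shards lists the node each shard is assigned to, and the allocation explain API reports why an unassigned shard cannot be placed.

import java.util.Collections;

import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class ShardAllocationCheck {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("es-dev-client-01", 9200, "http")).build()) {
            // List every shard and the node it currently sits on.
            Response shards = client.performRequest("GET", "/_cat/shards",
                    Collections.singletonMap("v", "true"));
            System.out.println(EntityUtils.toString(shards.getEntity()));

            // Ask the cluster why the first unassigned shard cannot be allocated
            // (this call fails if nothing is unassigned).
            Response explain = client.performRequest("GET", "/_cluster/allocation/explain");
            System.out.println(EntityUtils.toString(explain.getEntity()));
        }
    }
}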


(Rakesh Dudi) #7


(Christian Dahlqvist) #8

Exactly what type of EBS volume are you using? What does iostat look like?


(Rakesh Dudi) #9

Volume Type: gp2
IOPS: 3000


(Christian Dahlqvist) #10

That sounds good. Not sure what is going on. Can you share the exact log messages and what is around them? Can you also share the full output from the cluster stats API?


(Rakesh Dudi) #11

Hi, the IPs have been masked. Here is the output of the cluster stats API and the error logs.

Cluster Stats API

{"_nodes":{"total":5,"successful":5,"failed":0},"cluster_name":"esawstest","timestamp":1508646359193,"status":"yellow","indices":{"count":2128,"shards":{"total":21259,"primaries":10629,"replication":1.0000940822278672,"index":{"shards":{"min":2,"max":10,"avg":9.990131578947368},"primaries":{"min":1,"max":5,"avg":4.994830827067669},"replication":{"min":0.4,"max":2.0,"avg":1.0003054511278195}}},"docs":{"count":819350982,"deleted":8661},"store":{"size_in_bytes":443402059078,"throttle_time_in_millis":0},"fielddata":{"memory_size_in_bytes":64544,"evictions":0},"query_cache":{"memory_size_in_bytes":621496,"total_count":11102357,"hit_count":11094177,"miss_count":8180,"cache_size":2507,"cache_count":2507,"evictions":0},"completion":{"size_in_bytes":0},"segments":{"count":108644,"memory_in_bytes":2904752619,"terms_memory_in_bytes":2323258512,"stored_fields_memory_in_bytes":165932392,"term_vectors_memory_in_bytes":0,"norms_memory_in_bytes":120012224,"points_memory_in_bytes":129830619,"doc_values_memory_in_bytes":165718872,"index_writer_memory_in_bytes":51113132,"version_map_memory_in_bytes":239984,"fixed_bit_set_memory_in_bytes":0,"max_unsafe_auto_id_timestamp":9223372036854775807,"file_sizes":{}}},"nodes":{"count":{"total":5,"data":4,"coordinating_only":0,"master":4,"ingest":5},"versions":["5.0.2"],"os":{"available_processors":40,"allocated_processors":40,"names":[{"name":"Linux","count":5}],"mem":{"total_in_bytes":167569305600,"free_in_bytes":13970579456,"used_in_bytes":153598726144,"free_percent":8,"used_percent":92}},"process":{"cpu":{"percent":34},"open_file_descriptors":{"min":381,"max":20990,"avg":10838}},"jvm":{"max_uptime_in_millis":517091067,"versions":[{"version":"1.8.0_144","vm_name":"OpenJDK 64-Bit Server VM","vm_version":"25.144-b01","vm_vendor":"Oracle Corporation","count":1},{"version":"1.8.0_60","vm_name":"Java HotSpot(TM) 64-Bit Server VM","vm_version":"25.60-b23","vm_vendor":"Oracle Corporation","count":1},{"version":"1.8.0_131","vm_name":"Java HotSpot(TM) 64-Bit Server VM","vm_version":"25.131-b11","vm_vendor":"Oracle Corporation","count":3}],"mem":{"heap_used_in_bytes":37643995048,"heap_max_in_bytes":80181985280},"threads":545},"fs":{"total_in_bytes":4789270749184,"free_in_bytes":4268123291648,"available_in_bytes":4133855657984},"plugins":[],"network_types":{"transport_types":{"netty4":5},"http_types":{"netty4":5}}}}

Error Logs:

[2017-10-22T00:44:24,578][DEBUG][o.e.a.b.TransportShardBulkAction] [es-dev-data-02] [health-logs-2017-10-22][2] failed to execute bulk item (index) index {[health-logs-2017-10-22][logs][AV9CZWzzgaAYExRHkFSf], source[{"@timestamp":"2017-10-13T18:12:23.750Z","@version":1,"logger_name":"com.tbs.healthLogger","thread_name":"http-nio-8080-exec-3","level":"INFO","level_value":20000,"HOSTNAME":"ivr-rest","LOG_LEVEL_PATTERN":"%5p","application_id":"devivr3-aws-ivr-rest","host":".*.***.***","port":31523,"status":"UP","metrics":{"status":"UP","mem":344859,"mem_free":119655,"processors":4,"instance_uptime":32477781,"uptime":32726069,"systemload_average":0.07,"heap_committed":250368,"heap_init":65536,"heap_used":130712,"heap":466432,"nonheap_committed":96512,"nonheap_init":2496,"nonheap_used":94491,"nonheap":0,"threads_peak":27,"threads_daemon":22,"threads_totalStarted":853,"threads":25,"classes":10281,"classes_loaded":10364,"classes_unloaded":83,"gc_ps_scavenge_count":49,"gc_ps_scavenge_time":712,"gc_ps_marksweep_count":3,"gc_ps_marksweep_time":602},"diskSpace":{"status":"UP","total":10434699264,"free":10061524992,"threshold":10485760},"db":{"status":"UP","database":"Oracle","hello":"Hello"},"refreshScope":{"status":"UP"}}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [db.hello]
at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:297) ~[elasticsearch-5.0.2.jar:5.0.2]
at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:438) ~

[2017-10-22T00:30:57,033][WARN ][o.e.a.a.c.n.i.TransportNodesInfoAction] [es-dev-client-01] not accumulating exceptions, excluding exception from response

org.elasticsearch.action.FailedNodeException: Failed node [U_kWZ-DhQdOWeFCKnbjzTQ]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onFailure(TransportNodesAction.java:247) ~[elasticsearch-5.0.2.jar:5.0.2]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$300(TransportNodesAction.java:160) ~[elasticsearch-5.0.2.jar:5.0.2]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleException(TransportNodesAction.java:219) ~[elasticsearch-5.0.2.jar:5.0.2]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:957) ~[elasticsearch-5.0.2.jar:5.0.2]
at org.elasticsearch.transport.TransportService$5.doRun(TransportService.java:525) ~[elasticsearch-5.0.2.jar:5.0.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:527) ~[elasticsearch-5.0.2.jar:5.0.2]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-5.0.2.jar:5.0.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]

[2017-10-21T02:02:38,912][INFO ][o.e.c.s.ClusterService ] [es-dev-data-01] added {{es-dev-data-02}{U_kWZ-DhQdOWeFCKnbjzTQ}{293Bk1aBSxqbBQiHP640cw}{.*.***.*}{.***.***.*:9300},}, reason: zen-disco-receive(from master [master {es-dev-data-04}{I3b0900GRm-K_Ix7r9ge8g}{o6sm18Y7QFCv-7ZcnioDVw}{.***.***.*}{.***.***.*:9300} committed version [3052091]])
[2017-10-21T04:11:45,449][INFO ][o.e.c.s.ClusterService ] [es-dev-data-01] removed {{es-dev-data-02}{U_kWZ-DhQdOWeFCKnbjzTQ}{293Bk1aBSxqbBQiHP640cw}{.***.***.*}{.***.***.*:9300},}, reason: zen-disco-receive(from master [master {es-dev-data-04}{I3b0900GRm-K_Ix7r9ge8g}{o6sm18Y7QFCv-7ZcnioDVw}{.***.***.*}{.***.***.*:9300} committed version [3052114]])
[2017-10-21T04:13:32,860][INFO ][o.e.c.s.ClusterService ] [es-dev-data-01] added {{es-dev-data-02}{U_kWZ-DhQdOWeFCKnbjzTQ}{293Bk1aBSxqbBQiHP640cw}{.***.***.*}{.***.***.*:9300},}, reason: zen-disco-receive(from master [master {es-dev-data-04}{I3b0900GRm-K_Ix7r9ge8g}{o6sm18Y7QFCv-7ZcnioDVw}{.***.***.*}{.***.***.***:9300} committed version [3052138]])
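
The failed to parse [db.hello] entries look separate from the listener timeouts; that exception usually means the incoming value conflicts with the type already mapped for the field in that day's index. A minimal sketch for checking how db.hello is currently mapped (index and type names taken from the log line above, host name is a placeholder):

import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class FieldMappingCheck {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("es-dev-client-01", 9200, "http")).build()) {
            // Show the current mapping of db.hello in the index that rejected the document.
            Response resp = client.performRequest(
                    "GET", "/health-logs-2017-10-22/_mapping/logs/field/db.hello");
            System.out.println(EntityUtils.toString(resp.getEntity()));
        }
    }
}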


(Christian Dahlqvist) #12

Although it may not be directly related to the error you are seeing, you seem to have an excessively large number of small indices and shards in your cluster. Please read this blog post on shards and sharding for further information.

Can you provide a larger block of logs containing the error you are seeing, e.g. through a gist?

I would also recommend upgrading to the latest version as 5.0.2 is getting a bit old now and a lot of issues have been fixed since then.
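
For example, new daily indices can be capped with an index template so they stop defaulting to five primaries each. A rough sketch using the low-level REST client (the template name and the health-logs-* pattern are assumptions based on the indices in your logs, and the shard counts are only an illustration):

import java.util.Collections;

import org.apache.http.HttpHost;
import org.apache.http.entity.ContentType;
import org.apache.http.nio.entity.NStringEntity;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class ShardCountTemplate {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("es-dev-client-01", 9200, "http")).build()) {
            // One primary shard and one replica for every new daily health-logs index.
            String template = "{"
                    + "\"template\": \"health-logs-*\","
                    + "\"settings\": {\"number_of_shards\": 1, \"number_of_replicas\": 1}"
                    + "}";
            Response resp = client.performRequest("PUT", "/_template/health-logs-shards",
                    Collections.<String, String>emptyMap(),
                    new NStringEntity(template, ContentType.APPLICATION_JSON));
            System.out.println(resp.getStatusLine());
        }
    }
}

Existing indices keep their current shard counts; a template only affects indices created after it is in place.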


(system) #13

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.