ES 2.0.1 - Huge translog that is not being cleared in one of the five default shards, eventually causing Out Of Memory and failure of all shards

We use the default configuration in the YML file. Of the five shards, one has a translog of around 148 GB, which is now filling up the disk.

Note that our workload is write-heavy (mostly indexing operations).

We tried forcing a flush externally (re-running the flush API), but the translog was still not cleared and its size did not change. It seems the default flush is not working, and the external flush has no effect either.
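
For anyone trying to reproduce this, here is a minimal Python sketch of the forced flush and a per-shard translog size check. It assumes the node is reachable at `http://localhost:9200` (a placeholder, not our real address) and that the indices stats response has the usual `indices -> <index> -> shards -> <shard id> -> [copies]` layout with a `translog.size_in_bytes` field. The function names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:9200"  # assumption: placeholder node address


def flush_url(index, base_url=BASE_URL):
    """URL for a forced flush that waits for any flush already in progress."""
    query = urlencode({"force": "true", "wait_if_ongoing": "true"})
    return f"{base_url}/{index}/_flush?{query}"


def translog_stats_url(index, base_url=BASE_URL):
    """URL for shard-level translog statistics of one index."""
    return f"{base_url}/{index}/_stats/translog?level=shards"


def translog_sizes(stats, index):
    """Map shard id -> largest translog size in bytes among its copies."""
    shards = stats["indices"][index]["shards"]
    return {
        int(shard_id): max(copy["translog"]["size_in_bytes"] for copy in copies)
        for shard_id, copies in shards.items()
    }


def flush_and_measure(index):
    """Flush, then fetch translog sizes (performs real HTTP requests)."""
    urlopen(Request(flush_url(index), data=b"", method="POST")).close()
    with urlopen(translog_stats_url(index)) as response:
        return translog_sizes(json.load(response), index)
```

Calling `flush_and_measure("ise")` before and after a flush would show whether the translog of any shard actually shrinks.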

We are now getting failures on all shards with an out-of-memory error, apparently caused by the large, uncleared translog.

Please help with this.

It seems one of the translog files has become corrupted. While trying to recover the index data, we got the exception below:

cluster.service ] [ise-rcrt2] processing [shard-failed ([ise][4], node[P_KdYBMqQjaRJFZ7tlYLng], [P], v[57], s[INITIALIZING], a[id=Iy9mqlhfRZutKkwXYcHKbA], unassigned_info[[reason=ALLOCATION_FAILED], at[2017-07-18T18:31:41.806Z], details[failed recovery, failure IndexShardRecoveryException[failed to recovery from gateway]; nested: EngineCreationFailureException[failed to recover from translog]; nested: EngineException[failed to recover from translog]; nested: ElasticsearchException[unexpected exception reading from translog snapshot of /opt/CSCOcpm/elasticsearch/data/ise-elasticsearch/nodes/0/indices/ise/4/translog/translog-211.tlog]; nested: EOFException[read past EOF. pos [46722580] length: [4] end: [46722580]]; ]]), message [failed recovery]]: took 40ms done applying updated cluster_state (version: 212, uuid: UUlkU70lQcWYwkYTC1MAiA)
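
To make the failure easier to read, here is a small Python sketch that pulls the translog file path and the EOF offsets out of the message. The regular expressions and the function name are my own, written against this one message only, so treat them as an assumption rather than a general log parser.

```python
import re

# Excerpt of the failure message quoted above.
MESSAGE = (
    "ElasticsearchException[unexpected exception reading from translog snapshot of "
    "/opt/CSCOcpm/elasticsearch/data/ise-elasticsearch/nodes/0/indices/ise/4/"
    "translog/translog-211.tlog]; nested: EOFException[read past EOF. "
    "pos [46722580] length: [4] end: [46722580]]; "
)

PATH_RE = re.compile(r"translog snapshot of (\S+\.tlog)\]")
EOF_RE = re.compile(r"pos \[(\d+)\] length: \[(\d+)\] end: \[(\d+)\]")


def parse_translog_eof(message):
    """Return the translog path and EOF offsets, or None if not found."""
    path = PATH_RE.search(message)
    eof = EOF_RE.search(message)
    if not path or not eof:
        return None
    pos, length, end = (int(group) for group in eof.groups())
    return {
        "path": path.group(1),
        "pos": pos,
        "length": length,
        "end": end,
        # True when the requested read extends beyond the end offset.
        "reads_past_end": pos + length > end,
    }
```

For this message, `pos` equals `end` and a 4-byte read was attempted, which would be consistent with translog-211.tlog ending where another record was expected, although that is only my reading of the numbers.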

Below are the shard details from running the cat shards command for the ise index:
ise 2 p STARTED 5084471 684.3mb 10.201.230.102 ise-rcrt2
ise 2 r UNASSIGNED
ise 1 p STARTED 5084818 738.2mb 10.201.230.102 ise-rcrt2
ise 1 r UNASSIGNED
ise 3 p STARTED 5083592 797.7mb 10.201.230.102 ise-rcrt2
ise 3 r UNASSIGNED
ise 4 p UNASSIGNED
ise 4 r UNASSIGNED
ise 0 p STARTED 5085913 689.6mb 10.201.230.102 ise-rcrt2
ise 0 r UNASSIGNED
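
A short Python sketch for picking the unassigned primaries out of this output. It assumes the default column order (index, shard, prirep, state, ...) shown above, and the function name is hypothetical.

```python
# The cat shards output from this post.
CAT_OUTPUT = """\
ise 2 p STARTED 5084471 684.3mb 10.201.230.102 ise-rcrt2
ise 2 r UNASSIGNED
ise 1 p STARTED 5084818 738.2mb 10.201.230.102 ise-rcrt2
ise 1 r UNASSIGNED
ise 3 p STARTED 5083592 797.7mb 10.201.230.102 ise-rcrt2
ise 3 r UNASSIGNED
ise 4 p UNASSIGNED
ise 4 r UNASSIGNED
ise 0 p STARTED 5085913 689.6mb 10.201.230.102 ise-rcrt2
ise 0 r UNASSIGNED
"""


def unassigned_primaries(cat_output):
    """Return (index, shard) pairs whose primary copy is UNASSIGNED."""
    result = []
    for line in cat_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        index, shard, prirep, state = fields[:4]
        if prirep == "p" and state == "UNASSIGNED":
            result.append((index, int(shard)))
    return result
```

Here only shard 4 has an unassigned primary, the same shard named in the recovery exception. The unassigned replicas are not reported by this function.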

Why does the translog grow so much, and how can this be fixed? Any help would be appreciated.
