Version 7.1.1: Corrupted translog after a power failure

panxuelin · October 31, 2019, 3:50am

My translog has become corrupted. Why would this happen? Can closing the system file cache fix the problem completely?

"unassigned_info": {
"reason": "ALLOCATION_FAILED",
"at": "2019-10-09T07:08:10.690Z",
"failed_attempts": 5,
"delayed": false,
"details": "failed shard on node [aTbLQpDwTL204ASeorrvsA]: shard failure, reason [failed to recover from translog], failure EngineException[failed to recover from translog]; nested: EOFException[read past EOF. pos [1070861] length: [4] end: [1070861]]; ",
"allocation_status": "deciders_no"
}
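To see why a power cut can produce an error like "read past EOF. pos [1070861] length: [4] end: [1070861]", here is a minimal sketch of a reader for a hypothetical log of length-prefixed records (a 4-byte big-endian length, then the payload). This is NOT the real Elasticsearch translog format; encode, read_records and the expected_count parameter are illustrative assumptions. The reader is told separately how many records to expect. If the tail of the file never reached the disk, it tries to read a 4-byte header at a position equal to the end of the file, which has the same shape as the error above (pos equal to end).

```python
import io
import struct

# Hypothetical record layout: 4-byte big-endian length, then `length` payload bytes.
# This is NOT the real Elasticsearch translog format; it only illustrates truncation.


def encode(payloads):
    """Serialize payloads into the hypothetical length-prefixed layout."""
    return b"".join(struct.pack(">I", len(p)) + p for p in payloads)


def read_records(stream, expected_count):
    """Read `expected_count` records, failing if the file ends before they are all present."""
    start = stream.tell()
    end = stream.seek(0, io.SEEK_END)  # file length
    stream.seek(start)
    records = []
    for _ in range(expected_count):
        pos = stream.tell()
        header = stream.read(4)
        if len(header) < 4:
            raise EOFError(f"read past EOF. pos [{pos}] length: [4] end: [{end}]")
        (length,) = struct.unpack(">I", header)
        payload = stream.read(length)
        if len(payload) < length:
            raise EOFError(f"read past EOF. pos [{pos + 4}] length: [{length}] end: [{end}]")
        records.append(payload)
    return records


if __name__ == "__main__":
    data = encode([b"op1", b"op2"])  # 14 bytes actually on "disk"
    print(read_records(io.BytesIO(data), 2))  # [b'op1', b'op2']
    try:
        # Metadata claims 3 records were written, but the third never reached the disk.
        read_records(io.BytesIO(data), 3)
    except EOFError as e:
        print(e)  # read past EOF. pos [14] length: [4] end: [14]
```

The count is kept separate from the data on purpose: corruption of this kind shows up when two pieces of state that should agree were persisted to different extents before the power failed.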

panxuelin · October 31, 2019, 4:00am

The problem occurs almost every time the system loses power while data is being inserted into Elasticsearch.
But after I tried closing the system file cache, the problem seems to have disappeared. Does this really work? Can someone explain why?

DavidTurner · October 31, 2019, 5:50am

The usual explanation is that your storage is not working correctly and is acknowledging writes before they have completed. This is a trick that lower-grade storage sometimes uses to improve its performance numbers at the expense of your data.
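The general pattern for making an append durable is to write the bytes, call fsync, and only treat the write as complete once fsync returns. Below is a minimal Python sketch of that pattern; durable_append is a hypothetical helper, not Elasticsearch's own code (which is written in Java). It also shows where the failure described above sits: fsync asks the kernel to push the data to the device and, on Linux, typically to send a cache-flush command, but if the drive acknowledges that flush while the data is still only in a volatile cache, nothing at the application level can detect it.

```python
import os


def durable_append(path, data):
    """Append `data` to the file at `path`, returning only after fsync has succeeded.

    Generic write-then-fsync sketch, not Elasticsearch's own code.
    """
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to push the data (and file metadata) to the device. On Linux this
        # typically also sends a cache-flush command; a drive that acknowledges the flush
        # while the data is still in volatile cache can still lose it on power failure.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Make a newly created file's directory entry durable too (POSIX-style systems).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A call such as durable_append("example.log", b"operation 1\n") returns only after both fsync calls succeed; the guarantee it gives is exactly as strong as the storage's honesty about completed writes.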

What do you mean "close system file cache"?

panxuelin · October 31, 2019, 6:31am

I "closed the system file cache" using the Linux command "hdparm -W".

panxuelin · October 31, 2019, 6:37am

Do you mean it's a problem with the storage, or is something wrong with how the translog is written?
And can "closing the system file cache" solve the problem?

Christian_Dahlqvist · October 31, 2019, 6:37am

What type of storage are you using? Is it some kind of network attached filesystem?

panxuelin · October 31, 2019, 6:39am

What kind of parameter do you mean?

Christian_Dahlqvist · October 31, 2019, 6:39am

What type of hardware is the cluster deployed on?

panxuelin · October 31, 2019, 6:41am

We tried both HDD and SSD, with the same problem.

DavidTurner · October 31, 2019, 6:43am

That command disables the disk's write cache, and the fact that it makes the problem go away does indeed indicate that your disk is lying to Elasticsearch and acknowledging writes before they have completed.
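On Linux kernels that expose it (4.7 and later, as far as I know), whether the kernel treats a disk as having a volatile write-back cache can be read from /sys/block/<device>/queue/write_cache, which contains "write back" or "write through". The minimal sketch below assumes that sysfs file is present; parse_write_cache, volatile_write_cache and the sysfs_root parameter are hypothetical names. It only reports whether a volatile cache is enabled (the setting that hdparm -W toggles); it cannot detect a drive that acknowledges flushes it has not performed, and it is not guaranteed that the value updates immediately after running hdparm.

```python
from pathlib import Path


def parse_write_cache(text):
    """Interpret the contents of a sysfs queue/write_cache file."""
    value = text.strip()
    if value == "write back":
        return True  # kernel treats the device as having a volatile write cache
    if value == "write through":
        return False  # no volatile write cache reported
    raise ValueError(f"unexpected write_cache value: {value!r}")


def volatile_write_cache(device, sysfs_root="/sys/block"):
    """Return True if the kernel reports a write-back cache for `device` (e.g. "sda").

    Assumes a Linux kernel that exposes /sys/block/<device>/queue/write_cache.
    """
    path = Path(sysfs_root) / device / "queue" / "write_cache"
    return parse_write_cache(path.read_text())


if __name__ == "__main__":
    import sys

    for dev in sys.argv[1:]:
        print(dev, "volatile write cache:", volatile_write_cache(dev))
```

The sysfs_root parameter exists only so the lookup can be pointed at a fake directory tree for testing.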

panxuelin · October 31, 2019, 6:52am

So does that mean hdparm can solve the problem? Is there anything else that could cause a corrupted translog?
Where can I find a description of how the translog is written?

DavidTurner · October 31, 2019, 7:06am

No, this does not indicate a problem in Elasticsearch or elsewhere. It indicates that your disks have a volatile write cache that loses data on a power loss.

panxuelin · October 31, 2019, 7:10am

Thank you!

system · November 28, 2019, 7:10am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
