Cluster stopped ingesting, with "failed to obtain in-memory shard lock"

TimWard · November 1, 2018, 12:43pm

My cluster logged some "failed to obtain in-memory shard lock" messages overnight, finishing at around 04:33 this morning, and there is no data in most of the indexes after 04:33, i.e. it has stopped indexing data. (There's just one, very sparsely used, index with data in it past that time.)

All shards are showing as STARTED. retry_failed didn't do anything. Restarting each node in the cluster didn't do anything.

What else do I need to look at? How do I get my cluster indexing again?

The reason for a problem in the middle of the night may have been that I was reindexing hundreds of gigabytes of data, and at some point in the process one or two of the nodes might have run short of disk space for relocating shards. There is currently no shortage of disk space on any node.

Data should be but isn't coming in from Logstash, from Metricbeat and from some Python scripts. The index that is still being written to comes from a Java application.

TimWard · November 1, 2018, 12:45pm

Ah, and then I find the following in the Python logs. I'd better now try to find out what the concept of a "read only index" is.

{
	u'update': {
		u'status': 403,
		u'_type': u'doc',
		u'_index': u'event-2018.07',
		u'error': {
			u'reason': u'blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];',
			u'type': u'cluster_block_exception'
		},
		u'_id': u'et3-tim-2.imagiro.ltd_threshold_Datapushgroupcount_2018-07-02T13:18:02.265Z',
		u'data': {
			'doc': {
				'alerted-level': 'critical',
				'alerted': True
			}
		}
	}
}
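For anyone whose scripts swallow these per-item errors, here is a minimal sketch of picking them out of a bulk response. It assumes the items have the same shape as the one logged above (a dict keyed by the action name, with `status` and `error` inside); `blocked_items` is a hypothetical helper name, not part of any client library.

```python
def blocked_items(bulk_items):
    """Return (index, id, reason) for each bulk item rejected by a cluster block."""
    hits = []
    for item in bulk_items:
        # Each item is keyed by its action name: 'index', 'update', 'delete', ...
        for result in item.values():
            error = result.get("error")
            # Assumption: the error is a dict with 'type' and 'reason', as in the log above.
            if not isinstance(error, dict):
                continue
            if result.get("status") == 403 and error.get("type") == "cluster_block_exception":
                hits.append((result.get("_index"), result.get("_id"), error.get("reason")))
    return hits
```

Logging or raising on a non-empty result would make a blocked index visible at the point where writes start failing, rather than hours later.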

DavidTurner · November 1, 2018, 12:51pm

This would explain the message "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)]". If a node exceeds the flood-stage disk watermark (95% of disk capacity by default), then all indices with shards on that node are marked as read-only, with deletes still allowed. The documentation on disk-based shard allocation describes this in more detail and also describes how to recover.

This may or may not be related to the "failed to obtain in-memory shard lock" message, but it sounds like this is your actual problem here.
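To see which indices are affected, one option is to fetch `GET /_all/_settings` and look for the `index.blocks.read_only_allow_delete` setting. The sketch below only does the filtering step on the already-parsed JSON; it assumes the default nested (not `flat_settings`) response shape, and `read_only_indices` is a hypothetical helper name.

```python
def read_only_indices(settings_response):
    """List indices whose settings carry the read_only_allow_delete block."""
    names = []
    for name, body in settings_response.items():
        # Assumed shape: {index: {"settings": {"index": {"blocks": {...}}}}}
        blocks = body.get("settings", {}).get("index", {}).get("blocks", {})
        # Setting values come back as strings, so compare on the text "true".
        if str(blocks.get("read_only_allow_delete", "false")).lower() == "true":
            names.append(name)
    return sorted(names)
```

An empty list would suggest the block is not what is stopping writes, and the shard lock messages deserve a closer look.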

TimWard · November 1, 2018, 12:59pm

And having got that clue from the Python application logs, the fix (after several hours' research) was:

PUT /_all/_settings
{
  "index": {
    "blocks.read_only_allow_delete": null
  }
}
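The same request can be sent from a script. This is a rough sketch using only the Python standard library; the base URL `http://localhost:9200` is an assumption (an unsecured local node), and `build_unblock_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_unblock_request(base_url, target="_all"):
    """Build the PUT that resets index.blocks.read_only_allow_delete."""
    # A JSON null resets the setting, matching the console request above.
    body = {"index": {"blocks.read_only_allow_delete": None}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{target}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# To actually send it (assumed local node, no authentication):
# urllib.request.urlopen(build_unblock_request("http://localhost:9200"))
```

Free up disk space first: if a node is still above the flood-stage watermark, the block is likely to be reapplied soon after it is cleared.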

