"No space left on device" mitigation plan?

eugene_pirogov · January 23, 2018, 10:28am

I have an Elasticsearch cluster with two nodes in a two-master setup. It is used to store logs.

Suddenly one of the masters went down and refuses to come back up. Its logs show the following error:

Suppressed: java.io.IOException: No space left on device

I tried interacting with the node that is still alive, i.e. the second master, to list index names (and maybe delete some indexes). But it refuses to respond and returns the following error:

{
  "error": {
    "root_cause": [
      {
        "type": "master_not_discovered_exception",
        "reason": null
      }
    ],
    "type": "master_not_discovered_exception",
    "reason": null
  },
  "status": 503
}
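One plausible reading of this `master_not_discovered_exception`, assuming `discovery.zen.minimum_master_nodes` was set to the usual majority (the thread doesn't show the actual setting): a master can only be elected while floor(n/2) + 1 master-eligible nodes are present. With n = 2 that majority is 2, so when one node dies the survivor cannot elect itself. A small sketch of the arithmetic:

```python
def quorum(master_eligible: int) -> int:
    """Majority of master-eligible nodes needed to elect a master."""
    return master_eligible // 2 + 1


def can_elect(master_eligible: int, alive: int) -> bool:
    """True if the surviving master-eligible nodes still form a majority."""
    return alive >= quorum(master_eligible)


if __name__ == "__main__":
    # Compare a two-node and a three-node cluster as nodes drop out.
    for total in (2, 3):
        for alive in range(total, 0, -1):
            print(f"{total} master-eligible, {alive} alive: "
                  f"quorum={quorum(total)}, can elect={can_elect(total, alive)}")
```

This is also why three master-eligible nodes are commonly recommended: the majority is still 2, so one node can fail without blocking master election.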

I'm perfectly comfortable with dropping some older data, e.g. some older indexes. I have access to the disk and could try to wipe some older data from the /var/lib/elasticsearch folder (if I'm not mistaken about the location), but I'm not sure exactly how Elasticsearch organizes its data, so it's very likely I would corrupt it.

What are the next steps to mitigate this?

Is there an offline CLI tool that could remove some older indexes? If such a tool existed, I could use it to wipe older indexes and restart my Elasticsearch cluster.
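Once the cluster responds again, old indexes can be removed through the delete index API (`DELETE /<index>`) instead of touching files on disk. A minimal sketch, assuming daily log indexes named like `logstash-YYYY.MM.DD` (a hypothetical pattern; check what `_cat/indices` actually lists) and an unauthenticated node at `localhost:9200`. It only prints candidates until `DRY_RUN` is set to `False`:

```python
import json
import urllib.request
from datetime import date, datetime, timedelta

ES_URL = "http://localhost:9200"  # assumption: local node, no auth
PREFIX = "logstash-"              # hypothetical daily index prefix
DATE_FMT = "%Y.%m.%d"             # hypothetical date suffix format
DRY_RUN = True                    # only print candidates until set to False


def old_indices(names, keep_days, today=None):
    """Return date-suffixed index names older than keep_days, sorted."""
    today = today or date.today()
    cutoff = today - timedelta(days=keep_days)
    selected = []
    for name in names:
        if not name.startswith(PREFIX):
            continue  # e.g. .kibana or other non-log indexes
        try:
            day = datetime.strptime(name[len(PREFIX):], DATE_FMT).date()
        except ValueError:
            continue  # suffix is not a date; leave the index alone
        if day < cutoff:
            selected.append(name)
    return sorted(selected)


def list_indices():
    """Fetch all index names from the _cat/indices API as JSON."""
    with urllib.request.urlopen(f"{ES_URL}/_cat/indices?format=json") as resp:
        return [row["index"] for row in json.load(resp)]


def delete_index(name):
    """Delete one index through the delete index API."""
    req = urllib.request.Request(f"{ES_URL}/{name}", method="DELETE")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    for name in old_indices(list_indices(), keep_days=30):
        if DRY_RUN:
            print("would delete", name)
        else:
            print("deleted", name, delete_index(name))
```

Elasticsearch Curator is a maintained tool that performs this kind of age-based cleanup on a schedule.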

As a last resort, I can of course stop Elasticsearch, increase the disk size, and start Elasticsearch again.

jpountz · January 23, 2018, 6:05pm

Is there some non-Elasticsearch data, maybe old logs, that you could remove to make room before restarting this master node? I am surprised that you ran into this issue; disk watermarks should have prevented it. Or did you disable them?
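To find non-Elasticsearch data worth removing, a read-only scan for the largest files on the full filesystem is a reasonable first step. A minimal sketch; the default `/var/log` root is only an example, the script deletes nothing, and files under the Elasticsearch data path are better left alone:

```python
import heapq
import os
import sys


def largest_files(root, top=10):
    """Return the `top` largest regular files under root as (size, path) pairs."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            try:
                if os.path.islink(path):
                    continue  # skip symlinks so targets aren't counted twice
                sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return heapq.nlargest(top, sizes)


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "/var/log"
    for size, path in largest_files(root):
        print(f"{size / 1024**2:10.1f} MiB  {path}")
```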

eugene_pirogov · January 23, 2018, 8:47pm

Hi @jpountz!

To be honest, I wasn't paying enough attention to either the disk status or the Elasticsearch logs until the issue happened. Was the disk watermark mechanism supposed to warn me that the problem was coming? If so, how?

As for whether disk watermarks are enabled, I can't say for sure. I'm using the Docker image gcr.io/google-containers/elasticsearch:v5.6.2.

Here's the source code for the image with the specific version of elasticsearch I'm using:

After a quick look at the image source code, I see no indication that the default disk watermark settings are altered.
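On the question of how watermarks would have signaled the problem: they act at shard-allocation time. With the 5.x defaults (`cluster.routing.allocation.disk.watermark.low` at 85% and `cluster.routing.allocation.disk.watermark.high` at 90%), Elasticsearch stops allocating new shards to a node above the low watermark, tries to relocate shards away from a node above the high one, and logs a message when a threshold is exceeded. A minimal monitoring sketch, assuming those defaults are unchanged and a node is reachable at `localhost:9200`, that reads per-node disk usage from `_cat/allocation`:

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, no auth
LOW, HIGH = 85.0, 90.0            # 5.x default watermarks, assumed unchanged


def classify(percent):
    """Map a disk-usage percentage to the watermark it has exceeded."""
    if percent > HIGH:
        return "above high watermark"
    if percent > LOW:
        return "above low watermark"
    return "ok"


def check_nodes(rows):
    """Yield (node, percent, status) for each _cat/allocation row with disk data."""
    for row in rows:
        pct = row.get("disk.percent")
        if pct is None:
            continue  # e.g. the UNASSIGNED row carries no disk figures
        yield row["node"], float(pct), classify(float(pct))


if __name__ == "__main__":
    url = f"{ES_URL}/_cat/allocation?format=json"
    with urllib.request.urlopen(url) as resp:
        for node, pct, status in check_nodes(json.load(resp)):
            print(f"{node}: {pct:.0f}% used, {status}")
```

Run from cron or a monitoring system, this only helps before a node fills up; a cluster without an elected master may not answer `_cat` requests at all.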

system · February 20, 2018, 8:47pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
