"No space left on device" mitigation plan?


(Eugene Pirogov) #1

I have an Elasticsearch cluster with two nodes: a two-master setup. It is used to store logs.

Suddenly one of the masters went down, and it refuses to come back up. The logs reveal the following error:

Suppressed: java.io.IOException: No space left on device

I tried listing index names (and maybe deleting some indexes) by talking to the node that is still alive, i.e. the second master. But it refuses to respond and returns the following error:

{
  "error": {
    "root_cause": [
      {
        "type": "master_not_discovered_exception",
        "reason": null
      }
    ],
    "type": "master_not_discovered_exception",
    "reason": null
  },
  "status": 503
}
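
One possible reading of this master_not_discovered_exception, sketched below: if discovery.zen.minimum_master_nodes is set to a majority of the master-eligible nodes (an assumption, since the thread does not show the actual setting), a two-master cluster needs both masters alive to elect one. The function names here are hypothetical and only illustrate the arithmetic.

```python
def majority(master_eligible: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return master_eligible // 2 + 1


def can_elect_master(master_eligible: int, alive: int) -> bool:
    """True if the surviving master-eligible nodes still form a majority.

    Assumes minimum_master_nodes is configured as a majority, which is
    the commonly recommended value but is not confirmed in this thread.
    """
    return alive >= majority(master_eligible)


if __name__ == "__main__":
    # Two master-eligible nodes, one of them down.
    print(majority(2))             # nodes required for an election
    print(can_elect_master(2, 1))  # can the lone survivor elect itself?
```

Under that assumption, the surviving node cannot answer requests that need a master until the failed node rejoins, which would match the 503 above.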

I'm perfectly comfortable with dropping some older data, e.g. some older indexes. I have access to the disk and could try to wipe older data from the /var/lib/elasticsearch folder (if I'm not mistaken), but I'm not sure how exactly Elasticsearch organizes the data there, so I would very likely corrupt it.

What are the next steps to mitigate this?

Is there an offline CLI tool that would be able to remove some older indexes? If such a tool existed, I could use it to wipe older indexes and restart my Elasticsearch cluster.

As a last resort, I can of course stop Elasticsearch, increase the disks, and start Elasticsearch again.
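
For the "drop older indexes" idea, here is a minimal sketch of how the candidates could be picked and deleted through the regular HTTP API. It is not an offline tool: the delete index API (DELETE /<index-name>) only works once the cluster has an elected master again. The daily naming scheme logstash-YYYY.MM.DD, the base URL http://localhost:9200, and both function names are assumptions made for illustration.

```python
from datetime import date, datetime, timedelta
from urllib.request import Request


def indexes_older_than(names, keep_days, today, prefix="logstash-"):
    """Return date-named indexes older than keep_days, oldest first.

    Names that do not match prefix + YYYY.MM.DD are skipped untouched.
    """
    cutoff = today - timedelta(days=keep_days)
    old = []
    for name in names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # not a daily index, leave it alone
        if day < cutoff:
            old.append(name)
    return sorted(old)


def delete_request(base_url, index_name):
    """Build (but do not send) the HTTP DELETE request for one index."""
    return Request(f"{base_url}/{index_name}", method="DELETE")


if __name__ == "__main__":
    names = ["logstash-2017.10.01", "logstash-2017.11.20", ".kibana"]
    for name in indexes_older_than(names, keep_days=30, today=date(2017, 11, 25)):
        req = delete_request("http://localhost:9200", name)
        print(req.get_method(), req.full_url)
        # urllib.request.urlopen(req) would actually send it.
```

The request is only built, not sent, so the list can be reviewed before anything is removed.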


(Adrien Grand) #2

Is there some non-Elasticsearch data that you could remove to make room before restarting this master node, maybe old logs? I am surprised that you ran into this issue; disk watermarks should have prevented it. Or did you disable them?
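
For context on the watermarks mentioned above, a rough sketch of the comparison they perform. It assumes the 5.x defaults of 85% for cluster.routing.allocation.disk.watermark.low and 90% for cluster.routing.allocation.disk.watermark.high; the function names are hypothetical, and the real checks run inside Elasticsearch against its own data paths.

```python
import shutil


def watermark_status(total_bytes, used_bytes, low=0.85, high=0.90):
    """Classify disk usage against the low/high watermark fractions.

    The 0.85 / 0.90 defaults are assumed from the 5.x documentation.
    """
    used_fraction = used_bytes / total_bytes
    if used_fraction >= high:
        return "above high watermark"
    if used_fraction >= low:
        return "above low watermark"
    return "ok"


def path_status(path):
    """Same check for a real filesystem path, e.g. the data directory."""
    usage = shutil.disk_usage(path)
    return watermark_status(usage.total, usage.used)
```

For example, path_status("/var/lib/elasticsearch") would report where that volume stands. As far as I know, in 5.x the watermarks only influence where shards are allocated and do not stop indexing into shards already on the node, so a disk can still fill up completely.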


(Eugene Pirogov) #3

Hi @jpountz!

To be honest, I wasn't paying enough attention to either the disk status or the Elasticsearch logs until the issue happened. Was the "disk watermarks" mechanism supposed to somehow indicate that the problem was coming? If so, how?

Regarding whether or not disk watermarks are turned on, I can't say reliably. I'm using a docker image, gcr.io/google-containers/elasticsearch:v5.6.2.

Here's the source code for the image with the specific version of elasticsearch I'm using:

After quickly glancing over the image source code, I see no indication that the default settings for disk watermarks are being altered.

