ILM not reclaiming disk space on ELK host server

We have an ELK cluster with a single node that has 50GB of disk space. We are sending metricbeat data from 200 servers to Elasticsearch.
We created an ILM policy for the metricbeat index to roll the index over and delete it after it crosses 5GB.
ILM is doing the index rollover and deletion, but we are still not getting the disk space back.
The disk usage percentage keeps increasing each day, and after a few days the disk is completely full.

Why is the ILM policy not deleting the data and reclaiming space?

Could you please help us with this?
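
For reference, a rollover-then-delete policy of the shape described above usually looks something like the sketch below (the policy name, threshold, and timing here are illustrative, not the exact policy in question):

# Hypothetical sketch of a "roll over at ~5GB, then delete" ILM policy
curl -X PUT "http://localhost:9200/_ilm/policy/metricbeat-rollover-delete" \
  -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "5gb" }
        }
      },
      "delete": {
        "min_age": "0d",
        "actions": { "delete": {} }
      }
    }
  }
}'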

Hi @Anagha_nambiar Welcome to the community.

Please show your exact policy.

Hi @stephenb Please find the ILM policies below:

[screenshots of the ILM policies]

@stephenb Do we need to update something in metricbeat.yml file regarding this ILM policy?

Please do not share screenshots / images. They are very hard to read, can not be read by everyone, and can not be searched or copied for debugging...

Also, I can not tell from all the screenshots which ILM policy you are having trouble with.

Please post, as formatted text, the exact ILM policy you are having trouble with.

@stephenb I am having an issue with the metricbeat and heartbeat indices.

Those two indices are rolled over to new indices, but the disk space is still not reclaimed.

Please excuse me, as I am not able to copy and paste the code directly, so I am sending screenshots.

Please find the policy details below:

So you can see the indices are actually getting deleted?

How did you confirm that?

When an index is deleted the disk space is almost immediately recovered.

If so, then the continually rising disk usage is probably not coming from the index data.

Perhaps the disk is being consumed by something else... like the Elasticsearch logs, for example.
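
For example, one way to confirm deletions (and how much space the remaining indices report) is the cat indices API; the index patterns here are just a guess at your naming:

# List matching indices with their on-disk size and creation date
curl "http://localhost:9200/_cat/indices/metricbeat-*,heartbeat-*?v&h=index,store.size,creation.date.string&s=index"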

@stephenb Yes, I can see the indices getting deleted in Index Management in the Kibana UI.

When you say it might be because of the Elasticsearch logs, how can we automate deleting those? And how can we find and delete the logs or whatever else is taking the space?

Could you please suggest?

I do not know what is actually taking up the space; it could be any number of things.

I can say that when an index is deleted, the disk space is reclaimed almost immediately, so the source of your rising disk usage is probably something else.

Logging configuration, rotation, deletion, etc.

Your Linux admin should be able to help you find which directories are taking up the space.
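
For example (the paths here are only the usual suspects, not necessarily yours), something like this will show the biggest directories:

# Largest top-level directories on this filesystem
sudo du -xh --max-depth=1 / 2>/dev/null | sort -rh | head -20

# Then drill into whatever stands out, e.g. Docker's data root or the Elasticsearch logs
sudo du -xh --max-depth=1 /var/lib/docker 2>/dev/null | sort -rh | head -20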

Thank you @stephenb
I will try to check what you mentioned and clear the space.

@stephenb I got the disk space back when I used the command:
docker system prune --volumes -f
Could you please suggest what the reason behind this might be?

Well, that explains it... you did not say you were running inside Docker. :slight_smile:

As you found out, Docker manages the Docker container volume size, not Elasticsearch.
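
For example, you can see how much space Docker itself is holding with:

# Summary of disk used by images, containers, local volumes and build cache
docker system df

# Add -v for a per-container / per-volume breakdown
docker system df -v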

Also, it is not best practice to use the internal Docker container volume for the Elasticsearch data; best practice is to mount the Elasticsearch data path to a volume on the host.

See Here

Always bind data volumes

You should use a volume bound on /usr/share/elasticsearch/data for the following reasons:

  1. The data of your Elasticsearch node won’t be lost if the container is killed
  2. Elasticsearch is I/O sensitive and the Docker storage driver is not ideal for fast I/O
  3. It allows the use of advanced Docker volume plugins

Example

    volumes:
      - data01:/usr/share/elasticsearch/data

Also look at some good advice here

@stephenb Thanks a lot!
I will definitely look into the options you provided and try to implement what you suggested. I hope that will resolve this issue.

For your reference, I am providing my docker compose file below:

@stephenb I believe the docker compose file which I shared is already using a mount for the volume. We basically don't want to store this data at all.
Could you please suggest?

This is because you are using a named volume mount with Docker, so the Elasticsearch data is still within the Docker environment and the volume's space is managed by Docker.

As opposed to a bind mount, where the data is external to any Docker volume, i.e. it lives at a path on the local host.
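
As a quick illustration of the difference (the host path is a placeholder and the image tag variable matches the examples in this thread):

# Named volume: the data lives inside Docker's own storage area (e.g. under /var/lib/docker/volumes)
docker run -e discovery.type=single-node -v esdata:/usr/share/elasticsearch/data docker.elastic.co/elasticsearch/elasticsearch:${TAG}

# Bind mount: the data lives at a path you choose on the host
docker run -e discovery.type=single-node -v /path/on/host/esdata:/usr/share/elasticsearch/data docker.elastic.co/elasticsearch/elasticsearch:${TAG}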

I think you need to read up on the difference

On the Docker Docs : Manage data in Docker | Docker Documentation

And our Docs here

Here is a nice article showing how...

What I did to test

  1. Create an external volume:

docker volume create --driver local --opt type=none --opt device=/path/to/data/on/host/data --opt o=bind data

  2. Then my compose file:
---
version: '3'
services:
  elasticsearch:
    container_name: es01
    image: docker.elastic.co/elasticsearch/elasticsearch:${TAG}
    environment: ['ES_JAVA_OPTS=-Xms2g -Xmx2g','bootstrap.memory_lock=true','discovery.type=single-node', 'xpack.security.enabled=false']
    volumes:
      - data:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - elastic
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536

  kibana:
    image: docker.elastic.co/kibana/kibana:${TAG}
    container_name: kib01
    environment:
      XPACK_APM_SERVICEMAPENABLED: "true"
      XPACK_ENCRYPTEDSAVEDOBJECTS_ENCRYPTIONKEY: d1a66dfd-c4d3-4a0a-8290-2abcb83ab3aa

    ports:
      - 5601:5601
    networks:
      - elastic

networks:
  elastic:

volumes:
  data:
    external: true

  3. Then when I run the docker-compose, the Elasticsearch data is written to
    /path/to/data/on/host/data

  4. Then when I delete / clean up indices, that space is reclaimed (a quick way to check this is sketched below).
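
A quick way to check this (reusing the placeholder path from above; the index name is only an example):

# Size of the Elasticsearch data directory on the host
sudo du -sh /path/to/data/on/host/data

# Delete one of the rolled-over indices (use a real index name from _cat/indices), then re-check: the number above should drop
curl -X DELETE "http://localhost:9200/metricbeat-2023.01.01-000001"
sudo du -sh /path/to/data/on/host/data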

@stephenb Thank you!!
I will try as you mentioned and let you know how it goes.

Hi @stephenb I created the external volume as you suggested earlier and also updated the docker compose file with the details.

I am still facing the same disk space issue.

Suppose the host path is /usr/share/Elasticsearch/data.
If I run the remove command:
rm -rf /usr/share/Elasticsearch/data

then I am able to delete the data and spin up the containers again. Otherwise, even after ILM does the rollover and delete, the disk space is not reclaimed. Please find the screenshots below:

Hi @Anagha_nambiar

What command did you run to produce the first picture?

What command did you run to mount the volume?

I suspect the volume is still not mounted correctly.

Keep working on it... this is most likely a Docker configuration issue, not an Elasticsearch issue.
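
One way to check is to ask Docker what is actually mounted into the container (replace es01 with your container name):

# For a working bind mount you should see "Type": "bind" and the host "Source" path you expect
docker inspect -f '{{ json .Mounts }}' es01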

@stephenb

  1. The command is:
    curl -X GET "http://localhost:9200/_cat/allocation?v"

  2. The command for the volume mount is:
    docker volume create --driver local --opt type=none --opt device=/usr/share/Elasticsearch/data --opt o=bind data

  3. In the docker-compose file:

version: '3.7'
services:
  Elasticsearch:
    container_name: Elasticsearch
    image: docker.elastic.co/elasticsearch/elasticsearch:${TAG}
    environment:
      - xpack.security.enabled=true
      - bootstrap.memory_lock=true
      - discovery.type=single-node
    volumes:
      - data:/usr/share/Elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - elastic
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536

Actually, I think you are configured correctly, but you are still confusing the disk managed by Docker with the local host disk.

It gets a little tricky with Docker and Elasticsearch

Example

I run

curl  "http://localhost:9200/_cat/nodes/?v&h=name,du,dt,dup"
name            du     dt   dup
53609d9b5c5f 7.7gb 58.4gb 13.25

It looks like my node is using 7.7GB of disk.

But then I look at my indices...

curl  "http://localhost:9200/_cat/allocation?v"
shards disk.indices disk.used disk.avail disk.total disk.percent host       ip         node
    21         57mb     7.6gb     50.7gb     58.4gb           13 172.27.0.4 172.27.0.4 53609d9b5c5f
    12                                                                                 UNASSIGNED

They are tiny, 57mb... certainly not 7.6GB... so what is going on?

Ahhh... the 7.6GB disk usage reported by the Elasticsearch cat nodes / cat allocation APIs is for the Docker container filesystem that Elasticsearch is running in, NOT the host's local disk...

So you will never see the local host disk usage from within Elasticsearch running inside Docker... It can...not...see... it.

Elasticsearch only sees the "local" filesystem, which is Docker-owned... nothing Elasticsearch can do about that.

disk.used disk.avail disk.total disk.percent host       ip         node
7.6gb     50.7gb     58.4gb           13 172.27.0.4 172.27.0.4 53609d9b5c5f

These are all Docker Related...

That 7.6gb is what is being used by Docker (probably multiple containers, etc.), not your indices... assuming you have them mounted correctly...

If you want Docker to take up less space, reduce the disk it is allowed to consume...
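
For example (the container name and host path are placeholders), you can compare what the container sees with what the host sees:

# The filesystem as seen from inside the container
docker exec es01 df -h /usr/share/elasticsearch/data

# The filesystem as seen from the host
df -h /path/to/data/on/host/data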