An index that keeps growing until the disk is full

Hello,

I have an index that keeps growing until the node crashes with a disk-full error.
For now, I delete the index directly from the VM, because Elasticsearch is unhealthy when this happens.
That helps for a short while, but after some time the index grows again and runs out of disk space.

The details:

  • When I run the count command, the response is 3,571,753 documents:
GET clients/_count
  • When I run the stats command, the relevant part of the response is:
 "primaries" : {
      "docs" : {
        "count" : 11622360,
        "deleted" : 4789408
      }
  • When I check the store.size of this index, it is 25 GB.

  • When I deleted the index last week, it had 55,000,000 deleted documents.

  • I have 128 GB of disk space, and this is my largest index, so I guess there should be enough space.

My questions:

  1. Why do the deleted documents keep growing without stopping?
  2. Does it make sense that an index with 3.5M documents has 55M deleted documents?
  3. What can I do to prevent the index from growing like this?

Thanks in advance for the answer,
Boaz.

You will need to provide a lot more information about this index.

How are you ingesting data into it? Are you using custom document ids, or are you letting Elasticsearch choose the document id?

Elasticsearch itself will not delete anything, so if the number of deleted documents in an index is rising, it is caused by something you are doing.

Also, if you are constantly updating your index, then yes, the number of deleted documents can grow very large.

Hello @leandrojmp ,
Thank you for your response.

More details:

  • I ingest the data in two ways:
    First, from my Nest system through the client connection.
    Second, with automatic replication from another system, using bulk requests.

  • I use custom ids. I don't let Elasticsearch choose the ids.

  • Yes, I constantly update my index. Is there any way to remove the deleted documents?

  • The index's custom settings:

{
  "index.blocks.read_only_allow_delete": "false",
  "index.priority": "1",
  "index.query.default_field": [
    "*"
  ],
  "index.refresh_interval": "1s",
  "index.write.wait_for_active_shards": "1",
  "index.routing.allocation.include._tier_preference": "data_content",
  "index.mapping.nested_objects.limit": "20000",
  "index.blocks.write": "false",
  "index.number_of_replicas": "1"
}

I added the following two settings last week; I still don't know whether they are helpful:

  "index.routing.allocation.include._tier_preference": "data_content",
  "index.mapping.nested_objects.limit": "20000",

If there are more details that would help you understand the case, I will add them.

Thank you!

Those two things (custom ids and constant updates) would explain the index growing and the number of deleted documents.

When you update a document in an index, Elasticsearch creates a new document and marks the old one as deleted in the segment where it is stored. The documents marked as deleted keep taking up disk space for a period of time.
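
As a minimal illustration (the document id and field below are made up; only the index name clients comes from your count request): re-indexing a document with the same custom id, which is what a bulk "index" action does on every sync, replaces the old copy and leaves it marked as deleted until a merge removes it.

PUT clients/_doc/some-custom-id
{
  "name": "first version"
}

PUT clients/_doc/some-custom-id
{
  "name": "second version"
}

GET clients/_stats/docs

After the second PUT, primaries.docs.deleted typically goes up by one, even though the live document count stays the same.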

Elasticsearch periodically merges smaller segments into larger ones, and when this merge happens the documents marked as deleted are removed and the disk space used by them is freed.
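
If you want to see where those deleted documents currently live, the cat segments API shows the live and deleted counts per segment (a read-only check, safe to run at any time):

GET _cat/segments/clients?v&h=shard,segment,docs.count,docs.deleted,size

Segments with a large docs.deleted value are the ones waiting for a merge to reclaim their space.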

You do not have control over when Elasticsearch will merge segments. There is a force merge API to make Elasticsearch merge the segments of an index, but it is not recommended to use it on indices that are still being written to.
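
For completeness, and only if you accept the caveat above about indices that are still being written to, the call looks roughly like this; the only_expunge_deletes option limits the merge to reclaiming space from deleted documents:

POST clients/_forcemerge?only_expunge_deletes=true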

I'm not sure that there is much you can do besides revising your update strategy.
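
One thing that may be worth checking, although this is an assumption about your pipeline rather than something your posts confirm: if your syncs re-index full documents with custom ids, every unchanged document still produces a deleted copy, whereas a partial update through the update API is skipped entirely when nothing actually changes (noop detection is enabled by default). The id and field below are hypothetical:

POST clients/_update/some-custom-id
{
  "doc": {
    "status": "active"
  }
}

If the stored document already contains "status": "active", the response reports "result": "noop" and no new deleted document is created.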

How frequently do you update the documents?

Hello @leandrojmp, thank you.
We clone some data from an RDBMS into Elasticsearch every 20 minutes and also update some data when a user is modified in the frontend app. Worst of all, we have a heavy Logstash pipeline that updates some nested documents.
So I understand there is not much we can do?
