How to get rid of deleted documents and reclaim disk space?

I have a bunch of indexes with a massive amount of deleted documents.
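For context, the per-index deleted-document counts can be seen with the cat indices API (the host here is a placeholder; the h= columns are standard _cat/indices fields):

```shell
# Placeholder host; shows document and deleted-document counts per index
curl -XGET "http://localhost:9200/_cat/indices?v&h=index,docs.count,docs.deleted,store.size"
```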

I've tried running POST /_forcemerge?only_expunge_deletes=true multiple times, but it doesn't seem to do anything. The command returns a 504 Gateway Time-out error:


{
  "statusCode": 504,
  "error": "Gateway Time-out",
  "message": "Client request timeout"
}

I figured that it continues running after the error, and so I waited over the weekend. However, the number of deleted records didn't budge.

What am I missing?

Force merge is the way to go. What version are you on?

That sounds like you have a proxy between Elasticsearch and yourself?

The version is below.

{
  "name" : "rw-es-oc2-d01",
  "cluster_name" : "oc2-elastic-np",
  "cluster_uuid" : "1aMRQBvrRvqFkHNltjksiA",
  "version" : {
    "number" : "6.6.1",
    "build_flavor" : "default",
    "build_type" : "rpm",
    "build_hash" : "1fd8f69",
    "build_date" : "2019-02-13T17:10:04.160291Z",
    "build_snapshot" : false,
    "lucene_version" : "7.6.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

There is no proxy between me and Elasticsearch. I opened a ticket with support, but I was hoping someone would know the answer offhand.

6.X is very much EOL and you should upgrade ASAP.

Are you running on our Elasticsearch Service?

Is there a known issue with POST /_forcemerge?only_expunge_deletes=true on v6.6?

The upgrade isn't happening anytime soon, since it's out of my control. It's running on prem. Besides, I would imagine it isn't easy to upgrade a multi-terabyte instance that is backing a 24/7 operation.

This is not an Elasticsearch error, so you have a proxy or something in place.

I would try talking directly to Elasticsearch.

Hi @rgelb, I am pretty sure we have worked together before when I was at CoreLogic... :wink:

So besides making sure that you have a direct connection to Elasticsearch, there are a couple of things with force merge that are not obvious sometimes.

  1. Force merge happens on the node where the primary shard is, and in essence there needs to be enough disk space on that node to do it. As a rule of thumb, anywhere from an additional 50% to 90% of the primary shard size will need to be available on that node; otherwise the merge will not be able to complete. I.e., it needs room to build the new segments before it can get rid of the old ones.

  2. With that many segments, and depending on what the underlying hardware is, it could take quite some time... quite some time.
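As a back-of-envelope sketch of that headroom rule of thumb (the shard size here is hypothetical):

```shell
# Hypothetical 100 GB primary shard: estimate the free space its node may need
shard_gb=100                      # hypothetical primary shard size in GB
low=$(( shard_gb * 50 / 100 ))    # optimistic headroom estimate (50%)
high=$(( shard_gb * 90 / 100 ))   # pessimistic headroom estimate (90%)
echo "Free space needed: ${low}-${high} GB"
```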

With the cat segments API, you'll be able to see whether the new segments are being created... or not.

GET /_cat/segments?v

  3. You should also be able to use the tasks API, I believe, to see what is running... or not.
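A hedged sketch of that tasks API check (host and credentials are placeholders; force merge runs under an action name matching *forcemerge):

```shell
# Placeholders: host and credentials; lists in-flight tasks whose action matches *forcemerge
curl -XGET -u elastic:mypassword \
  "http://rw-es-oc2-d01:9200/_tasks?actions=*forcemerge&detailed"
```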

And yes, it'll be a significant task to upgrade, but it can be done at some point, assuming this is a business-critical app. We have experts that can help with that.


@warkolm That is definitely an ES error. Check out the headers coming back:

[screenshot: response headers]

Same headers as with a successful call to ES.

@stephenb I never worked at CoreLogic, but I've worked with the company a number of times on various real estate projects. Your name does sound familiar.

Regarding the space issue: there is about 40% of disk space remaining.
[screenshot: cluster disk usage]

And for each node (last one being the master):
[screenshot: per-node disk usage]

Assuming that is the case, I should still be able to clean up deleted records from an index that only has a few. For example, index newsarticles_web01_dev_637044237457830355 has 7833 indexed documents and 3 deleted ones.

Surely, I should be able to issue a forcemerge command on this index:
POST newsarticles_web01_dev_637044237457830355/_forcemerge?only_expunge_deletes=true

But it results in a timeout as well and nothing happens.

And yes, I was at CoreLogic, and you were a consultant helping us with the integration with a rental platform.

Have you tried the force merge without the expunge-only option? A normal force merge will get rid of the deletes.

Have you tried these commands just from the curl on one of the hosts from Elasticsearch?

Have any of these nodes ever gone to flood stage with respect to disk? If so, some of these indices could be in read-only mode.
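A hedged sketch of how to check for (and clear) the flood-stage block, using the index name from this thread (host and credentials are placeholders):

```shell
# Check whether the index picked up a read-only block after flood stage
curl -XGET -u elastic:mypassword \
  "http://rw-es-oc2-d01:9200/newsarticles_web01_dev_637044237457830355/_settings?filter_path=*.settings.index.blocks*"

# If it did, clearing the block looks like this (null resets the setting)
curl -XPUT -u elastic:mypassword \
  -H 'Content-Type: application/json' \
  "http://rw-es-oc2-d01:9200/newsarticles_web01_dev_637044237457830355/_settings" \
  -d '{"index.blocks.read_only_allow_delete": null}'
```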

I'm assuming also this is just a basic license without support. Otherwise if you have support, open a support ticket.

I tried forcemerge without expunge from command line with curl. After a long time (10-20m), it came back with:

Request:
curl -XPOST -u elastic:mypasswordhere "http://rw-es-oc2-d01:9200/newsarticles_web01_dev_637044237457830355/_forcemerge"

Response:
{"_shards":{"total":3,"successful":3,"failed":0}}

However, the deleted records are still there.

I also checked the settings of the index to make sure it's not read-only. The status is green, so I assume all is good:

GET /newsarticles_web01_dev_637044237457830355/_settings

{
  "newsarticles_web01_dev_637044237457830355" : {
    "settings" : {
      "index" : {
        "creation_date" : "1568852146875",
        "number_of_shards" : "1",
        "number_of_replicas" : "2",
        "uuid" : "P91Zbf4ATOWzCCsyYhvYNw",
        "version" : {
          "created" : "6060199"
        },
        "provided_name" : "newsarticles_web01_dev_637044237457830355"
      }
    }
  }
}

We actually do have platinum support and I do have an open ticket. But I've found that for non-emergency cases, forums work far faster.

Run these and share the results... note that new segments are not visible until after a _refresh:

POST newsarticles_web01_dev_637044237457830355/_refresh

GET _cat/segments/newsarticles_web01_dev_637044237457830355/?v

GET _cat/indices/newsarticles_web01_dev_637044237457830355/?v

@stephenb Results:

curl -XPOST -u elastic:mypassword "http://rw-es-oc2-d01:9200/newsarticles_web01_dev_637044237457830355/_refresh"
{"_shards":{"total":3,"successful":3,"failed":0}}

curl -XGET -u elastic:mypassword "http://rw-es-oc2-d01:9200/_cat/segments/newsarticles_web01_dev_637044237457830355/?v"

curl -XGET -u elastic:mypassword "http://rw-es-oc2-d01:9200/_cat/indices/newsarticles_web01_dev_637044237457830355/?v"

Please don't post images; paste in the text instead. Many people cannot even read the images, especially on mobile devices, and images can't be searched or debugged.

@rgelb

EDIT: D'oh, I was not paying attention.

Per Docs

max_num_segments: The number of segments to merge to. To fully merge the index, set it to 1. Defaults to simply checking if a merge needs to execute, and if so, executes it.

So basically, Elasticsearch does not think it needs to merge, so it does not, and nothing happens!

POST /newsarticles_web01_dev_637044237457830355/_forcemerge?max_num_segments=1
POST /newsarticles_web01_dev_637044237457830355/_refresh
GET _cat/segments/newsarticles_web01_dev_637044237457830355/?v

That will collapse each shard down to a single segment and get rid of the deleted docs.

BTW, force merging read-only indices down to a single segment can (though not always) save space and make searching more efficient...

@stephenb For whatever reason, removing only_expunge_deletes=true from the command did the trick and removed all the deleted docs.

Thank you.
