Unable to delete documents from full index

Hi,

I have an application with one built-in Elasticsearch node used for collecting log events into 3 indices. The application is deployed at several sites.

At some point in the future the implementation is going to be changed to data streams with policies, but for the moment the oldest documents are deleted with a delete by query request executed by a Tomcat job.
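For reference, the cleanup is roughly a request of this shape per index (a sketch only; the timestamp field name here is illustrative, the real mapping may use a different field):

POST /rest_log_entry/_delete_by_query?max_docs=100000
{
  "query": {
    "range": {
      "@timestamp": {
        "lt": "now-13M/d"
      }
    }
  }
}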

However, in one specific deployment an index has reached the Lucene limit on the number of documents in an index, and the error

Number of documents in the index can't exceed [2147483519]

is logged by Logstash.

I have tried using a script to delete the oldest documents with delete by query requests, but this does not work when the index is full. The delete by query request is simply ignored and no documents are deleted. For the other indices, which are not full, the script works as expected.

I now want to try to re-index the oldest documents into a new index and then delete them.

My question is whether it is possible to reindex when the index is full?

Best regards
@fgjensen


What version of Elasticsearch?

Please show the output of the cat indices request for that index:

GET _cat/indices/indexname?v

Hello @fgjensen

Looking at the blogs, it seems your version must be 7.3.2. Could you please confirm?

I believe the split API can be used as per the link below. I have tried it, but only on a smaller index:


GET kibana_logs_success/_count #14074

PUT /kibana_logs_success/_block/write

POST /kibana_logs_success/_split/kibana_sample_logs_split
{
  "settings": {
    "index.number_of_shards": 10
  }
}

GET kibana_sample_logs_split/_count

I used 10 primary shards; you can consider a lower number.

Thanks!!

Hi

The version of Elasticsearch is 8.18.

We are maintaining the application but have not changed this part of it yet, except for upgrading to the latest versions of Logstash and Elasticsearch. The application does not use Kibana.

The delete_by_query is similar to the one stated above, except that I delete 100,000 documents per batch. When an index is not full, it takes around 100 seconds per batch.

There is only one Elasticsearch node and one shard per index.

BR Flemming

@fgjensen

Please show the cat indices.... There is a reason I'm asking for that.

GET _cat/indices/indexname?v


Hi @stephenb

Here is the output from _cat. I do not have access to the server on which Elasticsearch runs, so I have to do all the work by exchanging scripts and documents.

The script that performs the delete by query POST also performs a _cat indices request.

health status index                 uuid                   pri rep docs.count docs.deleted store.size pri.store.size dataset.size
yellow open   apclient_log_entry_v1 TL7gZYUfSGG1mzQqwPpcjQ   1   1     520413        64601     182025         182025       182025
yellow open   audit_entry_v1        uwuExeL1TXS5Y7tDliMbbg   1   1    1240849       134926     675169         675169       675169
yellow open   rest_log_entry        cNSInhLPT9KFelhbBHxNmA   1   1 2147483519            0  240451302      240451302    240451302
yellow open   rest_log_entry_v1     hrkgT7-kQIq4UuD7LR578w   1   1          0            0          0              0            0

Thanks for your help.

@fgjensen

Thanks. I was hoping there were still deleted docs in the index; then you could expunge them and the total count would come down. Unfortunately, you've already hit the limit and there are no deleted docs left in the index.
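For completeness, the expunge would have been something along these lines (only useful when docs.deleted is greater than zero, which is not the case for your full index):

POST /rest_log_entry/_forcemerge?only_expunge_deletes=true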

If you just run a query with no delete, does it find documents?

As @Tortoise suggested, I think you're going to need to split the index into more than one shard to work with it. The limit is actually at the shard level, and since you only have one primary shard, that's why you are limited.

If you split it into several shards you should be able to go back to working with it.

You need to think about a longer-term strategy for these indices, using more than one shard, otherwise you're going to keep running into this limit.
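Adapting the earlier example to your index, the split would look roughly like this (the target index name and shard count here are just placeholders, size them to your own growth):

# block writes so the index is read-only for the split
PUT /rest_log_entry/_block/write

# split the single primary shard into, for example, 4
POST /rest_log_entry/_split/rest_log_entry_split
{
  "settings": {
    "index.number_of_shards": 4,
    "index.number_of_replicas": 0
  }
}

# verify the document count matches the source
GET _cat/indices/rest_log_entry_split?v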

Hi @stephenb

The business rule I am trying to satisfy in the short term is to be able to insert new documents into the rest_log_entry index by deleting the oldest documents. No index must contain documents older than 13 months. Since the first two indices are not full, deleting the oldest documents works with both the script and the Tomcat job (once the Tomcat job is deployed to this server).

At the moment, the query the script runs finds the oldest and youngest documents in each index.

Since I cannot delete from the full index, my plan is to reindex the oldest documents into a second index. I'll take a look at splitting the index into several shards, if that works for a full index.

In the longer term I'll change to data streams and index lifecycle policies in order for Elasticsearch to take care of deleting the oldest documents.

Not sure exactly what that solves, since the old docs will still be in your main index. But you seem to have a handle on your problem.

It will. Then you should be able to delete.

Yes this will solve many issues for you....

Good luck, let us know how it goes.

Hi @stephenb

Thanks for a swift answer.

I understand splitting is the shortest way to be able to delete the oldest documents, so I will try that solution.

Thanks to all who have posted to this question.

BR @fgjensen

How much of the data is "older than 13 months" ? 1% 10% 95% ?

Splitting the index will write all the documents out to N new shards.

Re-indexing can select only the newer documents, also writing to a new index with however many shards you choose.

Depending on the numbers, reindexing only what you need might be quicker.

(don't forget to check disk space availability)
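If you go the reindex route, a rough sketch (the new index name and the timestamp field are illustrative):

# create the target with more primary shards and no replicas
PUT /rest_log_entry_new
{
  "settings": {
    "index.number_of_shards": 4,
    "index.number_of_replicas": 0
  }
}

# copy only the documents younger than 13 months
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "rest_log_entry",
    "query": {
      "range": {
        "@timestamp": { "gte": "now-13M/d" }
      }
    }
  },
  "dest": {
    "index": "rest_log_entry_new"
  }
}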

I am going to take a mad guess and say... 1/13th of the data 🙂

But yes worth checking...

Actually, scratch that, the split API will be quicker in the vast majority of cases, as it should just be hard-linking a bunch of files.

The deleting of 1/13th (or however much there is) of data will take longer, but in that time the (new) index is usable.

The number of documents in each index is growing every month, since the application load is increasing. But that aside, the split documentation says:

Before you can split an index:

* The index must be read-only.
* The cluster health status must be green.

We can handle the read-only requirement, but this is a single-node cluster so it will never turn green.

Set replicas to 0 and the cluster will be green. Having replicas set above 0 on a single node will always result in a yellow cluster.
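Concretely, something like this on the existing indices (replicas can never be assigned on a single node anyway):

PUT /_all/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}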

A single-node cluster can be green. I have one right here on the Mac I'm using right now.

{
  "cluster_name": "elasticsearch",
  "status": "green",
  "timed_out": false,
  "number_of_nodes": 1,
  "number_of_data_nodes": 1,
  "active_primary_shards": 316,
  "active_shards": 316,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0,
  "unassigned_primary_shards": 0,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 100.0
}

Possibly you have indices with number_of_replicas set to 1, and nowhere to put the replicas? Set it to zero?


You can also update the index setting "index.merge.policy.deletes_pct_allowed" to below 20%.
This will automatically manage the retention of deleted docs during merge operations.
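For example (10 here is just an illustrative value below the 20% default; the setting is dynamic):

PUT /rest_log_entry/_settings
{
  "index.merge.policy.deletes_pct_allowed": 10
}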