I have an application with one built-in Elasticsearch node used for collecting log events into 3 indices. The application is deployed at several sites.
At some point in the future the implementation is going to be changed to data streams with policies, but for the moment the oldest documents are deleted with a delete-by-query request executed by a Tomcat job.
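The job's request is essentially a delete-by-query with a date range. A minimal sketch of what it does (the field name @timestamp is illustrative; the real mapping may use a different date field):
POST rest_log_entry/_delete_by_query
{
  "query": {
    "range": {
      "@timestamp": {
        "lt": "now-13M"
      }
    }
  }
}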
However, in one specific deployment an index has reached the Lucene limit for the number of documents in an index, and the error
Number of documents in the index can't exceed [2147483519
is logged by Logstash.
I have tried using a script to delete the oldest documents with delete-by-query requests, but this does not work when the index is full: the delete-by-query request is simply ignored and no documents are deleted. For the other indices, which are not full, the script works as expected.
I now want to try to reindex the oldest documents into a new index and then delete them.
My question is whether it is possible to reindex when the index is full?
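For reference, the reindex request I have in mind would look roughly like this (rest_log_entry_old is a hypothetical target index, and @timestamp again stands in for the actual date field):
POST _reindex
{
  "source": {
    "index": "rest_log_entry",
    "query": {
      "range": {
        "@timestamp": { "lt": "now-13M" }
      }
    }
  },
  "dest": {
    "index": "rest_log_entry_old"
  }
}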
Looking at the blogs, it seems your version must be 7.3.2; could you please confirm?
I believe the split API can be used as per the link below, and I have tried it, but only on a smaller index:
GET kibana_logs_success/_count   # 14074
PUT /kibana_logs_success/_block/write
POST /kibana_logs_success/_split/kibana_sample_logs_split
{
  "settings": {
    "index.number_of_shards": 10
  }
}
GET kibana_sample_logs_split/_count
I used 10 primary shards; you can consider a lower number.
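One thing to note: in my test the write block set before the split can carry over to the target index, so you may need to clear it before writing to the new index again, e.g.:
PUT /kibana_sample_logs_split/_settings
{
  "index.blocks.write": null
}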
We are maintaining the application but have not changed this part of it yet, apart from upgrading to the latest versions of Logstash and Elasticsearch. The application does not use Kibana.
The delete_by_query is similar to the one stated above, except that I delete 100,000 documents per batch. When an index is not full it takes around 100 seconds per batch.
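For completeness, a sketch of what one batch looks like (the date field name is assumed; max_docs caps the batch size and conflicts=proceed keeps the request from aborting on version conflicts):
POST rest_log_entry/_delete_by_query?max_docs=100000&conflicts=proceed
{
  "query": {
    "range": {
      "@timestamp": { "lt": "now-13M" }
    }
  }
}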
There is only one Elasticsearch node and one shard per index.
Here is the output from _cat. I do not have access to the server on which Elasticsearch runs, so I have to do all the work by exchanging scripts and documents.
The script which performs the delete-by-query POST also performs a _cat indices request.
health status index                 uuid                   pri rep docs.count docs.deleted store.size pri.store.size dataset.size
yellow open   apclient_log_entry_v1 TL7gZYUfSGG1mzQqwPpcjQ   1   1     520413        64601     182025         182025       182025
yellow open   audit_entry_v1        uwuExeL1TXS5Y7tDliMbbg   1   1    1240849       134926     675169         675169       675169
yellow open   rest_log_entry        cNSInhLPT9KFelhbBHxNmA   1   1 2147483519            0  240451302      240451302    240451302
yellow open   rest_log_entry_v1     hrkgT7-kQIq4UuD7LR578w   1   1          0            0          0              0            0
Thanks. I was hoping that there were still deleted docs in the index; then you could expunge them and the total number would come down. Unfortunately, you've already hit the limit and there are no deleted docs left in the index.
If you just run the query with no delete, does it find documents?
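Something like this would show whether the query matches anything at all (field name assumed):
GET rest_log_entry/_count
{
  "query": {
    "range": {
      "@timestamp": { "lt": "now-13M" }
    }
  }
}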
I think you're going to need to split the index into more than one shard to work with it. The limit is actually at the shard level, and since you only have one primary shard, that's why you are limited.
If you split it into several shards, you should be able to go back to working with it.
You need to think about a longer-term strategy of more than one shard for these indices, otherwise you're going to keep running into this limit.
The business rule that I am trying to achieve in the short term is to be able to insert new documents into the rest_log_entry index by deleting the oldest documents. No index must contain documents older than 13 months. Since the first two indices are not full, deleting the oldest documents works with both the script and the Tomcat job (when the Tomcat job is deployed to this server).
The query the script runs at the moment finds the oldest and youngest documents in each index.
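Roughly like this, using min/max aggregations (@timestamp again stands in for the actual date field):
GET rest_log_entry/_search
{
  "size": 0,
  "aggs": {
    "oldest":   { "min": { "field": "@timestamp" } },
    "youngest": { "max": { "field": "@timestamp" } }
  }
}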
Since I cannot delete from the full index, my plan is to reindex the oldest documents into a second index. I'll take a look into splitting the index into several shards, if that works for a full index?
In the longer term I'll change to data streams and index policies so that Elasticsearch takes care of deleting the oldest documents.
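A sketch of the kind of lifecycle policy I have in mind (the policy name is made up; ILM's min_age does not accept month units, so 396d approximates 13 months, and in a data stream setup min_age is typically measured from rollover):
PUT _ilm/policy/log_retention_13_months
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "396d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}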
You can also update the index setting "index.merge.policy.deletes_pct_allowed" to below 20%. This will automatically manage the retention of deleted docs during merge operations.
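For example (note that older Elasticsearch releases enforce a minimum of 20 for this setting, so a value below that assumes a recent version):
PUT rest_log_entry/_settings
{
  "index.merge.policy.deletes_pct_allowed": 10
}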