Thanks for the suggestion. I should have clarified that this is not the solution I need. Later, I want to filter the query by a certain time range, and I will have to include results with a doc count of 0 here.
For instance, assume the field is a server hostname, and it sends a heartbeat every 10 minutes. I want to find out whether a server is down by checking whether the sum of heartbeats for each hostname is 0 or not, within a certain time interval. So I need the results where the doc count is 0, and all hostnames that actually exist should be included in the aggregation.
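To make the heartbeat case concrete, here is a minimal sketch of the kind of request I have in mind, written as a Python dict. It assumes a `terms` aggregation with `min_doc_count` set to 0 combined with a `range` filter. The field names (`hostname`, `@timestamp`), the aggregation name `hosts`, and both helper functions are made up for illustration.

```python
from typing import Any


def build_heartbeat_query(host_field: str, time_field: str,
                          start: str, end: str, size: int = 1000) -> dict[str, Any]:
    """Build a search body that counts heartbeats per host in a time window.

    min_doc_count=0 asks the terms aggregation to also return hosts
    that have no matching documents inside the window.
    """
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"range": {time_field: {"gte": start, "lt": end}}},
        "aggs": {
            "hosts": {
                "terms": {"field": host_field, "min_doc_count": 0, "size": size}
            }
        },
    }


def silent_hosts(response: dict[str, Any]) -> list[str]:
    """Return the hostnames whose bucket has a doc count of 0."""
    buckets = response["aggregations"]["hosts"]["buckets"]
    return sorted(b["key"] for b in buckets if b["doc_count"] == 0)
```

The body from `build_heartbeat_query("hostname", "@timestamp", "now-10m", "now")` would be sent to the index's `_search` endpoint, and `silent_hosts` would then be applied to the JSON response.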
Thanks, I see what the underlying issue is, and I am aware that this might have performance impacts. Probably it will be a one-time operation to clean up the index.
What I don't directly see from the linked post is what I can do now, if anything at all. The blog post references a deprecated optimize API for Elasticsearch 1.x that doesn't exist anymore.
Do I understand correctly that the force merge action will achieve what I need? In particular, am I correctly understanding that I need to set only_expunge_deletes to true?
That depends on what you still want to do with the index. Force merging is only recommended for read-only indices (see the warning box on the Force merge page). If you still need to write to or update the index, you could wait for a merge to happen on its own. Otherwise, you could roll over (in the case of a data stream) and then force merge the old index.
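For reference, a rough sketch of what the force merge call with `only_expunge_deletes` would look like, here just building the request URL in Python. The base URL, the index name `heartbeats`, and the helper `forcemerge_url` are placeholders; the request itself would be sent as a POST.

```python
from urllib.parse import quote, urlencode


def forcemerge_url(base_url: str, index: str, only_expunge_deletes: bool = True) -> str:
    """Build the URL for a force merge request on a single index.

    Elasticsearch expects boolean query parameters as lowercase
    "true"/"false", so the flag is converted explicitly.
    """
    params = urlencode({"only_expunge_deletes": str(only_expunge_deletes).lower()})
    return f"{base_url.rstrip('/')}/{quote(index, safe='')}/_forcemerge?{params}"


# Hypothetical local cluster and index name, for illustration only.
print(forcemerge_url("http://localhost:9200", "heartbeats"))
```

As noted above, this is best run only once the index is no longer being written to.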
Interesting. I didn't do anything and the field disappeared. Could it be that Elasticsearch automatically cleans up these keywords? (I have not configured anything special for this index.)