I'm trying to write a Curator script that should remove all indices older than one hour. This is just a test to see if my cron job is working as it should.
However, I'm a little confused about how the time/unit settings work.
If I use:
unit: hours
unit_count: 1 (up to 8)
All documents are deleted.
If I use:
unit: hours
unit_count: 9
It seems to work.
I could have understood it if it were off by one hour either way due to time zones, but this? It makes no sense.
First, Curator calculates all ages in UTC, because that's how Elasticsearch stores them. If you expected them to be in your local time zone, that could explain the discrepancy.
Second, you haven't said which source you are using, and the sources behave differently in ways that could affect the answer.
source: creation_date
This will delete indices which have a creation_date older than the number of seconds in unit multiplied by unit_count, counting back from execution time.
source: name
Using source: name will calculate the index date from the name alone, stripping off anything finer than the timestring's resolution. In other words, for an index named index-2017.10.31.12 with a timestring of %Y.%m.%d.%H, the calculated index date will be the epoch timestamp equivalent of 2017-10-31T12:00:00.000Z, regardless of when the first document in that index arrived.
It will then delete indices which are older than the number of seconds in unit multiplied by unit_count, counting back from execution time.
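To make the two sources concrete, here is a sketch of how each age filter might appear in a Curator action file. The unit_count of 1 and the hourly timestring simply mirror the example above; they are illustrative values, not a recommendation.

```yaml
# Sketch only: each filter belongs under the "filters:" key of a
# delete_indices action. They are shown together for comparison;
# you would normally use one or the other.

# Age taken from the index's creation_date.
- filtertype: age
  source: creation_date
  direction: older
  unit: hours
  unit_count: 1

# Age parsed from the index name. The timestring must match the date
# pattern in the name, e.g. index-2017.10.31.12.
- filtertype: age
  source: name
  direction: older
  timestring: '%Y.%m.%d.%H'
  unit: hours
  unit_count: 1
```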
I found that you could also use field_stats. I've experimented with that one, but still no luck.
How should I configure Curator to delete all documents older than 5 minutes? @timestamp is my date field.
Okay. You're trying to delete by hours, when your indices are created by days. Do you see the problem with that approach? To repeat, Curator does not delete documents, it deletes entire indices. It calculates the age of the index based on the name, the creation_date, or the min or max document timestamp obtained via field_stats. It then deletes the index if that calculated date is earlier than the time of execution minus (unit_count times the number of seconds in unit) seconds.
You should be reckoning your indices in days since that's how they're created.
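For reference, a field_stats-based age filter reckoned in days might look like the sketch below. The field name @timestamp comes from the question; picking min_value (the oldest document in the index) and a unit_count of 1 are assumptions made for illustration.

```yaml
# Sketch only: goes under the "filters:" key of a delete_indices action.
- filtertype: age
  source: field_stats
  direction: older
  field: '@timestamp'
  stats_result: min_value   # max_value would use the newest document instead
  unit: days
  unit_count: 1
```

Even with field_stats, the filter only decides whether a whole index is old enough to delete; individual documents are never removed.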
OK, so I've got two options: create indices down to the minute, or do the sane thing and only delete per day.
The reason I wanted to delete by minutes/hours is that I'm trying to do a cron-job to run this automatically. So to not have to wait until the next day, I tried minutes.
Any good advice on how to write the cron job script in Ubuntu?
If you're trying to delete the previous day's index, or ones older than x days, then the easiest way to make that happen in a timely fashion, with days as your unit, is to schedule cron to run a few minutes after 0:00 UTC, so that the previous day's (or days') indices are deleted at about the time a new one is created. If I scheduled my cron to run at 0:15 and delete indices older than 1 day, then the previous day's index would be deleted, because it's older than 1 day. This might look like:
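As a hedged sketch of such a setup, an action file for a daily delete could be written as below. The file name, the paths in the crontab comment, and the logstash- prefix are placeholders; the crontab line also assumes the server clock is set to UTC, since cron schedules in the system's local time.

```yaml
# delete_old.yml (hypothetical file name)
# Possible crontab entry, with placeholder paths and a UTC server clock:
#   15 0 * * * /usr/local/bin/curator --config /etc/curator/curator.yml /etc/curator/delete_old.yml
actions:
  1:
    action: delete_indices
    description: Delete daily indices older than 1 day, based on index name.
    options:
      ignore_empty_list: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-          # assumed index prefix
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 1
```

Running the same command once by hand with --dry-run added should log which indices would be deleted without actually removing anything, which is a quicker way to test than switching to hourly units.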