BTW, 54b2372cdd4d073b77ed99bd5b93930f5c2c0333 is just the SHA-1 of one of our git commits.
To our surprise, the above delete_indices action actually matched and deleted both of the indices above.
On paper, there is nothing in the names of those indices that suggests a match to "3 months older than 2018_10". However, it matched every time we ran it, without fail.
2018-10-10 23:01:22,307 DEBUG curator.indexlist __actionable:35 Index 54b2372cdd4d073b77ed99bd5b93930f5c2c0333_204828402480248 is actionable and remains in the list.
This "54b2372cdd4d073b77ed99bd5b93930f5c2c0333" seems to be a magic number. If I change to a different SHA-1, it does not match, for example:
2018-10-10 23:01:22,308 DEBUG curator.indexlist filter_by_age:546 Index "8f87138194295772625d4eb6365a3f9cdc233cea_2018.10.10" does not meet provided criteria. Removing from list.
Could it be some kind of internal regex match or numeric handling/overflow problem?
With a SHA-1 hash as an index name, you should not be using source: name as your age filter. You are correct that it is regular-expression related. I could not have guessed (though perhaps I should have?) that someone would use a hash as an index name, where it would contain a long run of letters and numbers. A timestring value of %Y_%m is merely translated to a regular expression like ^.*(\d{4}_\d{2}).*$. After these values are extracted, they are interpreted as a date. You can perhaps see, then, where this might cause issues: any four digits followed by an underscore and two more digits will be picked up, whether or not they were meant as a date. I apologize for the bad user experience you have had. Please understand that this use case is not something I ever anticipated. It is definitely not a good thing with indices named like yours, but this sort of naming is atypical. Most people putting a date in their index names are doing indexname-%Y.%m.%d or something like it. This is what source: name was meant to handle.
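To make the accidental match concrete, here is a minimal Python sketch that applies the regular expression quoted above to the two index names from the logs. It is only an illustration of the pattern, not Curator's actual code; the helper name extract_date_like is made up for this example, and what Curator does with the captured text afterwards is not shown here.

```python
import re

# Pattern quoted above for a timestring of %Y_%m.
# Assumption: this mirrors the idea only, not Curator's real implementation.
PATTERN = re.compile(r"^.*(\d{4}_\d{2}).*$")


def extract_date_like(index_name):
    """Return the text the pattern captures, or None when nothing matches.

    Hypothetical helper for illustration only.
    """
    match = PATTERN.match(index_name)
    return match.group(1) if match else None


names = [
    "54b2372cdd4d073b77ed99bd5b93930f5c2c0333_204828402480248",
    "8f87138194295772625d4eb6365a3f9cdc233cea_2018.10.10",
]

for name in names:
    print(name, "->", extract_date_like(name))
```

The first name ends its hash with the digits 0333, immediately followed by _20, so the pattern captures 0333_20 even though no date was intended. The second hash ends in cea, so there are no four digits in front of the underscore and nothing is captured.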
The good thing is that you’re not without recourse. First, don’t use source: name. Try creation_date instead, for example. If that doesn’t match properly because the documents inside don’t match the date of creation, then you can use the field_stats source instead, which can actually calculate the minimum and maximum timestamp values in each index. This is the most precise method, though it takes a few milliseconds longer per index to make those calculations.