I have fluentd pushing logs into elasticsearch with index names based on the date, e.g. logs.kubelet.YYYY.MM.DD and using index lifecycle management (ILM). I have set the ILM policy to roll over after 50GB or daily, whichever comes first. What is strange is that it seems to continue to roll over old indexes as the days go by even though no new documents are added to them.
For example, an index from 7 days ago has been rolled over from suffix -000001 all the way up to -000007. The indexes with suffixes -000002 and up are all empty.
Does anyone have any idea where I might have messed things up?
When you are using rollover you need to index into the write alias, not into date-based indices as you do now. You should update your fluentd configuration to write to the write alias, which will allow rollover to work properly. As it stands, the index probably rolls over on the age condition alone, since no data is written to it.
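To make the write-alias setup concrete, here is a minimal sketch of the two request bodies involved: an index template that attaches the ILM policy and rollover alias, and the bootstrap request that creates the first index with the alias marked as the write index. The alias name `logs.kubelet` and the policy name `logs-policy` are assumptions chosen for illustration, and the code only builds and prints the JSON; it does not talk to a cluster.

```python
import json

# Hypothetical names for illustration; they are not taken from the posts above.
WRITE_ALIAS = "logs.kubelet"
POLICY_NAME = "logs-policy"


def template_body(alias, policy):
    """Body for an index template (e.g. PUT _template/<name>).

    The two index.lifecycle.* settings tell ILM which policy to apply
    and which alias to move when the index rolls over.
    """
    return {
        "index_patterns": [f"{alias}-*"],
        "settings": {
            "index.lifecycle.name": policy,
            "index.lifecycle.rollover_alias": alias,
        },
    }


def bootstrap_request(alias):
    """Return (index name, body) for the PUT that creates the first index."""
    first_index = f"{alias}-000001"
    body = {"aliases": {alias: {"is_write_index": True}}}
    return first_index, body


index_name, body = bootstrap_request(WRITE_ALIAS)
print("PUT", index_name)
print(json.dumps(body, indent=2))
print(json.dumps(template_body(WRITE_ALIAS, POLICY_NAME), indent=2))
```

With this in place, fluentd would send documents to the alias (`logs.kubelet` in this sketch) rather than to a dated index name, and each rollover moves the alias to the next numbered index.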
I think fluentd creates a new write alias every day. So each day there's a new index, rollover indexes, write alias, index template. At least, it is supposed to do that.
I guess what you are saying here is that the ILM policy continues to apply even though no new documents are being indexed.
I suppose if I disabled the max_age then it would stop rolling over to a new index, but then it would never delete the "last" old index because the delete action only applies after the index rolls over.
I guess I have to sacrifice having the date in the index names if I want to use ILM, because the ILM policy will never delete the "current" write alias.
No, that is not how rollover works. I do not believe fluentd has any support for rollover so it likely just creates one index per day alongside the rollover index you have configured.
I suspect the ILM policy may not apply at all to the indices created by fluentd.
It does not matter whether any data is indexed into it, as per my previous point.
You do not need this anyway, as ILM bases its phase logic on index metadata.
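As a rough illustration of why the empty indices keep appearing, the sketch below models a rollover that fires purely on age. It is a simplified model, not ILM's actual implementation: it assumes one rollover per full `max_age` period regardless of document count, and the dated base name is a made-up example.

```python
def rolled_over_indices(base, age_days, max_age_days=1):
    """List the index names produced when only the age condition fires.

    Simplified model: one rollover per full max_age period since the first
    index was created, whether or not any documents were written.
    """
    if max_age_days <= 0:
        raise ValueError("max_age_days must be positive")
    rollovers = age_days // max_age_days
    # Suffixes start at 000001 and increase by one per rollover.
    return [f"{base}-{n:06d}" for n in range(1, rollovers + 2)]


# Hypothetical dated index name; six full days have passed since creation.
names = rolled_over_indices("logs.kubelet.2020.01.01", age_days=6)
print(names[-1])  # logs.kubelet.2020.01.01-000007
```

Under this model, only the first index in the list ever receives documents; the rest are created by the age condition alone, which matches the empty -000002 and later indices described at the top of the thread.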
The fluentd elasticsearch plugin has added some ILM support in recent months, so it does actually create a new index template, rollover index, and so on for each day if we configure it that way.
However, it seems that if we configure it this way, we shouldn't use age-based rollover in the ILM policy. It should only be used to set a maximum size for the indexes, and we would have to use curator to clean up indexes after a period of time, which seems to limit the usefulness of ILM.
I'll migrate over to not having the date in the index name. It seems like the right thing to do now that I've realized how the rollover logic works.
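For reference, a policy for the date-free setup might look like the sketch below. The rollover conditions (50GB or one day) come from the first post; the delete phase and its 7-day `min_age` are assumptions added for illustration. The code only builds the body for a `PUT _ilm/policy/<name>` request and prints it.

```python
import json


def ilm_policy(max_size="50gb", max_age="1d", delete_after="7d"):
    """Build an ILM policy body: roll over in hot, then delete later.

    delete_after is an assumed retention period, not a value from the thread.
    """
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {"max_size": max_size, "max_age": max_age}
                    }
                },
                "delete": {
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


policy = ilm_policy()
print(json.dumps(policy, indent=2))
```

Because a single write alias keeps rolling over, every index eventually leaves the hot phase and becomes eligible for the delete phase, so no separate cleanup tool should be needed in this arrangement.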