I manage an Elasticsearch cluster with about 25 different applications sending their logs to it. The naming convention for the log indices is apps-logs--yyyy.MM.dd.
We have apps that do load testing, and during this testing they write an enormous amount of data. A few days after the load test, that data is no longer needed and can be deleted. With our current indexing it's not easy to figure out what needs to be deleted. I would like an automated way to purge the load test data while keeping the normal data, which we retain for 60 days.
My idea is to use an index alias that the applications can send their logs to.
New Alias "alias-appname-yyyy.MM.dd" -> apps-logs--yyyy.MM.dd. Then during a load test update the alias to point to test-logs--yyyy.MM.dd.
Then I have a job that runs and deletes test-* if it's older than a week.
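For illustration, the selection step of such a cleanup job could look roughly like the Python sketch below. It only decides which index names are old enough to delete; fetching the list of indices and issuing the actual deletes is left out. The function name expired_test_indices and the sample index names are hypothetical, and the sketch assumes every test index name ends in a yyyy.MM.dd date stamp.

```python
from datetime import date, datetime, timedelta


def expired_test_indices(index_names, today, max_age_days=7, prefix="test-"):
    """Return the test indices whose date suffix is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # never touch the normal (non-test) indices
        try:
            # Assumption: the last 10 characters are a yyyy.MM.dd stamp
            stamp = datetime.strptime(name[-10:], "%Y.%m.%d").date()
        except ValueError:
            continue  # skip names without a parseable date suffix
        if stamp < cutoff:
            expired.append(name)
    return expired


# Hypothetical index names, used only to show the behaviour
names = [
    "test-logs--2024.01.01",
    "test-logs--2024.01.09",
    "apps-logs--2023.12.01",
]
to_delete = expired_test_indices(names, today=date(2024, 1, 10))
```

With these sample names, only the first test index is selected; the apps-logs index is ignored no matter how old it is, so the 60 day retention of normal data would have to be handled separately.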
How do I create an alias with the date stamp like I described, that is different every day? I would like to create the alias once, and not every day.
Is there a better way to handle the use case I described?
When you do load testing, would any non-test data be written to the indices at the same time? If so, would that non-test data also get written to the alias that points to
Does the test data have to go into the same set of indices that contains the non-test data?
Is there any reason why whatever initiates the load testing can't create a completely separate index that has the same mapping etc. as the real-world indices? Then, when you are done with testing, you just delete that index and don't have to worry about the non-test data at all!
Also, LPT: you should definitely have aliases for your non-test data as well.
Thanks for the quick reply. I have talked about just using a separate index that you could flip something in the code to use at load test time. One development group said it would be easy; the other said it would be a challenge. So I was looking at alternatives. From what I gather from your reply, what I was suggesting won't really work.
I have heard that you should use aliases, but I haven't really needed to. For the most part we create multiple indices with daily timestamps and that works. What am I missing by not using aliases?
I am curious, what was the challenge?
I am not saying it won't work, but I am concerned about what happens to real-world traffic when you switch the alias to test and data keeps coming in! Unless you have a way of halting that real-world traffic, or postponing ingestion, I don't know how you keep the data separate.
Aliases make it easy to access data. Let's say you have indices marked per month. If you have a shifting alias that always points to the current index, then you don't need to worry about what the format of the index is or what its name is. All you need to do is point to the alias current-index. You can also easily have read-only aliases that cover a variety of indices, making searching easier.
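As a rough sketch of what shifting an alias involves, the Python below builds the request body that would be sent to the Elasticsearch _aliases endpoint to move an alias from one index to another in a single call. The helper name swap_alias_actions and the monthly index names are hypothetical; only the alias name current-index comes from the post above. Sending the request to the cluster is not shown.

```python
import json


def swap_alias_actions(alias, old_index, new_index):
    """Build a body for POST /_aliases that moves `alias` to `new_index`."""
    return {
        "actions": [
            # Both actions go in one request, so the alias is removed from
            # the old index and added to the new one together
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical monthly index names, for illustration only
body = swap_alias_actions("current-index", "logs-2024.01", "logs-2024.02")
print(json.dumps(body, indent=2))
```

Applications keep writing to and reading from current-index and never need to know which concrete index sits behind it.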