Hi, I have an ES database which is updated daily. We maintain two indexes (one for each day's worth of data). During the update, the older of the two indexes is deleted and a new one is created for the new day. We want to use a Kibana dashboard to keep track of trends in the data and we would like for it to go back further than just two days. A month, for example. Is this possible while still deleting old indexes? In other words, say I want to track the total number of documents in indexes over time. When plotting this in a Kibana dashboard, can the plot maintain the number of documents in an index even after the index is deleted?
A couple of points to add. First, it is not possible for us to keep indexes longer than 2 days. The volume of data is too large and expensive to maintain, so we need to delete indices after a relatively short period of time. The metrics we would like to track are aggregations of this daily data. Second, I think that what I'm suggesting would definitely be possible if we were to just create a new set of indexes specifically for this purpose. Say, for example, we want to keep 30 days' worth of statistics about our database. Then we could create a new index called 'aggregated_metrics' which stores the information we're interested in tracking and purges information on indexes older than 30 days (even though the indexes themselves will have been deleted much earlier than this). Assuming that what I asked initially is not possible, is this a reasonable approach to take? Any advice/suggestions on this or other alternatives would be welcome!
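As a rough illustration of the 'aggregated_metrics' idea above, here is a minimal Python sketch that only builds the request bodies and does not talk to a cluster. The field names `source_index`, `day` and `doc_count`, and both helper functions, are assumptions made for this example. The purge body is shaped for the `_delete_by_query` API and assumes `day` is mapped as a date field.

```python
from datetime import date

SUMMARY_INDEX = "aggregated_metrics"  # index name taken from the post above


def build_summary_doc(source_index: str, day: date, doc_count: int) -> dict:
    """Build one small summary document describing a single daily index.

    The field names are hypothetical; any consistent schema would do.
    """
    return {
        "source_index": source_index,
        "day": day.isoformat(),  # ISO date string, e.g. "2024-01-15"
        "doc_count": doc_count,
    }


def build_purge_query(retention_days: int = 30) -> dict:
    """Build a delete-by-query body matching summaries older than the window.

    Uses Elasticsearch date math ("now-30d/d") on the assumed `day` field.
    """
    return {"query": {"range": {"day": {"lt": f"now-{retention_days}d/d"}}}}
```

The daily job would index one such document into `aggregated_metrics` before deleting the old data index, then send the purge body to `aggregated_metrics/_delete_by_query`.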
It sounds like your use case is a good fit for a transform that would keep summary data that you could track for historical purposes. The transformed index will be a tiny fraction of the size of the original.
Hey Stephen, thanks for the response! I'll check out the link you sent, it does seem to be what we need. Just to make sure I understand, it sounds like this transform routine basically creates a new index from summary statistics of your "main" indexes. Is this roughly what's happening? If so, yes, it sounds like exactly what we need!
Yes, with one slight correction: it creates an index containing documents that summarize this week's aggregations, this day's aggregations, and so on. It does not create a new index every time it runs.
Ah, yes sorry I think that's what I meant. New as in, the first time it runs it creates a new index. After that, it adds to this index. This would be similar if not completely analogous to the 'aggregated_metrics' index I mentioned in my example in my first message. So it sounds like this is exactly the feature we need to carry that out! I don't have a ton of time to look more into it just now but I'll hopefully be able to this evening and I'll post here if I have any other questions. Thanks!
Hi @stephenb (or anyone else!), I was messing around with transforms a bit in Kibana. It seems that I have two choices for transform type: 'Pivot' or 'Latest'. I think that I want to use pivot. To recap, our situation is that we have a logging job writing to our Elasticsearch cluster daily. It deletes the old index, which is named <static_name>_YYYYMMdd.
We do not have any timestamp fields present in the index. The only reference to time is in the suffix of the index name. What I'm trying to do is use a transform to create a "summary index" and to update this index daily with some aggregate statistic about that day's index. This summary index should also keep all aggregate statistics (up to some upper limit, say 30 days or so) about the deleted indexes. The problem I'm running into when I test this is that in Step 2, Transform Details, I cannot select continuous mode because I don't have any timestamp fields in the indices. Is there a way I can get around this? I would like the transform to update based on the date suffix in the index name.
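One possible workaround, sketched here under assumptions rather than as a confirmed answer from the thread, is to stamp each document with its ingest time and let a continuous transform sync on that field. The Python below only builds the two request bodies as dictionaries. The field name `ingest_time`, the source pattern `logs_*`, and the variable names are hypothetical. The pivot groups by day rather than by the index-name suffix, so it assumes each day's documents are ingested on that day.

```python
# Body for an ingest pipeline that adds the ingest timestamp to every
# document. "ingest_time" is a hypothetical field name.
add_timestamp_pipeline = {
    "description": "Stamp each document with its ingest time",
    "processors": [
        {"set": {"field": "ingest_time", "value": "{{_ingest.timestamp}}"}}
    ],
}

# Body for a continuous pivot transform that counts documents per day.
# "logs_*" stands in for the <static_name>_YYYYMMdd indexes.
daily_count_transform = {
    "source": {"index": "logs_*"},
    "dest": {"index": "aggregated_metrics"},
    "pivot": {
        "group_by": {
            "day": {
                "date_histogram": {
                    "field": "ingest_time",
                    "calendar_interval": "1d",
                }
            }
        },
        "aggregations": {
            "doc_count": {"value_count": {"field": "ingest_time"}}
        },
    },
    # Continuous mode needs a date field to detect new data.
    "sync": {"time": {"field": "ingest_time", "delay": "60s"}},
    "frequency": "1h",
    # Drop summary documents older than 30 days from the destination.
    "retention_policy": {"time": {"field": "day", "max_age": "30d"}},
}
```

The pipeline body would go to `PUT _ingest/pipeline/<name>` and the transform body to `PUT _transform/<id>`. Because the transform writes its results to a separate destination index, summaries already written should remain after the source index is deleted, but that is worth verifying on a test cluster first.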
To keep it simple, let's say I have the index situation above and all I want to do is create a summary index which records the total number of documents in each day's index for the past 30 days. So at any given time I should have the number of documents in each of the indexes for the past 30 days. Of those 30, only the latest one will actually still exist in the Elasticsearch cluster.
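Since the only time reference is the index-name suffix, the bookkeeping described above can also be sketched outside of transforms. The Python below is an illustration under assumptions: both function names are hypothetical, the history is held in a plain dictionary rather than a summary index, and the document count is passed in (on a real cluster it could come from the `_count` API).

```python
from datetime import date, datetime, timedelta


def day_from_index_name(index_name: str) -> date:
    """Parse the YYYYMMdd suffix of a name like '<static_name>_20240115'."""
    suffix = index_name.rsplit("_", 1)[-1]
    return datetime.strptime(suffix, "%Y%m%d").date()


def record_count(history: dict, index_name: str, doc_count: int,
                 keep_days: int = 30) -> dict:
    """Return a new history with this index's count added and old days dropped.

    Keys are dates, values are document counts. Only the `keep_days` days
    ending at the new index's date are kept.
    """
    day = day_from_index_name(index_name)
    updated = {**history, day: doc_count}
    cutoff = day - timedelta(days=keep_days)
    return {d: c for d, c in updated.items() if d > cutoff}
```

Keying each entry by its date means that re-running the job for the same day overwrites the entry instead of duplicating it.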