My goal is to display the latest status of my data per day. For example, I want the data for the latest timestamp on Monday, on Tuesday, on Wednesday, and so on. I tried to achieve this with a latest transform, but each day's data was overwritten by the next day's data. How can I group it by date and filename?
For context, my data has a filename whose status is updated throughout the day as breached or not breached. I want to track, daily, the latest status of each file. I have achieved this with a Lens table, but I want to be able to view it in other visualisations, e.g. a bar chart, which I cannot do because filename is a keyword.
Hope this makes sense, thanks in advance for any help.
Mmm, sadly I'm not clear on what you are after. Why do you need a transform? Can you maybe translate it into a question about documents, e.g. by sharing a couple of sample documents? And maybe the mapping too.
Are you updating documents in the Elasticsearch index, or are they indexed once and never changed?
Getting the latest document from a specific day via a query is pretty trivial. Let's assume your index has a timestamp field called timestamp and the filename field is called filename.keyword; then the following will give you the last document from yesterday per filename (it might not cover all of them, depending on how many distinct filenames you have; I set the limit to 1000).
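The query from this post is not preserved here, so what follows is only a minimal sketch of what such a request could look like, not the poster's original. It builds the search body as a plain Python dict: a range filter for yesterday, a terms aggregation on filename.keyword capped at 1000 buckets, and a top_hits sub-aggregation that keeps the newest document per filename. The function name and the aggregation names (by_filename, latest_doc) are made up for illustration; the field names are the ones assumed in the post.

```python
import json


def latest_per_filename_yesterday(max_filenames=1000):
    """Build a search body for the newest document per filename from yesterday."""
    return {
        "size": 0,  # only the aggregation results are needed, not the raw hits
        "query": {
            # Date math: from the start of yesterday up to the start of today.
            "range": {"timestamp": {"gte": "now-1d/d", "lt": "now/d"}}
        },
        "aggs": {
            "by_filename": {
                # One bucket per filename; filenames beyond the limit are dropped.
                "terms": {"field": "filename.keyword", "size": max_filenames},
                "aggs": {
                    "latest_doc": {
                        # Keep only the most recent document in each bucket.
                        "top_hits": {
                            "size": 1,
                            "sort": [{"timestamp": {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }


body = latest_per_filename_yesterday()
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into a search request against the index in question (for example from Kibana Dev Tools).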
You may be able to use a pivot transform with your two desired group_bys: a date_histogram for the day and a terms for the filename.
This issue has a similar request, and the thread and its answer may help you craft the Transform you're looking for: Latest full document in transform
By default, the Transform will wait and calculate the previous day's status. This limits the number of searches and writes the Transform needs to make to update each entity to one. If you would like the Transform to continuously update the status throughout the day, you can disable settings.align_checkpoints, and the Transform will search every frequency and update the date_histogram bucket as new data arrives.
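As a rough sketch of the pivot transform described above, the body below groups by a daily date_histogram and by filename, and keeps the most recent status with a top_metrics aggregation. Several things here are assumptions rather than anything stated in the thread: the index names (file-status, file-status-daily), the 1h frequency and 60s sync delay, the choice of top_metrics, and that status.file is mapped as a keyword. Check that your Elasticsearch version supports top_metrics on keyword fields inside transforms before relying on it.

```python
import json


def daily_status_transform(align_checkpoints=True):
    """Build a continuous pivot transform body: one output doc per day per filename."""
    return {
        "source": {"index": "file-status"},       # hypothetical source index
        "dest": {"index": "file-status-daily"},   # hypothetical summary index
        "frequency": "1h",                         # assumed check interval
        "sync": {"time": {"field": "@timestamp", "delay": "60s"}},
        # False asks the transform to update the current day's bucket as data
        # arrives instead of waiting for the day to complete.
        "settings": {"align_checkpoints": align_checkpoints},
        "pivot": {
            "group_by": {
                "day": {
                    "date_histogram": {
                        "field": "@timestamp",
                        "calendar_interval": "1d",
                    }
                },
                "filename": {"terms": {"field": "filename.keyword"}},
            },
            "aggregations": {
                "latest_status": {
                    # Value of status.file from the newest document in the bucket.
                    "top_metrics": {
                        "metrics": [{"field": "status.file"}],
                        "sort": {"@timestamp": "desc"},
                    }
                }
            },
        },
    }


transform_body = daily_status_transform(align_checkpoints=False)
print(json.dumps(transform_body, indent=2))
```

The printed JSON is meant as the body of a create-transform request (PUT _transform/<transform_id>). Because the day is part of the group_by, each day gets its own destination document per filename, so Tuesday's status does not replace Monday's.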
Sorry, I don't think I was very clear. Maybe I don't need a transform at all but since I wanted the latest docs I thought a latest transform may be suitable. The documents are updating in the index.
The goal is to be able to show the last record for each day for each file. I have @timestamp, multiple filename.keyword values and status.file. In Lens I am able to show a table of the timestamp per day with the filename.keyword and the last value of status.file, which is either breached or not breached. I want to be able to use visualisations other than a table, for example a bar chart, but because filename.keyword is a keyword I am unable to use the last value, so I am wondering if there is another solution?
Hi Patrick,
Thanks for the reply. I had looked at the referenced question, but I want to know the daily status: the transform should run one day and store the latest daily status, then run the next day and store that day's latest status without overriding the status from the previous day.
So I want to see the status of each of my files at the end of the day on Monday, Tuesday, Wednesday etc., if this makes sense? I think the referenced question gives only the latest status, and so would override the status from yesterday.
You'd want to use a date_histogram to get the daily buckets that you're looking for. It can be used both within a Transform and in a regular search request, so whichever option you want to go with would work:
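The example that followed this post is not preserved, so here is a minimal sketch of the regular-search variant, under the assumption that the fields are @timestamp, filename.keyword and status.file as described earlier in the thread. The aggregation names (per_day, by_filename, latest_doc) and the 1000-filename limit are illustrative choices.

```python
import json


def latest_per_day_and_filename(max_filenames=1000):
    """Build a search body: daily buckets, then filenames, then the newest doc."""
    return {
        "size": 0,
        "aggs": {
            "per_day": {
                # One bucket per calendar day.
                "date_histogram": {
                    "field": "@timestamp",
                    "calendar_interval": "day",
                },
                "aggs": {
                    "by_filename": {
                        "terms": {
                            "field": "filename.keyword",
                            "size": max_filenames,
                        },
                        "aggs": {
                            "latest_doc": {
                                # Newest document for this file on this day,
                                # returning only the status field.
                                "top_hits": {
                                    "size": 1,
                                    "sort": [{"@timestamp": {"order": "desc"}}],
                                    "_source": ["status.file"],
                                }
                            }
                        },
                    }
                },
            }
        },
    }


daily_body = latest_per_day_and_filename()
print(json.dumps(daily_body, indent=2))
```

Each per_day bucket in the response then holds one by_filename bucket per file, whose single hit carries that file's last status for the day.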
Just to be clear, do you mean the documents are changing (often this is referred to as an update) after they have been initially ingested into the Elasticsearch index? I ask because the reference to "over ride the status from yesterday" is confusing.
For a bar chart, you need something numeric to be the height of the bars. What is that in the scenario you are trying to achieve?
Again, it's very helpful if you share some sample documents and even a mockup of the sort of "end result" picture you are hoping to achieve. By the way, almost whatever it is you are trying to do is, IMO, likely achievable. We just need to better understand the question.
If you use something similar to what I and others suggested as part of a transform that reads from the existing indices and writes to some sort of "summary" index, and run that transform once per day, then over time you will get an index populated with the last document per filename per day. You can then view, explore, and aggregate the documents in that summary index in myriad ways.
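To make the "last document per filename per day" idea concrete, and to show one numeric value a bar chart could use, here is a small pure-Python sketch that does not touch Elasticsearch at all. The sample documents, the flat filename and status keys, and the choice of "number of breached files per day" as the bar height are all assumptions for illustration.

```python
from collections import Counter
from datetime import datetime


def last_status_per_day(docs):
    """Keep the status of the newest document for each (day, filename) pair."""
    latest = {}
    for doc in docs:
        ts = datetime.fromisoformat(doc["@timestamp"])
        key = (ts.date().isoformat(), doc["filename"])
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, doc["status"])
    return {key: status for key, (_ts, status) in latest.items()}


def breached_count_per_day(daily_status):
    """Count files whose last status of the day is 'breached'."""
    counts = Counter()
    for (day, _filename), status in daily_status.items():
        counts[day] += 1 if status == "breached" else 0
    return dict(counts)


# Made-up sample data: two files on one day, one file on the next day.
docs = [
    {"@timestamp": "2024-01-01T09:00:00", "filename": "a.csv", "status": "not breached"},
    {"@timestamp": "2024-01-01T17:00:00", "filename": "a.csv", "status": "breached"},
    {"@timestamp": "2024-01-01T10:00:00", "filename": "b.csv", "status": "breached"},
    {"@timestamp": "2024-01-01T18:00:00", "filename": "b.csv", "status": "not breached"},
    {"@timestamp": "2024-01-02T08:00:00", "filename": "a.csv", "status": "not breached"},
]

daily = last_status_per_day(docs)
print(breached_count_per_day(daily))
```

In a summary index with one document per file per day, the same count would be an ordinary filtered count per day, which is the kind of numeric value a bar chart can plot.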