I want to use Elasticsearch as a time series datastore, where each event comes in with a timestamp, a serial number, and all its other data. I was told by @warkolm that I can graph changes over time, either at an individual serial level or at an aggregated level, but I need more detailed guidance on how to do this.
I want to retrieve only the latest "event": the data I'm using has several documents with the same serial number, but only the latest document for each serial number is of interest for dashboards and data analysis.
I am not sure which of the guides published by Elastic to use. Is it a conditionally run processor somehow?
Thankful for all input I can get. And thanks to @warkolm who has been patient with me.
When indexing, it is important that only the latest version of every event (documents with the same serial number) is ingested, so that the numbers are right when showing reports, pie charts and such in Kibana; otherwise the statistics are wrong.
However, the users also want to be able to see the full content (so that all the documents are accessible, even those with the same serial number, for other reasons), so the ultimate solution would be to have the cake and eat it too, if you know what I mean.
So when analyzing statistics, you want to see only the unique documents (latest of every document with same serial number), and for troubleshooting, all existing documents.
I think the "easiest" (not necessarily the most efficient) approach is to save the data in 2 indices; let's call them latest and detail.
In the latest index you would use your serial number as the document _id, and each time a document is ingested it will overwrite the existing entry. This assumes your serial number is unique across the entire data set. In this index only the last update will appear.
In the detail index you would not use your serial number as the document _id, so every version of the document would be saved, and you could then filter on the serial number and see the history.
This is just one suggestion, and there are some downsides to it: you store a bit more data, and the latest index will contain deleted documents, which will be cleaned up at some point. (This can usually be automated.) But on the other hand it's pretty straightforward, and I've seen a number of users use this very successfully.
There is perhaps another approach with transforms, but I'll let someone else answer that.
And yet again, there may be another approach using aggregations and max time, but it depends on how you want to display and look at the data.
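As a rough illustration of the two-index idea above, here is a minimal sketch that builds a body for the Elasticsearch `_bulk` API, writing each event once to each index. The index names `latest` and `detail` come from the suggestion above; the field names `serial` and `@timestamp`, the sample events, and the helper name `build_bulk_body` are assumptions made up for the example. It also assumes events arrive in timestamp order, since otherwise an older event could overwrite a newer one in the latest index.

```python
import json


def build_bulk_body(events, latest_index="latest", detail_index="detail"):
    """Return an NDJSON string for the _bulk API that writes each event twice."""
    lines = []
    for event in events:
        # latest: the serial number is the _id, so a newer event with the
        # same serial overwrites the older one.
        lines.append(json.dumps({"index": {"_index": latest_index, "_id": event["serial"]}}))
        lines.append(json.dumps(event))
        # detail: no _id is given, so Elasticsearch generates one and every
        # version of the event is kept.
        lines.append(json.dumps({"index": {"_index": detail_index}}))
        lines.append(json.dumps(event))
    # The _bulk API expects newline-delimited JSON ending with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical sample events: two versions of the same serial number.
events = [
    {"serial": "A-1", "@timestamp": "2021-10-01T10:00:00Z", "status": "open"},
    {"serial": "A-1", "@timestamp": "2021-10-01T11:00:00Z", "status": "closed"},
]
body = build_bulk_body(events)
print(body)
```

The resulting string would be sent as the body of a `POST _bulk` request. After both events are indexed, latest would hold one document for `A-1` and detail would hold two.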
Like @warkolm was asking, I'm going to assume you're trying to visualize in Kibana. Two ideas; which one fits depends on what you want to visualize:
For a "right now" style visualization on metrics, use the "Latest" transform configuration, with your "serial number" as the unique key, sorted on your time field. This will give you one document per unique "serial number". I've pasted an example of this below, doing something similar on my metric data (with the unique key by host). Here's a tutorial on transforms: Tutorial: Transforming the eCommerce sample data | Elasticsearch Guide [7.15] | Elastic
To plot historical trends of these metrics using a date histogram, take advantage of the "Last Value" option in visualization tools like Lens & TSVB (it's called "top hits" in aggregation based visualizations and in Elasticsearch). This will let you pluck the last value based on timestamp within each bucket of the report to see patterns historically. I've pasted an example of this below.
Example 1 - Transforms for a "Latest" index (use for single metric / "now" style visualizations).
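For readers who prefer the API over the Kibana wizard, a "latest" transform along these lines could be sketched as the request body below. The index names `detail` and `latest_by_serial`, the field names `serial` and `@timestamp`, the `60s` delay and `1m` frequency, and the helper name are all assumptions for illustration, not values from this thread.

```python
import json


def latest_transform_config(source_index, dest_index, unique_key, time_field):
    """Build a request body for a 'latest' transform (a sketch, not a tuned config)."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        # Keep one document per unique key: the one with the highest sort value.
        "latest": {"unique_key": [unique_key], "sort": time_field},
        # Continuous mode: keep picking up new events as they arrive.
        "sync": {"time": {"field": time_field, "delay": "60s"}},
        "frequency": "1m",
    }


config = latest_transform_config("detail", "latest_by_serial", "serial", "@timestamp")
print(json.dumps(config, indent=2))
```

This body would be sent with `PUT _transform/<transform_id>`, and the transform then started with `POST _transform/<transform_id>/_start`. The destination index ends up with one document per serial number, which is what the "right now" style visualizations need.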
Hope this helps give you some ideas! Also, don't forget aggregations like "unique count" (the "cardinality" agg in Elasticsearch) if you need to count "serial numbers".
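Outside of Kibana, the "last value per bucket" and "unique count" ideas can be expressed roughly as a search request like the sketch below. The field names `serial` and `@timestamp`, the one-day interval, the terms size of 100, and the aggregation names are assumptions for illustration.

```python
import json


def last_value_per_bucket_query(serial_field, time_field, interval="1d"):
    """Build a search body: per time bucket, the newest event of each serial."""
    return {
        "size": 0,  # only aggregation results are needed, not raw hits
        "aggs": {
            "over_time": {
                "date_histogram": {"field": time_field, "calendar_interval": interval},
                "aggs": {
                    # "Unique count" of serial numbers seen in each bucket.
                    "unique_serials": {"cardinality": {"field": serial_field}},
                    "per_serial": {
                        "terms": {"field": serial_field, "size": 100},
                        "aggs": {
                            # "Last value": the single newest document per serial.
                            "last_event": {
                                "top_hits": {
                                    "size": 1,
                                    "sort": [{time_field: {"order": "desc"}}],
                                }
                            }
                        },
                    },
                },
            }
        },
    }


query = last_value_per_bucket_query("serial", "@timestamp")
print(json.dumps(query, indent=2))
```

The body would go to `POST <index>/_search` against the index that holds every version of the documents. Note that cardinality counts are approximate in Elasticsearch, and the terms size limits how many serial numbers come back per bucket.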