As mentioned in the blog https://www.elastic.co/blog/elasticsearch-as-a-time-series-data-store, I want to create a mapping template to monitor my application metrics. I have defined the exact mapping from the blog. Now, for each metric I parse from an XML file in Logstash, how do I map it to the fields defined in the template?
For example, I have a field named response_time, and I want to find the mean, min, and max response time across the entire data set.
"properties": {
"@timestamp": { "type": "date", "doc_values": true },
"max": { "type": "integer", "doc_values": true, "index": "no" },
"mean": { "type": "integer", "doc_values": true, "index": "no" },
"min": { "type": "float", "doc_values": true, "index": "no" }
}
How do I use the defined mapping to map the field response_time to these properties and achieve my use case? How do I configure this?
When you submit the document, do you just have the response_time, or have you already calculated the max, mean, and min values?
In the blog, the max, mean, and min values have already been determined prior to the indexing request: some other process takes a batch of raw data, aggregates it, and then inserts the results into Elasticsearch.
You can also have Elasticsearch do this for you. Just insert each document with response_time as a value, then use the built-in aggregation functions to get your max, mean, and min values. Depending on where the data comes from, this can be simpler to set up, but it can also be less efficient.
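As a minimal sketch of that approach (the index/type names metrics/metric and the sample values are assumptions for illustration), you index the raw document and then run a stats aggregation, which returns min, max, and avg in a single request:

POST metrics/metric
{
  "@timestamp": "2016-01-01T00:00:00Z",
  "response_time": 142
}

POST metrics/_search
{
  "size": 0,
  "aggs": {
    "response_time_stats": {
      "stats": { "field": "response_time" }
    }
  }
}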
For example, suppose you get 200 million documents per day and you want to know the average response time across the past year. Your aggregation now needs to work across roughly 73 billion documents (200 million × 365 days). That could take a while, especially if it is something that needs to run regularly.
So the alternative is to keep your first index as normal, but every day take those 200 million records, do whatever aggregation work you want, and insert the results into a separate index. The new index has 1 record per day as opposed to 200 million, so when you want to query the last year, you touch this smaller index instead of the larger one. You are basically doing the heavy aggregation work once instead of on every single request.
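A sketch of that daily rollup (the index names metrics and metrics_daily, the type name summary, and all values here are assumptions): run a stats aggregation over one day of raw data, then index the result as a single summary document.

POST metrics/_search
{
  "size": 0,
  "query": {
    "range": { "@timestamp": { "gte": "2016-01-01", "lt": "2016-01-02" } }
  },
  "aggs": {
    "daily_response_time": {
      "stats": { "field": "response_time" }
    }
  }
}

POST metrics_daily/summary
{
  "@timestamp": "2016-01-01T00:00:00Z",
  "min": 12,
  "max": 1850,
  "mean": 97
}

Note that the summary document lines up with the min/mean/max mapping from the question, so that template would apply to the rollup index rather than the raw one.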
The blog post looks to be doing the latter rather than the former. Logstash cannot do this on the fly, though. You either need to index the data, then pull the aggregated results out and index them into a separate index, or some other software needs to give you data that already has this done.
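If another process hands you records that are already aggregated, the Logstash side becomes a plain pass-through. A minimal sketch (the input source, host, and index name are assumptions):

input {
  # hypothetical source: an upstream job emits one pre-aggregated JSON record per day
  stdin { codec => json }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "metrics_daily"
  }
}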