I have a machine that sends me metrics (temperature and pressure), which I'm pulling from two different files on that machine with Filebeat. I'm trying to get the current status of all of the machine's metrics in a single query, and I'm having a hard time because the values live in two different documents.
For example:
PUT machine/_doc/1
{
  "temp1": 12,
  "@timestamp": 1596123915687
}
PUT machine/_doc/2
{
  "pressure1": 3,
  "@timestamp": 1595123815687
}
PUT machine/_doc/3
{
  "temp1": 10,
  "@timestamp": 1596023915687
}
PUT machine/_doc/4
{
  "pressure1": 1,
  "@timestamp": 1595023815687
}
The most recent temp1 is from document 1 and the most recent pressure1 is from doc 2.
I want to get the most recent pressure and the most recent temperature for a given machine in a single query. A top hits aggregation works when everything is in a single document, but not when the values are spread over different documents. I also considered transforms, but they have a hard time pulling values from multiple documents as well.
Are you then saying to do a top hits under each of those filter aggregations?
This is what I came up with but it's just quite bulky because I actually have 3-5 different documents with values I need to pull from, and I'll be running this query every 5 mins or so (across 1000 different machines).
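For reference, here is a minimal sketch of the filter-plus-top-hits approach against the sample documents above. It is not necessarily the exact query from this post, and the aggregation names (`latest_temp1`, `latest_pressure1`, `latest`) are made up for illustration. Each filter aggregation keeps only the documents that contain one metric, and the top hits aggregation underneath returns the newest of them:

```
GET machine/_search
{
  "size": 0,
  "aggs": {
    "latest_temp1": {
      "filter": { "exists": { "field": "temp1" } },
      "aggs": {
        "latest": {
          "top_hits": {
            "size": 1,
            "sort": [ { "@timestamp": { "order": "desc" } } ],
            "_source": { "includes": [ "temp1", "@timestamp" ] }
          }
        }
      }
    },
    "latest_pressure1": {
      "filter": { "exists": { "field": "pressure1" } },
      "aggs": {
        "latest": {
          "top_hits": {
            "size": 1,
            "sort": [ { "@timestamp": { "order": "desc" } } ],
            "_source": { "includes": [ "pressure1", "@timestamp" ] }
          }
        }
      }
    }
  }
}
```

With the four sample documents this should return document 1 under `latest_temp1` and document 2 under `latest_pressure1`. To cover many machines in one request, both filter aggregations could be nested under a terms aggregation on whichever field identifies the machine (for Filebeat data that is often `host.name`, but that is an assumption about the mapping).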
Alternatively, you can put a filter aggregation in front (requires >= 7.7), like you did in post 3.
I second the suggestion with the range filter. With your current config the transform would retrieve all historical values every 5 minutes, which will lead to bad performance. Pragmatic solution: use now-10m.
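As a rough sketch of the `now-10m` suggestion, the range filter could be added to the source query of an existing transform through the update API. The transform id `machine-latest` is hypothetical, the index name is taken from the example above, and this assumes `@timestamp` is mapped as a `date` (as Filebeat does by default) so that date math works:

```
POST _transform/machine-latest/_update
{
  "source": {
    "index": "machine",
    "query": {
      "range": {
        "@timestamp": { "gte": "now-10m" }
      }
    }
  }
}
```

The 10 minute window is deliberately wider than the 5 minute run interval, so a reading that arrives slightly late is still picked up on the next run.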
Or you can add a group_by with a date_histogram and, for example, fixed_interval: 5m. This way you get the last state as well as historical values bucketed by 5 minutes (or any other interval you like). The transform (>= 7.7) will internally use a range filter in this case, so you do not have to add it to the query yourself.
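A sketch of what such a pivot might look like, using the preview endpoint so nothing is created. Several things here are assumptions and not taken from this thread: the machine is identified by `host.name`, `@timestamp` is mapped as a `date`, and `avg` is only a stand-in for however you want to reduce the readings inside each 5 minute bucket:

```
POST _transform/_preview
{
  "source": { "index": "machine" },
  "pivot": {
    "group_by": {
      "machine": {
        "terms": { "field": "host.name" }
      },
      "bucket_5m": {
        "date_histogram": { "field": "@timestamp", "fixed_interval": "5m" }
      }
    },
    "aggregations": {
      "temp1": {
        "filter": { "exists": { "field": "temp1" } },
        "aggs": { "value": { "avg": { "field": "temp1" } } }
      },
      "pressure1": {
        "filter": { "exists": { "field": "pressure1" } },
        "aggs": { "value": { "avg": { "field": "pressure1" } } }
      }
    }
  }
}
```

Each output document then holds one machine and one 5 minute bucket, with both metrics side by side, so the current status is simply the newest bucket per machine.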