Hello,
I have time-series metrics about per-process memory usage. Here are the fields:
- timestamp when the sample was taken
- userid
- process id
- memory used by the process at that time
Here are a few samples in CSV format (timestamp, userid, pid, memoryMB):
1, foo, 1, 100
1, foo, 2, 500
2, foo, 3, 100
2, bar, 4, 100
In general there are many processes, and one user may have multiple processes running at the same time. At each sample time, the memory usage of all processes is indexed into Elasticsearch. What is the best way to find the peak memory usage by any user in the last 15 days? For example, with the samples above, foo's total is 600 at time 1 (100 + 500) and 100 at time 2, so foo's peak is 600, while bar's peak is 100.
Here is the query I am using now. The aggregation sums memoryMB per user at every time instant, and then I post-process the output from ES to get the max memoryMB per user.
{
  "from": 0,
  "size": 0,
  "query": {
    "bool": {
      "must": [
        { "range": { "date": { "gte": "now-15d" } } }
      ]
    }
  },
  "aggs": {
    "name": {
      "terms": {
        "field": "user",
        "size": 1000
      },
      "aggs": {
        "date": {
          "terms": {
            "field": "date",
            "order": { "_term": "asc" },
            "size": 15000
          },
          "aggs": {
            "mb": {
              "sum": { "field": "memoryMB" }
            }
          }
        }
      }
    }
  }
}
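The post-processing step on my side looks roughly like this (a simplified Python sketch, not my exact code; "response.json" is just a placeholder for the response body returned by the query above, however it is fetched):

import json

# Simplified sketch of the post-processing; the file name is a placeholder,
# in reality the response comes straight from the search call.
resp = json.load(open("response.json"))

peak_by_user = {}
for user_bucket in resp["aggregations"]["name"]["buckets"]:
    user = user_bucket["key"]
    # per-user peak = max over the per-timestamp sums of memoryMB
    sums = [d["mb"]["value"] for d in user_bucket["date"]["buckets"]]
    peak_by_user[user] = max(sums) if sums else 0

# overall peak across all users in the last 15 days
user, peak = max(peak_by_user.items(), key=lambda kv: kv[1])
print(user, peak)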
It looks like I am abusing the terms aggregation here, and I am also exploring a Groovy script plugin. What is the best way to do this?
Thanks
-Soumitra.