Timeseries data aggregation


(Soumitra Kumar) #1

Hello,

I have timeseries metrics about memory usage by processes. Here are the fields:

- timestamp when the sample was taken
- user id
- process id
- memory used by the process at that time

Here are a few samples in CSV format:

1, foo, 1, 100
1, foo, 2, 500
2, foo, 3, 100
2, bar, 4, 100

In general there are many processes, and one user may have multiple processes running at the same time. At a particular time, the memory usage of all processes is indexed into Elasticsearch. What is the best way to find the peak memory usage by any user in the last 15 days?
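To make the desired computation concrete, here is a plain-Python sketch of what the aggregation needs to express, run over the sample rows above: first sum memory across a user's processes at each timestamp, then take the max of those per-timestamp totals per user.

```python
from collections import defaultdict

# Sample rows: (timestamp, user, pid, memoryMB), from the CSV above.
rows = [
    (1, "foo", 1, 100),
    (1, "foo", 2, 500),
    (2, "foo", 3, 100),
    (2, "bar", 4, 100),
]

# Step 1: sum memory across a user's processes at each timestamp.
usage = defaultdict(int)  # (user, timestamp) -> total memoryMB
for ts, user, pid, mb in rows:
    usage[(user, ts)] += mb

# Step 2: take the max of those per-timestamp totals for each user.
peak = defaultdict(int)  # user -> peak memoryMB
for (user, ts), mb in usage.items():
    peak[user] = max(peak[user], mb)

print(dict(peak))  # {'foo': 600, 'bar': 100}
```

Here user foo peaks at 600 MB (two processes at timestamp 1), even though no single sample exceeds 500 MB, which is why a plain max over the raw documents is not enough.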

Here is my query with aggregations: it sums the memoryMB per user at every time instant, and then I post-process the output from ES to get the max memoryMB.

{
    "from":0,
    "size":0,
    "query":{"bool":{"must":[{"range":{"date":{"gte":"now-15d"}}}]}},
    "aggs": {
        "name": {
            "terms": {
                "field": "user",
                "size": 1000
            },
            "aggs": {
                "date": {
                    "terms": {
                        "field": "date",
                        "order" : { "_term" : "asc" },
                        "size": 15000
                    },
                    "aggs": {
                        "mb": {
                            "sum": { "field": "memoryMB" }
                        }
                    }
                }
            }
        }
    }
}

It looks like I am abusing the terms aggregation here, and I am also exploring a Groovy plugin. What is the best way to do this?

Thanks
-Soumitra.


(Mark Walkom) #2

A good tip when you are getting started is to build something in Kibana that looks visually close to what you are after, then copy the aggregation it generates and adjust it to your needs :slight_smile:


(Christian Dahlqvist) #3

Is your memory usage data captured with a certain frequency, e.g. every minute, so that you can use a date histogram aggregation with the interval set to the periodicity of the data?
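For reference, a sketch of that approach, reusing the field names from the original query and assuming a 90-second sampling interval (on recent Elasticsearch versions the parameter is `fixed_interval` rather than `interval`):

```json
{
    "size": 0,
    "query": { "range": { "date": { "gte": "now-15d" } } },
    "aggs": {
        "per_user": {
            "terms": { "field": "user", "size": 1000 },
            "aggs": {
                "over_time": {
                    "date_histogram": {
                        "field": "date",
                        "interval": "90s"
                    },
                    "aggs": {
                        "mb": { "sum": { "field": "memoryMB" } }
                    }
                }
            }
        }
    }
}
```

Unlike a terms aggregation on the raw timestamps, the histogram has a bounded, predictable number of buckets (15 days at 90s is 14,400), and samples that arrive slightly off-schedule still land in a bucket.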


(Soumitra Kumar) #4

The data has a periodicity of 90 seconds, but it is not guaranteed.


(Christian Dahlqvist) #5

Can you have stats coming in at different times that need to be aggregated together, e.g. from different hosts?


(Soumitra Kumar) #6

There may be stats coming in at different times from different hosts.

But for this use case, I want to find the peak memory usage at any given time. So the memory usage for every user, at every timestamp, has to be added up, and then I need to find the max.
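That "sum per timestamp, then max" shape maps onto a `max_bucket` pipeline aggregation over a date histogram, so the peak can be computed server-side instead of in post-processing. A sketch, again assuming the field names and 90-second interval from above:

```json
{
    "size": 0,
    "query": { "range": { "date": { "gte": "now-15d" } } },
    "aggs": {
        "per_user": {
            "terms": { "field": "user", "size": 1000 },
            "aggs": {
                "over_time": {
                    "date_histogram": { "field": "date", "interval": "90s" },
                    "aggs": {
                        "mb": { "sum": { "field": "memoryMB" } }
                    }
                },
                "peak_mb": {
                    "max_bucket": { "buckets_path": "over_time>mb" }
                }
            }
        }
    }
}
```

`max_bucket` is a sibling of the histogram and returns, for each user, the largest per-bucket sum along with the bucket key(s) where it occurred.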


(system) #7

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.