Aggregate events with start and end date

Hello,

I have events with start_date and end_date fields (epoch timestamps) in the same document.
I want to sum field values from these events on the Y-axis (ultimately shown as a %), while the X-axis represents time; each event should count at every point in time between its start_date and end_date.
I wish to do this with Kibana.
I have been looking at Timelion and the bucket script aggregation, but I am a little confused about how to do this.
Reading some fairly old posts, I found a method that pre-processes the events, inserting into each one an array field "running_at" which simply lists the dates at which the event is running.
I guess a pipeline aggregation could do this more efficiently, but I am stuck on how to implement it.
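
Roughly, the pre-processed documents from those posts would look like the sketch below (the index name, field names, and hourly sampling are just an illustration); a date_histogram over running_at with a sum sub-aggregation then gives the per-interval totals:

// Hypothetical pre-processed document: one timestamp per interval the event is running
PUT /jobs_expanded/_doc/1
{
  "server": "server1",
  "alloc_cpu": 5,
  "running_at": ["2018-10-28T18:00:00Z", "2018-10-28T19:00:00Z", "2018-10-28T20:00:00Z"]
}

// Each value of the running_at array falls into its own histogram bucket
GET /jobs_expanded/_search
{
  "size": 0,
  "aggs": {
    "over_time": {
      "date_histogram": {
        "field": "running_at",
        "interval": "1h"
      },
      "aggs": {
        "sum_alloc_cpu": { "sum": { "field": "alloc_cpu" } }
      }
    }
  }
}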

Thanks in advance for any kind of help.

Nicolas

Hello,

Replying to myself: I am looking at an Elasticsearch request to pre-process my event documents and reinject the result into another index. I have something like this for the moment:

POST /jobs/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "time_start": {
              "lte": 1540749600
            }
          }
        },
        {
          "range": {
            "time_end": {
              "gte": 1540749600
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "server": {
      "terms": {
        "field": "server.keyword"
      },
      "aggs": {
        "sum_alloc_cpu": {
          "sum": {
            "field": "alloc_cpu"
          }
        }
      }
    }
  }
}

So, with this request I get the pairs of values I want, but of course only at the single specified epoch timestamp at which the events are running.
I have not yet found how to build, let's say, 60 buckets, so that I could combine those 60 values into an average aggregation, which would be a bit less painful than executing the query 2,592,000 times (the number of seconds in one month).
I suspect I could use a Painless script within the first aggregation (and not use the query here), but I did not find how to arrange it.
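
The closest thing I can imagine is to generate a filters aggregation with one filter per sampled timestamp and put an avg_bucket pipeline on top; a sketch with only 2 of the 60 sample timestamps shown (the timestamps themselves are just examples):

// Sketch: each named filter samples one timestamp; avg_bucket averages the sums
GET /jobs/_search
{
  "size": 0,
  "aggs": {
    "samples": {
      "filters": {
        "filters": {
          "t1540749600": {
            "bool": {
              "must": [
                { "range": { "time_start": { "lte": 1540749600 } } },
                { "range": { "time_end": { "gte": 1540749600 } } }
              ]
            }
          },
          "t1540753200": {
            "bool": {
              "must": [
                { "range": { "time_start": { "lte": 1540753200 } } },
                { "range": { "time_end": { "gte": 1540753200 } } }
              ]
            }
          }
        }
      },
      "aggs": {
        "sum_alloc_cpu": { "sum": { "field": "alloc_cpu" } }
      }
    },
    "avg_alloc_cpu": {
      "avg_bucket": {
        "buckets_path": "samples>sum_alloc_cpu"
      }
    }
  }
}

Generating the 60 filters client-side feels a bit clumsy, though, so maybe there is a better way.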
If someone has an idea, I would be more than happy to hear it :slight_smile:

rgds,

Nicolas

I wonder if you might be able to do this through a scripted aggregation, similar to the example in this old post.
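
Very roughly, and completely untested, the idea would be a scripted_metric aggregation that adds alloc_cpu to every hourly bucket between each document's time_start and time_end. A sketch, assuming those fields are epoch seconds mapped as long, and a recent 6.x for the state/states syntax:

// Untested sketch: builds an hourly map of summed alloc_cpu across all matching docs
GET /jobs/_search
{
  "size": 0,
  "aggs": {
    "cpu_per_hour": {
      "scripted_metric": {
        "init_script": "state.totals = new HashMap()",
        "map_script": "long step = 3600; long t0 = doc['time_start'].value; long t1 = doc['time_end'].value; double cpu = doc['alloc_cpu'].value; for (long t = t0 - t0 % step; t <= t1; t += step) { state.totals[t] = state.totals.containsKey(t) ? state.totals[t] + cpu : cpu }",
        "combine_script": "return state.totals",
        "reduce_script": "Map merged = new HashMap(); for (def s : states) { if (s == null) { continue } for (def e : s.entrySet()) { def k = e.getKey(); merged[k] = merged.containsKey(k) ? merged[k] + e.getValue() : e.getValue() } } return merged"
      }
    }
  }
}

With millions of matching documents the per-shard maps can get large, so this may hit memory or script limits; it is only meant to illustrate the shape of the approach.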

Hi Christian,

The old post you mentioned was the first thing I tried for my issue; I ran into some limitations with Painless scripts inside Kibana due to the number of objects I may have to manage (~3 million).
Anyway, I think I am near a solution:

GET /jobs/_search
{
  "size": 0,
  "aggs": {
    "server": {
      "terms": {
        "field": "server.keyword"
      },
      "aggs": {
        "jobs1": { .. },
        "jobs2": { .. }
      }
    },
    "sum_total": {
      "sum_bucket": {
        "buckets_path": "server>jobs1>sum_alloc_cpu"
      }
    }
  }
}

Where:

"jobsX": {
  "filter": {
    "bool": {
      "must": [
        {
          "range": {
            "time_start": {
              "lte": 153835200X
            }
          }
        },
        {
          "range": {
            "time_end": {
              "gte": 153835200X
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "sum_alloc_cpu": {
      "sum": {
        "field": "alloc_cpu"
      }
    }
  }
}

I am stuck here on how to use buckets_path: I would like to give it some kind of wildcard, like "server>jobs*>sum_alloc_cpu".

The response it currently gives is:

{
  "aggregations" : {
    "server" : {
      "buckets" : [
        {
          "key" : "server1",
          "jobs1" : {
            "sum_alloc_cpu" : {
              "value" : 5.0
            }
          },
          "jobs2" : {
            "sum_alloc_cpu" : {
              "value" : 8.0
            }
          }
        },
        {
          "key" : "server2",
          "jobs1" : {
            "sum_alloc_cpu" : {
              "value" : 7.0
            }
          },
          "jobs2" : {
            "sum_alloc_cpu" : {
              "value" : 3.0
            }
          }
        },
        {
          "key" : "server3",
          "jobs1" : {
            "sum_alloc_cpu" : {
              "value" : 4.0
            }
          },
          "jobs2" : {
            "sum_alloc_cpu" : {
              "value" : 1.0
            }
          }
        }
      ]
    },
    "sum_total" : {
      "value" : 16.0
    }
  }
}

I wish to be able to sum these aggregations (jobs1, jobs2, etc.) of alloc_cpu per key (server1, server2, ...).

I would definitely prefer to script the generation of the aggregations from jobs1 to jobsX, but for now: how can I sum the aggregation metrics in the way I described?

Thanks in advance.

rgds

Nicolas

Moving ahead on the issue I am trying to resolve, I found something interesting by using a bucket_script aggregation in this way :slight_smile:

         "aggs": {
            "alloccpu": {
                "bucket_script": {
                    "buckets_path": {
                        "jobs1cpu": "jobs1>sum_alloc_cpu",
                        "jobs2cpu": "jobs2>sum_alloc_cpu"
                        "jobs3cpu": "jobs3>sum_alloc_cpu"
                    },
                    "script": {
                        "source": "(params.jobs1cpu+params.jobs2cpu+params.jobs3cpu)"
                    }
                }
            },
            "jobs1": {..},
            "jobs2": {..},
            "jobs3": {..},

The problem here is that when I have thousands of jobs, I get something like:

Caused by: java.lang.IllegalArgumentException: Scripts may be no longer than 16384 characters. The passed in script is 67313 characters. Consider using a plugin if a script longer than this length is a requirement.

I understand from Scripts may be no longer than 16384 characters - #3 by bfcshop that with Elasticsearch 6.6 I may be able to increase this soft limit.
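
If I go that route, I believe it would look something like this in elasticsearch.yml (setting name taken from that discussion; I have not tried it myself):

# elasticsearch.yml (static node setting, Elasticsearch 6.6+)
# raise the soft limit on script size, which defaults to 16384
script.max_size_in_bytes: 65535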
But I am looking for something more optimized than that.
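
The most promising idea I have right now is to collapse jobs1..jobsN into a single filters aggregation, so that a plain sum_bucket can sum across all of the generated buckets without any script at all. A sketch with just two time windows:

// Sketch: one filters agg replaces the N sibling filter aggs; no bucket_script needed
GET /jobs/_search
{
  "size": 0,
  "aggs": {
    "server": {
      "terms": { "field": "server.keyword" },
      "aggs": {
        "jobs": {
          "filters": {
            "filters": {
              "jobs1": {
                "bool": {
                  "must": [
                    { "range": { "time_start": { "lte": 1538352001 } } },
                    { "range": { "time_end": { "gte": 1538352001 } } }
                  ]
                }
              },
              "jobs2": {
                "bool": {
                  "must": [
                    { "range": { "time_start": { "lte": 1538352002 } } },
                    { "range": { "time_end": { "gte": 1538352002 } } }
                  ]
                }
              }
            }
          },
          "aggs": {
            "sum_alloc_cpu": { "sum": { "field": "alloc_cpu" } }
          }
        },
        "alloccpu": {
          "sum_bucket": {
            "buckets_path": "jobs>sum_alloc_cpu"
          }
        }
      }
    }
  }
}

That would keep the request size proportional to the number of windows only, with no script to hit the length limit.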
Any ideas?
Thanks in advance :slight_smile:
Rgds
Nicolas
