Counting an event in multiple buckets


Suppose I have events (telephone call logs) where each event has:

  • start timestamp
  • end timestamp
  • destination id
  • ...

I can plot the distribution of the number of calls over a time interval by defining the Y-axis as a Count metric and the X-axis as a date histogram on the start timestamp field. If the log contains 1000 calls in a day, the y values of all data points in the resulting distribution sum to 1000 (spread across the hours, for an hourly interval). In this case, the semantics of each point is "the number of calls STARTED in this hour interval".

However, I want to plot a distribution based on the whole duration of a call. If a call starts at 1pm and ends at 4:30pm, this single call must add 1 to the 1pm-2pm bucket, 1 to the 2pm-3pm bucket, 1 to the 3pm-4pm bucket and 1 to the 4pm-5pm bucket, and the same applies to every call in the log. The sum of the y values of all data points for a 1000-call log can therefore be more than 1000. The desired semantics is "the number of active calls in this hour interval": a call that spans several time intervals on the x-axis should add 1 to each of them. I was not able to find a way to define this kind of distribution in Kibana. Any thoughts on that?
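To make the desired semantics concrete, here is a minimal Python sketch of the counting I have in mind. It runs outside Elasticsearch and Kibana. The function names and sample calls are made up for illustration, and it assumes hourly buckets that are half-open, so a call ending at exactly 2pm is not counted in the 2pm-3pm bucket.

```python
from collections import Counter
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)

def hour_buckets(start, end):
    """Yield the start of every hourly bucket that the call overlaps."""
    bucket = start.replace(minute=0, second=0, microsecond=0)
    yield bucket  # the bucket containing the start always counts
    bucket += HOUR
    while bucket < end:  # assumption: a call ending exactly on the hour stops here
        yield bucket
        bucket += HOUR

def active_calls_per_hour(calls):
    """Count each call once in every hourly bucket it is active in."""
    counts = Counter()
    for start, end in calls:
        for bucket in hour_buckets(start, end):
            counts[bucket] += 1
    return counts

# Hypothetical sample data: one long call and one short call.
calls = [
    (datetime(2020, 1, 1, 13, 0), datetime(2020, 1, 1, 16, 30)),
    (datetime(2020, 1, 1, 14, 15), datetime(2020, 1, 1, 14, 45)),
]
counts = active_calls_per_hour(calls)
for bucket in sorted(counts):
    print(bucket.strftime("%H:%M"), counts[bucket])
```

With these two calls the buckets come out as 13:00 → 1, 14:00 → 2, 15:00 → 1, 16:00 → 1, so the bucket total (5) exceeds the number of calls (2).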

To the best of my knowledge, Elasticsearch can't split up your data this way. You can perform operations on the duration (end time - start time) and use that to visualize information about all the durations (sum, average, mode, etc.); that part is pretty simple. You could add a duration field using a scripted field, or enrich the documents to add that information as another field... but that's not really what you are looking for.
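As a rough illustration of the enrichment option, here is a small Python sketch that adds a duration to each document before indexing. The field names ("start", "end", "duration_seconds") and the sample documents are assumptions for the example, not taken from your mapping.

```python
from datetime import datetime
from statistics import mean

def with_duration(call):
    """Return a copy of the document with a duration field added."""
    seconds = (call["end"] - call["start"]).total_seconds()
    return {**call, "duration_seconds": seconds}

# Hypothetical call documents.
calls = [
    {"start": datetime(2020, 1, 1, 13, 0), "end": datetime(2020, 1, 1, 16, 30)},
    {"start": datetime(2020, 1, 1, 14, 15), "end": datetime(2020, 1, 1, 14, 45)},
]
enriched = [with_duration(c) for c in calls]

total_seconds = sum(c["duration_seconds"] for c in enriched)
average_seconds = mean(c["duration_seconds"] for c in enriched)
print(total_seconds, average_seconds)
```

This gives you per-call durations to aggregate on, but it still says nothing about which hours each call was active in.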

If you had a known, fixed time interval, you could index your data the way you want to query it. That is, if you knew you always wanted to look at the "current call count" on a per-hour basis, you could take the documents you already have and index them multiple times, once for each hourly block. So, in the case of your example, your 1:00pm-4:30pm document could be indexed as a number of documents: 1-2pm, 2-3pm, 3-4pm, and 4-4:30pm.
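A minimal sketch of that splitting step, done in Python before the documents are sent to Elasticsearch, might look like the following. The field names ("start", "end", "destination_id") and the helper name are hypothetical; adapt them to your own documents and indexing pipeline.

```python
from datetime import datetime, timedelta

def split_call_by_hour(call):
    """Split one call document into one document per hourly block it touches."""
    end = call["end"]
    pieces = []
    piece_start = call["start"]
    while True:
        # The next hour boundary strictly after piece_start.
        next_hour = piece_start.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
        piece_end = min(end, next_hour)
        # Copy every other field unchanged; only the time range is clipped.
        pieces.append({**call, "start": piece_start, "end": piece_end})
        if piece_end >= end:
            break
        piece_start = piece_end
    return pieces

# Hypothetical document for the 1:00pm-4:30pm call from the example.
call = {
    "start": datetime(2020, 1, 1, 13, 0),
    "end": datetime(2020, 1, 1, 16, 30),
    "destination_id": "example-destination",
}
for piece in split_call_by_hour(call):
    print(piece["start"].strftime("%H:%M"), "-", piece["end"].strftime("%H:%M"))
```

For the example call this yields four documents (13:00-14:00, 14:00-15:00, 15:00-16:00, 16:00-16:30). A Count metric over a date histogram on the start field of the split documents then counts the call once per hour it was active. The trade-off is that the bucket size is baked in at index time, and a plain document count no longer equals the number of calls.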

What you need is a way to take a single document and have it be represented in multiple buckets. I don't believe Elasticsearch can do that.

I remember an old thread discussing something similar, but am not sure if/how this would work with the latest release.
