How to calculate the elapsed time of a flag status per day?

I'd like to figure out the elapsed time between flag status changes.

Simplified example : a person can only be sad or happy. I'd like to know how long each mood was active until it changed.

I'm looking for a general best practice . I'm open to change the way I create my logs. I'm new to data science and probably miss the right terms. Related advices are very appreciated.

Index: user_mood_log

 |    changed_at    |  mood  | user |
 | yyyy-mm-dd 12:00 | sad    | John |
 | yyyy-mm-dd 15:00 | happy  | John |
 | yyyy-mm-dd 18:00 | happy  | John |
 |        ...       |   ...  |  ... |


  1. The last record of yesterday (or even older) needs to be considered. Means I can't simply group by day.
  2. A mood might not change for several days.
  3. Duplicated mood logs are possible.
  4. I could want to change the grouping by hours or months or others on the fly
  5. It could have more than 2 moods
  6. Number of logs are somewhat big

Any advice? Thanks in advance!

I've just found this Transactions | Kibana Guide [7.13] | Elastic

which is exactly the kind of graph I need to do but based on filebeat data.

"transaction" id would be John and I would see time passed in each status.

How can I do that?

A transaction is part of APM (application performance management), where this transaction gets created before it is indexed.

However, in your case, you might be able to keep your data structure and index events as they come in, but then use transforms to move the data into a format suitable for querying further.

Thanks for the answer, I've looked into transforms as well but I can't find a way to calculate elapsed time between two documents when there's no start/stop events but a continuous flow of a wide range of events.

Any ideas on how to do that? Is it only possible?

then it would not be a state change but just the duration between two events that are tied to the same user that need a duration?

I might be a little confused on the wording here but I guess yes.

"State" for me is happy/sad/other/etc.. I need the total duration per day or per other aggregations and for each users.

Did you check this example from the docs?

Otherwise, if the start and stop are explicit named events, you can use a scripted_metric. Another possibility is using filter to write 2 fields and an ingest pipeline on the destination index that calculates the duration from it.

I've checked the example, my problem is that I don't know how to bucket only one start and one stop event. I don't want the duration of the min start (sad) and stop (happy) event of john, I want all durations between each events.

I've looked at ingest pipeline that could add the time of previous state in the log but I think it's duplicating information that will be less stable and less maintainable so I try to avoid doing that.

For now, direction I'm looking at is something like Entity-Centric Indexing - Mark Harwood | Elastic Videos

But it's awfully complicated for the newbie I am.

Ok, got it. In this case you probably need a script, I am not aware of an out of the box function that can do this.

Before further recommending how to script it: Are you working with a static dataset, which means it does not change? Or is this a data stream? (If not now, maybe in future?)

What's the output you are looking for exactly?
Do you want 1 document per change with a duration?
Or e.g. the sum over these durations.

Maybe you can provide 1 or 2 sample documents for the output you are looking for.

This is a future data stream that is going to grow up rapidly. Analytics in future will be more useful the more real-time we get. I hope we can output some actionable data a minute after events gets logged.

example of data needed:

  • I want to see if the total duration John is sad per day is growing or going down.
  • I want to get which states are the most common in a group of users of a particular user.
  • I want the average duration of the state "sad". And if it's going down or up over time.

Best widgets to display that kind of data are in APM but I would need to heavily manipulate data to fit those. APM is not design for that.

That's why I hoped a continuous transform (or a continuous batching python script) with standard visualizations would do the trick. I'm just lacking skills or close examples that would help me.

I'm shocked that there's no out of the box solution for that kind of analytics.

I just realize that this could be done if we can "upsample" the data.

Let's imagine the same example:
| changed_at | mood | user |

| yyyy-mm-dd 12:00 | sad | John |
| yyyy-mm-dd 15:00 | happy | John |
| yyyy-mm-dd 18:00 | happy | John |

Which I upsample to the minute:

| changed_at | mood | user |

| yyyy-mm-dd 12:00 | sad | John |
| yyyy-mm-dd 12:01 | sad | John |
| yyyy-mm-dd 12:02 | sad | John |
| yyyy-mm-dd 14:58 | sad | John |
| yyyy-mm-dd 14:59 | sad | John |
| yyyy-mm-dd 15:00 | happy | John |
| yyyy-mm-dd 15:01 | happy | John |
| yyyy-mm-dd 15:02 | happy | John |

Then I could aggregate that data with a count of records and I would have a number on minute in that state.

The aggregation would be pretty simple, is there a way to upsample data like pandas.resample would do in a transform?

Thank you

I am not aware of up-sampling in elasticsearch. What strikes me in this use case: It is possible to calculate this with a script, however every new data point would cause a full recalculation, there fore I think this won't scale. It would help a lot, if you create daily buckets, but if the 1st datapoint of a day looks like:

| yyyy-mm-dd 00:24 | sad | John |

it's not possible to know if "John" was "sad" or "happy" for the first 24 minutes of the day. For the right side of the window you could assume the last state persists for the rest of the day. If you get an update you would recalculate anyway.

It's not possible to access the value of the last time bucket. The only option I see is custom code.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.