Time series graph that summarizes the index state for each time point

Hello,

I'm currently trying to solve the following problem:

We have users that are active from the date when they start a certain action to the date when they complete it. We need to display a graph showing the number of active users through time (a time series graph for day/month is preferred).

A user is active from the time when we receive a log of his start action to the date when we receive a log that he completed the action.

  • Start log (contains status = "Activated" and start_date)
  • End log (contains status = "Completed" and completed_date)

For each time point, the graph would need to take into consideration all documents up to that date and then iterate further in time taking into consideration new events that were added.

I managed to create a custom aggregation that works for a specific date and can show me all the active users on that specific date (using Vega and only showing a number). But I am wondering how to plot these values in a time graph. It would be pretty easy to use a script that daily/monthly creates a new document that summarizes the situation at that point in time but I am wondering if there is a Kibana solution that could do this task. From my current knowledge transforms would only allow me to create a summary document from today forward, and not for dates in the past.

Thank you for your help.

We need to display a graph showing the number of active users through time (a time series graph for day/month is preferred).

hard to answer this exactly without data examples. I believe you do want to "Latest" transform the data into a summary index around the entity in question (some user identifier "uid123" as an example). when the data is incomplete you will only have values for the "start_date" in your example. Once the log entry for "uid123" comes in with a completed date you'd have a single document with both start and end dates. You could then do a runtime / mapped field between the two dates to generate the number of milliseconds the user was in the system and as long as you have the start date or the timestamp of the log you could analyze these days by either date in most visualization tools in Kibana...and even do things like look at average session times across users....count unique user id's, etc.

We need to display a graph showing the number of active users through time (a time series graph for day/month is preferred).

If what I say above is right, to count "active" I believe you'd just do a filtered unique count of documents where "completed_date" is null. You can get the inverse to see only completed or look at totals.

I can't think of a way to do this without some data modeling up front but may be wrong or not fully understand your question! Hope this helps

Thank you for your answer. What you suggested is very similar to what I'm doing currently but unfortunately, it is not sufficient.

Let me elaborate. Below are some simplified examples of my logs. User is considered "Active" after we receive a "state": "Activated" log and until we receive a "state": "Completed" log.

{"id": "uid1", "date": "01.01.2022", "state": "Activated"},
{"id": "uid2", "date": "01.01.2022", "state": "Activated"},

{"id": "uid3", "date": "02.01.2022", "state": "Activated"},

{"id": "uid1", "date": "03.01.2022", "state": "Completed"},

{"id": "uid2", "date": "04.01.2022", "state": "Completed"}

I already have an aggregation metric that I am displaying using Vega plot and would show me the following number of "Active" users on these days:

  • 01.01.2022 - number of active users = 2
  • 02.01.2022 - number of active users = 3
  • 03.01.2022 - number of active users = 2 (one user completed his action)
  • 04.01.2022 - number of active users = 1 (another user completed his action)

Now I need a TSVB plot or histogram that would display these values in time (2, 3, 2, 1 values corresponding to the dates). By creating a new entity-centric index (with start, stop date etc.) that doesn't solve the problem that the user should be counted as "Active" even on days when there were no logs received from him.

One of the solutions I see is to use an external script and iterate through all dates by considering all logs up until that point and then creating summary documents. This would create the following documents:

{"date": "01.01.2022", "active_patients": 2},
{"date": "02.01.2022", "active_patients": 3},
{"date": "03.01.2022", "active_patients": 2},
{"date": "04.01.2022", "active_patients": 1}

Creating a TSVB plot with these documents should be straightforward. Although, this seems like a very common task that should be possible to do internally in Elasticsearch. I am playing around with transforms but I cannot seem to get the desired results. The aggregation I already have is promising but it would have to be run for each date and then those results displayed which seems quite resource-intensive and probably not possible. Do you have any recommendations on how to tackle this problem? It seems like a common problem when dealing with statuses that are considered "active" until another log "closes" them.

Hi,

it was a bit challenging for me, but I managed to get the active_patients by Elasticsearch aggregation without external script. You may be able to use it in vega-lite custom visualization. I'm not sure this aggregation is compatible with transform.

POST /test_active_user/_search
{
  "aggs":{
    "date_hist":{
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs":{
        "state":{
          "terms": {
            "field": "state",
            "include":["Activated","Completed"],
            "min_doc_count": 0, 
            "size": 10
          }
        },
        "increased_patients":{
          "bucket_script": {
            "buckets_path": {
              "in": "state['Activated']>_count",
              "out": "state['Completed']>_count"
            },
            "script": "params.in - params.out"
          }
        },
        "active_patients":{
          "cumulative_sum": {
            "buckets_path": "increased_patients"
          }
        }
      }
    }
  },
  "size":0
}
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "date_hist" : {
      "buckets" : [
        {
          "key_as_string" : "01.01.2022",
          "key" : 1640995200000,
          "doc_count" : 2,
          "state" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "Activated",
                "doc_count" : 2
              },
              {
                "key" : "Completed",
                "doc_count" : 0
              }
            ]
          },
          "increased_patients" : {
            "value" : 2.0
          },
          "active_patients" : {
            "value" : 2.0
          }
        },
        {
          "key_as_string" : "02.01.2022",
          "key" : 1643673600000,
          "doc_count" : 1,
          "state" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "Activated",
                "doc_count" : 1
              },
              {
                "key" : "Completed",
                "doc_count" : 0
              }
            ]
          },
          "increased_patients" : {
            "value" : 1.0
          },
          "active_patients" : {
            "value" : 3.0
          }
        },
        {
          "key_as_string" : "03.01.2022",
          "key" : 1646092800000,
          "doc_count" : 1,
          "state" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "Completed",
                "doc_count" : 1
              },
              {
                "key" : "Activated",
                "doc_count" : 0
              }
            ]
          },
          "increased_patients" : {
            "value" : -1.0
          },
          "active_patients" : {
            "value" : 2.0
          }
        },
        {
          "key_as_string" : "04.01.2022",
          "key" : 1648771200000,
          "doc_count" : 1,
          "state" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "Completed",
                "doc_count" : 1
              },
              {
                "key" : "Activated",
                "doc_count" : 0
              }
            ]
          },
          "increased_patients" : {
            "value" : -1.0
          },
          "active_patients" : {
            "value" : 1.0
          }
        }
      ]
    }
  }
}
1 Like

Works like a charm (also managed to use it in a vega-lite visualization). Thank you for this simple but very effective solution.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.