Visualize documents that have multiple versions

Hello community!

I am trying to build visualizations in Kibana for documents that have multiple revisions. Just in case, I know that Elasticsearch does not keep the previous versions of the documents, however, in this topic I hope that we can find the most optimal way to deal with it

So, let's assume I have these documents:

POST subscription/_doc
{
  "subscription_id": "111",
  "subscription_status": 0,
  "date": "2023-11-01T00:00:00"
}

POST subscription/_doc
{
  "subscription_id": "222",
  "subscription_status": 0,
  "date": "2023-11-01T00:00:00"
}

POST subscription/_doc
{
  "subscription_id": "111",
  "subscription_status": 1,
  "date": "2023-11-05T00:00:00"
}

POST subscription/_doc
{
  "subscription_id": "222",
  "subscription_status": 1,
  "date": "2023-11-10T00:00:00"
}

POST subscription/_doc
{
  "subscription_id": "333",
  "subscription_status": 1,
  "date": "2023-11-15T00:00:00"
}

POST subscription/_doc
{
  "subscription_id": "444",
  "subscription_status": 0,
  "date": "2023-11-20T00:00:00"
}

POST subscription/_doc
{
  "subscription_id": "444",
  "subscription_status": 0,
  "date": "2023-11-20T00:00:00"
}

As you can see, subscription_id is unique but for every subscription_id the could be multiple documents. Also, every subscription could be either active ("subscription_status" = "1") or inactive ("subscription_status" = "0") and its status may change back and forth anytime

I want to build a Kibana time series graph that shows the total number of active ("subscription_status" = "1") and inactive ("subscription_status" = "0") subscriptions, like below:

Nov 01: Inactive = 2, Active = 0 <= initial state: we have 2 inactive subs
Nov 02: Inactive = 2, Active = 0
Nov 03: Inactive = 2, Active = 0
Nov 04: Inactive = 2, Active = 0
Nov 05: Inactive = 1, Active = 1 <= sub "111" has been changed to active
Nov 06: Inactive = 1, Active = 1
Nov 07: Inactive = 1, Active = 1
Nov 08: Inactive = 1, Active = 1
Nov 09: Inactive = 1, Active = 1
Nov 10: Inactive = 0, Active = 2 <= sub "222" has been changed to active
Nov 11: Inactive = 0, Active = 2
Nov 12: Inactive = 0, Active = 2
Nov 13: Inactive = 0, Active = 2
Nov 14: Inactive = 0, Active = 2
Nov 15: Inactive = 0, Active = 3 <= a new active sub was added
Nov 16: Inactive = 0, Active = 3
Nov 17: Inactive = 0, Active = 3
Nov 18: Inactive = 0, Active = 3
Nov 19: Inactive = 0, Active = 3
Nov 20: Inactive = 1, Active = 3 <= a new inactive sub was added

However, while building it via Kibana I cannot add logic "for the specific day in case of multiple documents for the same subscription take one that is older or equal to this specific day"

So, I switched to Vega as via it I should be able to build something like this

I was able to build an aggregation that correctly calculates the number of subscriptions for the specific day:

{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "date": {
              "gte": "2023-11-01T00:00:00.000Z",
              "lt": "2023-11-21T00:00:00.000Z"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "subscriptions_agg": {
      "terms": {
        "field": "subscription_id.keyword",
        "order": {
          "max_date": "asc"
        }
      },
      "aggs": {
        "latest_subscription_state_active": {
          "top_metrics": {
		    "metrics": [
		      {"field": "subscription_id.keyword"},
		      {"field": "subscription_status"},
		      {"field": "date"}
            ],
            "sort": [
              {
                "date": {
                  "order": "desc"
                }
              }
            ],
            "size": 1
          }
        },
        "max_date": {
          "max": {
            "field": "date"
          }
        },
        "number_active_subscriptions": {
          "bucket_selector": {
            "buckets_path": {
              "status": "latest_subscription_state_active.subscription_status"
            },
            "script": "params.status == 1"
          }
        }
      }
    },
    "active_subscriptions_count": {
      "stats_bucket": {
        "buckets_path": "subscriptions_agg._count"
      }
    }
  }
}

and the same for inactive when "script": "params.status == 0" (did not find a way how to combine them in the scope of one aggregation)

But I am still uncertain how to add a date histogram above all of it. I tried this way:

{
  "size": 0,
  "aggs": {
    "active_subscriptions_per_day": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "day"
      },
      "aggs": {
        "subscriptions_agg": {
          "terms": {
            "field": "subscription_id.keyword",
            "order": {
              "max_date": "asc"
            }
          },
          "aggs": {
            "latest_subscription_state_active": {
              "top_metrics": {
                "metrics": [
                  {
                    "field": "subscription_id.keyword"
                  },
                  {
                    "field": "subscription_status"
                  },
                  {
                    "field": "date"
                  }
                ],
                "sort": [
                  {
                    "date": {
                      "order": "desc"
                    }
                  }
                ],
                "size": 1
              }
            },
            "max_date": {
              "max": {
                "field": "date"
              }
            },
            "number_active_subscriptions": {
              "bucket_selector": {
                "buckets_path": {
                  "status": "latest_subscription_state_active.subscription_status"
                },
                "script": "params.status == 1"
              }
            }
          }
        },
        "active_subscriptions_count": {
          "stats_bucket": {
            "buckets_path": "subscriptions_agg._count"
          }
        }
      }
    }
  }
}

however, it does not return what I expect

So, is it possible to build such interesting aggregation or it is better to perform this aggregation outside Elastic, calculate necessary metrics, push the result back to Elastic and use this data to build visualizations via Kibana?

Important note: I can change the source date. For example, add and remove fields, change their types, rename, etc, because I send them to Elastic via Logstash

Any ideas and solutions are welcome

Thank you!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.