Calculate percentage based on status of max per group in Elasticsearch

Given the dataset below, I'd like to calculate percentage of status over unique count of workflow.

id,workflow,status
1,A,FAILURE
2,A,ABORTED
3,A,SUCCESS
4,A,SUCCESS
1,B,FAILURE
2,B,SUCCESS
3,B,FAILURE
1,C,FAILURE
2,C,FAILURE
1,D,FAILURE
2,D,SUCCESS
3,D,FAILURE
4,D,FAILURE
5,D,FAILURE

Always refer to the status of max id.

A - SUCCESS (refer to max id is 4)
B - FAILURE (refer to max id is 3)
C - FAILURE (refer max id is 2)
D - FAILURE (refer max id is 5)

Expected aggregated results as follow:

unique count of workflow: 4   // workflow A, B, C, and D
SUCCESS: (1/4) * 100 = 25%    // workflow A
FAILURE: (3/4) * 100 = 75%    // workflow B, C, and D

Appreciate it if you can advise the query to get the 25% (SUCCESS) and 75% (FAILURE).

Hi @stephenb, appreciate it if you can provide your assistance to my problem statement.

Hi @stephenb Is it possible to come out with a query without additional of runtime field(s)?

Hi @carollyl

I suggest that if you are going to many of these types of queries you spend some time with the DSL and learn the aggregations.

Whether you need a runtime field or not is unclear to me.

I also suggested that you look at transforms as it may make the data easier to work with, that may be worth your time.

There are many questions and few helpers... perhaps I can take a look at this later in the week but I may not be able to.

I managed to create the rate (success, failure, aborted) with runtime_mappings. Please share with me if there is a more optimized approach.

Query

GET discuss/_search
{
  "size": 0,
  "runtime_mappings": {
    "status_success": {
      "type": "long",
      "script": {
        "source": "if (doc['status'].value == 'SUCCESS') {emit(1)} else {emit(0)}"
      }
    },
    "status_failure": {
      "type": "long",
      "script": {
        "source": "if (doc['status'].value == 'FAILURE' || doc['status'].value == 'UNSTABLE') {emit(1)} else {emit(0)}"
      }
    },
    "status_aborted": {
      "type": "long",
      "script": {
        "source": "if (doc['status'].value == 'ABORTED') {emit(1)} else {emit(0)}"
      }
    }
  },
  "aggs": {
    "by_workflow": {
      "terms": {
        "field": "workflow"
      },
      "aggs": {
        "top_status": {
          "top_metrics": {
            "metrics": [
              {"field": "status_success"},
              {"field": "status_failure"},
              {"field": "status_aborted"}
            ],
            "sort": {
              "id": "desc"
            }
          }
        }
      }
    },
    "success_rate": {
      "avg_bucket": {
        "buckets_path": "by_workflow>top_status[status_success]",
        "gap_policy": "skip",
        "format": "#%"
      }
    },
    "failure_rate": {
      "avg_bucket": {
        "buckets_path": "by_workflow>top_status[status_failure]",
        "gap_policy": "skip",
        "format": "#%"
      }
    }
    "aborted_rate": {
      "avg_bucket": {
        "buckets_path": "by_workflow>top_status[status_aborted]",
        "gap_policy": "skip",
        "format": "#%"
      }
    }
  }

Results

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 421,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "by_workflow" : {
      "doc_count_error_upper_bound" : 2,
      "sum_other_doc_count" : 179,
      "buckets" : [
        {
          "key" : "A",
          "doc_count" : 102,
          "top_status" : {
            "top" : [
              {
                "sort" : [
                  23
                ],
                "metrics" : {
                  "status_success" : 0,
                  "status_failure" : 1,
                  "status_aborted" : 0
                }
              }
            ]
          }
        },
        {
          "key" : "B",
          "doc_count" : 78,
          "top_status" : {
            "top" : [
              {
                "sort" : [
                  47
                ],
                "metrics" : {
                  "status_success" : 1,
                  "status_failure" : 0,
                  "status_aborted" : 0
                }
              }
            ]
          }
        },
        {
          "key" : "C",
          "doc_count" : 62,
          "top_status" : {
            "top" : [
              {
                "sort" : [
                  57
                ],
                "metrics" : {
                  "status_success" : 1,
                  "status_failure" : 0,
                  "status_aborted" : 0
                }
              }
            ]
          }
        }
      ]
    },
    "success_rate" : {
      "value" : 0.6666666666666666,
      "value_as_string" : "67%"
    },
    "failure_rate" : {
      "value" : 0.3333333333333333,
      "value_as_string" : "33%"
    },
    "aborted_rate" : {
      "value" : 0.0,
      "value_as_string" : "0%"
    }
  }
}

I'd like to create a pie chart (vega/vega-lite) using the success_rate, failure_rate, and aborted_rate. Appreciate it if you can share any insights how to transform the rate into accessible array of data.

Hi @carollyl Great progress / Nicely Done, unfortunately, I do not work with Vega. You should open a topic with Vega in the title and see if someone can help, it is a very specialized area.

I actually think that you could add those runtime fields to the mapping (as opposed to the query)
Then you could use a transform
Then you could probably just use a normal lens with some formulas to get the correct results.

I would look at that as much as vega... But that would be my approach as I don't know Vega.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.