Accessing Subbuckets in Vega

Hello. I am here to echo this question (Format for multi-level aggregation in Vega): how do you access an array of subbuckets in Vega/Vega-Lite via Kibana?
I have a query that aggregates over a given date range, which creates an array of buckets. In each of these buckets, I have a terms aggregation (over a field that has only 2 possible values), which generates another set of buckets, within which I compute extended_stats. Here is what my response looks like:

{
  "took": 43,
  "timed_out": false,
  "_shards": {
    "total": 145,
    "successful": 145,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 206,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "time_buckets": {
      "buckets": [
        {
          "key": 0,
          "doc_count": 6,
          "study_allocation": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": 0,
                "key_as_string": "false",
                "doc_count": 4,
                "pain_stats": {
                  "count": 4,
                  "min": 3,
                  "max": 3,
                  "avg": 3,
                  "sum": 12,
                  "sum_of_squares": 36,
                  "variance": 0,
                  "std_deviation": 0,
                  "std_deviation_bounds": {
                    "upper": 3,
                    "lower": 3
                  }
                }
              },
              {
                "key": 1,
                "key_as_string": "true",
                "doc_count": 2,
                "pain_stats": {
                  "count": 2,
                  "min": 3,
                  "max": 3,
                  "avg": 3,
                  "sum": 6,
                  "sum_of_squares": 18,
                  "variance": 0,
                  "std_deviation": 0,
                  "std_deviation_bounds": {
                    "upper": 3,
                    "lower": 3
                  }
                }
              }
            ]
          }
        },
        {
          "key": 1,
          "doc_count": 3,
          "study_allocation": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": 0,
                "key_as_string": "false",
                "doc_count": 2,
                "pain_stats": {
                  "count": 2,
                  "min": 2,
                  "max": 3,
                  "avg": 2.5,
                  "sum": 5,
                  "sum_of_squares": 13,
                  "variance": 0.25,
                  "std_deviation": 0.5,
                  "std_deviation_bounds": {
                    "upper": 3.5,
                    "lower": 1.5
                  }
                }
              },
              {
                "key": 1,
                "key_as_string": "true",
                "doc_count": 1,
                "pain_stats": {
                  "count": 1,
                  "min": 4,
                  "max": 4,
                  "avg": 4,
                  "sum": 4,
                  "sum_of_squares": 16,
                  "variance": 0,
                  "std_deviation": 0,
                  "std_deviation_bounds": {
                    "upper": 4,
                    "lower": 4
                  }
                }
              }
            ]
          }
        },
        {
          "key": 2,
          "doc_count": 3,
          "study_allocation": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": 0,
                "key_as_string": "false",
                "doc_count": 3,
                "pain_stats": {
                  "count": 3,
                  "min": 2,
                  "max": 4,
                  "avg": 3,
                  "sum": 9,
                  "sum_of_squares": 29,
                  "variance": 0.6666666666666666,
                  "std_deviation": 0.816496580927726,
                  "std_deviation_bounds": {
                    "upper": 4.6329931618554525,
                    "lower": 1.367006838144548
                  }
                }
              }
            ]
          }
        },
        {
          "key": 3,
          "doc_count": 4,
          "study_allocation": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": 0,
                "key_as_string": "false",
                "doc_count": 3,
                "pain_stats": {
                  "count": 3,
                  "min": 3,
                  "max": 5,
                  "avg": 3.6666666666666665,
                  "sum": 11,
                  "sum_of_squares": 43,
                  "variance": 0.8888888888888881,
                  "std_deviation": 0.9428090415820629,
                  "std_deviation_bounds": {
                    "upper": 5.552284749830792,
                    "lower": 1.7810485835025407
                  }
                }
              },
              {
                "key": 1,
                "key_as_string": "true",
                "doc_count": 1,
                "pain_stats": {
                  "count": 1,
                  "min": 3,
                  "max": 3,
                  "avg": 3,
                  "sum": 3,
                  "sum_of_squares": 9,
                  "variance": 0,
                  "std_deviation": 0,
                  "std_deviation_bounds": {
                    "upper": 3,
                    "lower": 3
                  }
                }
              }
            ]
          }
        },
...etc

In Vega-Lite, I want to split the line chart into separate lines, one for each bucket in study_allocation. I see no way in Vega to access array elements in an expression. For example, if I want to do a transform based on the extended_stats in the first bucket of study_allocation, there seems to be no way to do that. As in the linked post, the buckets come back as undefined: I want to use datum.study_allocation.buckets[0].pain_stats.avg in a transform, but I am unable to. Computing stats over the original set of buckets, without the sub-aggregation, works with no problem.

Any advice? Am I structuring my query the wrong way? Can I just make two separate queries and overlay them on the same chart? I am not sure of the best way to go about this.

Cheers,

-G

Check out Yuri's suggestions in this thread, as well as his note about the flatten transform further down:

Also, yes, you can have multiple data sources on the same graph (see layers in Vega-Lite, and data sources in Vega). You might get some inconsistencies between the query results, since the data may change slightly between requests, and it also increases the load on the server, so if possible, you should try to draw everything from a single result.
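
As a rough illustration of the flatten transform mentioned above, the Vega-Lite sketch below turns each element of study_allocation.buckets into its own row, so the two allocations can be drawn as separate lines. It is a sketch under assumptions, not a tested Kibana visualization: the inline values are a trimmed copy of the buckets from the response in the question, the output name alloc is made up for this example, the v4 schema URL is an assumption, and the Vega-Lite version bundled with Kibana has to support flatten.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "description": "Sketch only: inline values stand in for aggregations.time_buckets.buckets",
  "data": {
    "values": [
      {"key": 0, "study_allocation": {"buckets": [
        {"key_as_string": "false", "pain_stats": {"avg": 3}},
        {"key_as_string": "true", "pain_stats": {"avg": 3}}
      ]}},
      {"key": 1, "study_allocation": {"buckets": [
        {"key_as_string": "false", "pain_stats": {"avg": 2.5}},
        {"key_as_string": "true", "pain_stats": {"avg": 4}}
      ]}},
      {"key": 2, "study_allocation": {"buckets": [
        {"key_as_string": "false", "pain_stats": {"avg": 3}}
      ]}},
      {"key": 3, "study_allocation": {"buckets": [
        {"key_as_string": "false", "pain_stats": {"avg": 3.6666666666666665}},
        {"key_as_string": "true", "pain_stats": {"avg": 3}}
      ]}}
    ]
  },
  "transform": [
    {"flatten": ["study_allocation.buckets"], "as": ["alloc"]}
  ],
  "mark": "line",
  "encoding": {
    "x": {"field": "key", "type": "quantitative"},
    "y": {"field": "alloc.pain_stats.avg", "type": "quantitative"},
    "color": {"field": "alloc.key_as_string", "type": "nominal"}
  }
}
```

In Kibana, the inline values would be replaced by a url block containing the Elasticsearch query, with format.property set to aggregations.time_buckets.buckets so that each time bucket becomes one datum. Because flatten emits one row per sub-bucket, a time bucket that has only one allocation (key 2 above) simply contributes one row instead of producing an undefined array element.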

Thanks so much for both of your suggestions, I will attempt a few things and report back for posterity.

Ended up just doing multiple data sources in Vega itself and plotting them separately onto the same chart. Worked like a charm.
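
For anyone landing here later, that approach could look roughly like the Vega sketch below: two named data sources, one per study_allocation value, each drawn by its own line mark on shared scales. The actual spec was not posted, so everything here is illustrative. The dataset names alloc_false and alloc_true, the colors, the chart size, and the v5 schema URL are assumptions, and the inline values (averages copied from the response above) stand in for two separate Elasticsearch queries.

```json
{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "description": "Sketch only: inline values stand in for two separate query results",
  "width": 400,
  "height": 200,
  "data": [
    {
      "name": "alloc_false",
      "values": [
        {"key": 0, "pain_stats": {"avg": 3}},
        {"key": 1, "pain_stats": {"avg": 2.5}},
        {"key": 2, "pain_stats": {"avg": 3}},
        {"key": 3, "pain_stats": {"avg": 3.6666666666666665}}
      ]
    },
    {
      "name": "alloc_true",
      "values": [
        {"key": 0, "pain_stats": {"avg": 3}},
        {"key": 1, "pain_stats": {"avg": 4}},
        {"key": 3, "pain_stats": {"avg": 3}}
      ]
    }
  ],
  "scales": [
    {
      "name": "x",
      "type": "linear",
      "range": "width",
      "domain": {"fields": [
        {"data": "alloc_false", "field": "key"},
        {"data": "alloc_true", "field": "key"}
      ]}
    },
    {
      "name": "y",
      "type": "linear",
      "range": "height",
      "nice": true,
      "zero": true,
      "domain": {"fields": [
        {"data": "alloc_false", "field": "pain_stats.avg"},
        {"data": "alloc_true", "field": "pain_stats.avg"}
      ]}
    }
  ],
  "axes": [
    {"orient": "bottom", "scale": "x"},
    {"orient": "left", "scale": "y"}
  ],
  "marks": [
    {
      "type": "line",
      "from": {"data": "alloc_false"},
      "encode": {"enter": {
        "x": {"scale": "x", "field": "key"},
        "y": {"scale": "y", "field": "pain_stats.avg"},
        "stroke": {"value": "steelblue"}
      }}
    },
    {
      "type": "line",
      "from": {"data": "alloc_true"},
      "encode": {"enter": {
        "x": {"scale": "x", "field": "key"},
        "y": {"scale": "y", "field": "pain_stats.avg"},
        "stroke": {"value": "firebrick"}
      }}
    }
  ]
}
```

In Kibana, each values array would be replaced by a url block with its own index and body, which is where the extra server load and possible inconsistencies between results mentioned earlier come from.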
