Accessing Subbuckets in Vega

tadgh · June 27, 2018, 10:14pm

Hello. I am just here to mirror this question (Format for multi-level aggregation in Vega), which is: how do you access an array of subbuckets in Vega/Vega-Lite via Kibana?
I have a query which is aggregated over a given date range. This creates an array of buckets. In each of these buckets, I have a terms aggregation (over a field which has only 2 possibilities) which generates another set of buckets, within which I perform some extended_stats. Here is what my response looks like:

{
  "took": 43,
  "timed_out": false,
  "_shards": {
    "total": 145,
    "successful": 145,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 206,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
"time_buckets": {
  "buckets": [
    {
      "key": 0,
      "doc_count": 6,
      "study_allocation": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": 0,
            "key_as_string": "false",
            "doc_count": 4,
            "pain_stats": {
              "count": 4,
              "min": 3,
              "max": 3,
              "avg": 3,
              "sum": 12,
              "sum_of_squares": 36,
              "variance": 0,
              "std_deviation": 0,
              "std_deviation_bounds": {
                "upper": 3,
                "lower": 3
              }
            }
          },
          {
            "key": 1,
            "key_as_string": "true",
            "doc_count": 2,
            "pain_stats": {
              "count": 2,
              "min": 3,
              "max": 3,
              "avg": 3,
              "sum": 6,
              "sum_of_squares": 18,
              "variance": 0,
              "std_deviation": 0,
              "std_deviation_bounds": {
                "upper": 3,
                "lower": 3
              }
            }
          }
        ]
      }
    },
    {
      "key": 1,
      "doc_count": 3,
      "study_allocation": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": 0,
            "key_as_string": "false",
            "doc_count": 2,
            "pain_stats": {
              "count": 2,
              "min": 2,
              "max": 3,
              "avg": 2.5,
              "sum": 5,
              "sum_of_squares": 13,
              "variance": 0.25,
              "std_deviation": 0.5,
              "std_deviation_bounds": {
                "upper": 3.5,
                "lower": 1.5
              }
            }
          },
          {
            "key": 1,
            "key_as_string": "true",
            "doc_count": 1,
            "pain_stats": {
              "count": 1,
              "min": 4,
              "max": 4,
              "avg": 4,
              "sum": 4,
              "sum_of_squares": 16,
              "variance": 0,
              "std_deviation": 0,
              "std_deviation_bounds": {
                "upper": 4,
                "lower": 4
              }
            }
          }
        ]
      }
    },
    {
      "key": 2,
      "doc_count": 3,
      "study_allocation": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": 0,
            "key_as_string": "false",
            "doc_count": 3,
            "pain_stats": {
              "count": 3,
              "min": 2,
              "max": 4,
              "avg": 3,
              "sum": 9,
              "sum_of_squares": 29,
              "variance": 0.6666666666666666,
              "std_deviation": 0.816496580927726,
              "std_deviation_bounds": {
                "upper": 4.6329931618554525,
                "lower": 1.367006838144548
              }
            }
          }
        ]
      }
    },
    {
      "key": 3,
      "doc_count": 4,
      "study_allocation": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": 0,
            "key_as_string": "false",
            "doc_count": 3,
            "pain_stats": {
              "count": 3,
              "min": 3,
              "max": 5,
              "avg": 3.6666666666666665,
              "sum": 11,
              "sum_of_squares": 43,
              "variance": 0.8888888888888881,
              "std_deviation": 0.9428090415820629,
              "std_deviation_bounds": {
                "upper": 5.552284749830792,
                "lower": 1.7810485835025407
              }
            }
          },
          {
            "key": 1,
            "key_as_string": "true",
            "doc_count": 1,
            "pain_stats": {
              "count": 1,
              "min": 3,
              "max": 3,
              "avg": 3,
              "sum": 3,
              "sum_of_squares": 9,
              "variance": 0,
              "std_deviation": 0,
              "std_deviation_bounds": {
                "upper": 3,
                "lower": 3
              }
            }
          }
        ]
      }
    },
...etc

In Vega-Lite, I want to split the line chart into separate lines, one for each bucket in study_allocation. I see no way in Vega to access array elements in an expression. For example, if I want to do a transform based on the extended_stats in the first study_allocation bucket, there seems to be no way to do that; as in the linked post, the buckets come back as undefined. Concretely, I want to use datum.study_allocation.buckets[0].pain_stats.avg in a transform, but I am unable to. Computing stats over the original set of buckets, without the sub-aggregation, works with no problem.

Any advice? Am I structuring my query the wrong way? Can I just make two separate queries and overlay them on the same chart? I'm not sure of the best way to go about this.

Cheers,

-G

jen-huang · June 27, 2018, 11:19pm

Check out Yuri's suggestions in this thread, as well as his note about the flatten transform further down:
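For readers who cannot follow the link, here is a rough sketch of what the flatten approach could look like for the response in the question. It is not taken from Yuri's post. It assumes the Vega-Lite version bundled with your Kibana includes the flatten transform, and it uses inline values trimmed from the response above so it can be tried on its own. The names alloc, avg_pain, and allocation are made up for illustration. In Kibana you would presumably replace the inline values with your query and set the data format property to aggregations.time_buckets.buckets.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: inline values trimmed from the response in the question; alloc, avg_pain and allocation are illustrative names.",
  "data": {
    "values": [
      {"key": 0, "study_allocation": {"buckets": [
        {"key_as_string": "false", "pain_stats": {"avg": 3}},
        {"key_as_string": "true", "pain_stats": {"avg": 3}}
      ]}},
      {"key": 1, "study_allocation": {"buckets": [
        {"key_as_string": "false", "pain_stats": {"avg": 2.5}},
        {"key_as_string": "true", "pain_stats": {"avg": 4}}
      ]}},
      {"key": 2, "study_allocation": {"buckets": [
        {"key_as_string": "false", "pain_stats": {"avg": 3}}
      ]}},
      {"key": 3, "study_allocation": {"buckets": [
        {"key_as_string": "false", "pain_stats": {"avg": 3.6666666666666665}},
        {"key_as_string": "true", "pain_stats": {"avg": 3}}
      ]}}
    ]
  },
  "transform": [
    {"flatten": ["study_allocation.buckets"], "as": ["alloc"]},
    {"calculate": "datum.alloc.pain_stats.avg", "as": "avg_pain"},
    {"calculate": "datum.alloc.key_as_string", "as": "allocation"}
  ],
  "mark": "line",
  "encoding": {
    "x": {"field": "key", "type": "quantitative"},
    "y": {"field": "avg_pain", "type": "quantitative"},
    "color": {"field": "allocation", "type": "nominal"}
  }
}
```

The flatten step emits one row per sub-bucket while keeping the outer key, so the color channel can split the rows into one line per study_allocation value without indexing into the array.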

nyuriks · June 27, 2018, 11:34pm

Also, yes, you can have multiple data sources on the same graph (see layers in Vega-Lite, and data sources in Vega). You might get some inconsistencies between the results of multiple queries, since the data may change slightly between them, and it also increases the load on the server, so if possible, you should try to draw everything from a single result.
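As a rough illustration of the layered variant in Vega-Lite, the sketch below gives each layer its own data source. The inline values are copied from the averages in the response above and stand in for two separate queries; in Kibana each values array would presumably be replaced by its own url block. The field name avg and the colors are arbitrary choices for the example.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: each layer has its own data source; inline values stand in for two separate queries.",
  "layer": [
    {
      "data": {"values": [
        {"key": 0, "avg": 3},
        {"key": 1, "avg": 2.5},
        {"key": 2, "avg": 3},
        {"key": 3, "avg": 3.6666666666666665}
      ]},
      "mark": {"type": "line", "color": "steelblue"},
      "encoding": {
        "x": {"field": "key", "type": "quantitative"},
        "y": {"field": "avg", "type": "quantitative"}
      }
    },
    {
      "data": {"values": [
        {"key": 0, "avg": 3},
        {"key": 1, "avg": 4},
        {"key": 3, "avg": 3}
      ]},
      "mark": {"type": "line", "color": "firebrick"},
      "encoding": {
        "x": {"field": "key", "type": "quantitative"},
        "y": {"field": "avg", "type": "quantitative"}
      }
    }
  ]
}
```

Layers share their x and y scales by default, so both lines land on the same axes.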

tadgh · June 28, 2018, 12:21am

Thanks so much for both of your suggestions, I will attempt a few things and report back for posterity.

tadgh · July 25, 2018, 4:21pm

Ended up just doing multiple data sources in Vega itself, and plotting them separately onto the same chart. Worked like a charm.
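The actual spec is not shown in this thread, so the following is only a guess at the general shape of that solution in full Vega: two named data sets, shared scales whose domains span both, and one line mark per data set. The data set names alloc_false and alloc_true, the field name avg, and the inline values (copied from the response above) are all placeholders for two real queries.

```json
{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "description": "Sketch only: two data sources drawn as two line marks; names and inline values are placeholders.",
  "width": 400,
  "height": 200,
  "padding": 5,
  "data": [
    {"name": "alloc_false", "values": [
      {"key": 0, "avg": 3},
      {"key": 1, "avg": 2.5},
      {"key": 2, "avg": 3},
      {"key": 3, "avg": 3.6666666666666665}
    ]},
    {"name": "alloc_true", "values": [
      {"key": 0, "avg": 3},
      {"key": 1, "avg": 4},
      {"key": 3, "avg": 3}
    ]}
  ],
  "scales": [
    {
      "name": "x", "type": "linear", "range": "width", "zero": false,
      "domain": {"fields": [
        {"data": "alloc_false", "field": "key"},
        {"data": "alloc_true", "field": "key"}
      ]}
    },
    {
      "name": "y", "type": "linear", "range": "height", "nice": true,
      "domain": {"fields": [
        {"data": "alloc_false", "field": "avg"},
        {"data": "alloc_true", "field": "avg"}
      ]}
    }
  ],
  "axes": [
    {"orient": "bottom", "scale": "x"},
    {"orient": "left", "scale": "y"}
  ],
  "marks": [
    {
      "type": "line",
      "from": {"data": "alloc_false"},
      "encode": {"enter": {
        "x": {"scale": "x", "field": "key"},
        "y": {"scale": "y", "field": "avg"},
        "stroke": {"value": "steelblue"}
      }}
    },
    {
      "type": "line",
      "from": {"data": "alloc_true"},
      "encode": {"enter": {
        "x": {"scale": "x", "field": "key"},
        "y": {"scale": "y", "field": "avg"},
        "stroke": {"value": "firebrick"}
      }}
    }
  ]
}
```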

system · August 22, 2018, 4:21pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
