Kibana Vega - Multiple Buckets visualization - how?

Unfortunately, it's the nested data type that is causing problems.

The aggregation you wrote is solid, and gets the data summarized in a nice, nested, format. If we look at an order label bucket in Vega debug, it'll look something like:

> VEGA_DEBUG.view.data('source_0')[0]
doc_count: 288
key: "Canada"
orders:
  doc_count: 2880
  orders_id: buckets: [{...}, {...}, {...}, ...]
  doc_count_error_upper_bound: 0
  sum_other_doc_count: 2613

Easy to see what's happening there. There's a label of Canada and it has a bunch of order_id objects attributed - data buckets for each order_id. We can check out one of those order_id objects:

doc_count: 10
key: 636
orders_price:
  value: 52881

So there was an order with ID 636 and the price for that order was 52881

When the data is broken down in this nested way, it's easy for a human to understand. Unfortunately, with VegaLite (note your $schema), data really needs to come in a flattened format. In the Kibana integration with Vega, and querying Elasticsearch to get data for VegaLite, we usually want to use the composite aggregation - it makes the result data all nice and flat. You'd see this if the body looks like:

{
  "size": 0,
  "aggs": {
    "table": {
      "composite": {
        "sources": [
          {
            "time": {
              "date_histogram": {
                "field": "timestamp",
                "interval": "month"
              }
            }
          },
          {
            "label": {
              "terms": {
                "field": "label"
              }
            }
          }
        ]
      }
    }
  }
}

Composite aggregation is pretty much essential for VegaLite because it expects the data coming in to be flat, where each property of the array of data is a leaf value of data.

Note that I didn't include any of the fields under the nested type in that example. That's where the problem is: VegaLite syntax doesn't support dotted notation - accessing the nested fields using dots in VegaLite doesn't work:

 /* this won't work */
  encoding: {
    row: {
      field: key
      type: nominal
    }
    x: {
      field: orders.orders_id.buckets.orders_price.value /* `buckets` is an array - VegaLite doesn't dive into every object of the array to bring back `orders_price.value` */
      type: quantitative
      axis: {title: "Price"}
    }
    y: {
      field: orders.orders_id.buckets.key /* same problem with`buckets` is an array */
      type: quantitative
      axis: {title: "Orders"}
    }
  }

Another problem is that the composite aggregation only supports the source aggregations to be bucket aggregations themselves. Nested aggregations require having their own child aggregations, so they don't fit the model of composite.

{
  "size": 0,
  "aggs": {
    "table": {
      "composite": {
        "sources": [
          {
            "time": {
              "date_histogram": {
                "field": "timestamp",
                "interval": "month"
              }
            }
          },
          {
            "label": {
              "terms": {
                "field": "label"
              }
            }
          },
          {
            "orders": {
              "nested": {
                "path": "orders"
              },
              "aggs": {
                "order_ids": {
                  "terms": {
                    "field": "order.id",
                    "size": 1000
                  }
                }
              }
            }
          }
        ]
      }
    }
  }
}

# result:
{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "invalid source type: nested",
        "line": 24,
        "col": 25
      }
    ],
    "type": "parsing_exception",
    "reason": "[composite] failed to parse field [sources]",
    "line": 24,
    "col": 25,
    "caused_by": {
      "type": "parsing_exception",
      "reason": "invalid source type: nested",
      "line": 24,
      "col": 25
    }
  },
  "status": 400
}

I talked with Yuri Astrakhan on this (I don't think he is on Discuss) and he mentioned it might be possible to use your nested data with Vega as opposed to VegaLite, but that it would be harder. I had to ask him because to be honest - this was my first exploration with Vega! I thank you for the opportunity.

I'll have Yuri follow up to see if he has further ideas, and I'll subscribe to this topic to learn more about it.

Cheers,
-Tim

1 Like