Unfortunately, it's the nested data type that is causing problems.
The aggregation you wrote is solid, and gets the data summarized in a nice, nested, format. If we look at an order label bucket in Vega debug, it'll look something like:
> VEGA_DEBUG.view.data('source_0')[0]
doc_count: 288
key: "Canada"
orders:
doc_count: 2880
orders_id: buckets: [{...}, {...}, {...}, ...]
doc_count_error_upper_bound: 0
sum_other_doc_count: 2613
Easy to see what's happening there. There's a label of Canada
and it has a bunch of order_id
objects attributed - data buckets for each order_id
. We can check out one of those order_id
objects:
doc_count: 10
key: 636
orders_price:
value: 52881
So there was an order with ID 636
and the price for that order was 52881
When the data is broken down in this nested way, it's easy for a human to understand. Unfortunately, with VegaLite (note your $schema
), data really needs to come in a flattened format. In the Kibana integration with Vega, and querying Elasticsearch to get data for VegaLite, we usually want to use the composite
aggregation - it makes the result data all nice and flat. You'd see this if the body
looks like:
{
"size": 0,
"aggs": {
"table": {
"composite": {
"sources": [
{
"time": {
"date_histogram": {
"field": "timestamp",
"interval": "month"
}
}
},
{
"label": {
"terms": {
"field": "label"
}
}
}
]
}
}
}
}
Composite aggregation is pretty much essential for VegaLite because it expects the data coming in to be flat, where each property of the array of data is a leaf value of data.
Note that I didn't include any of the fields under the nested
type in that example. That's where the problem is: VegaLite syntax doesn't support dotted notation - accessing the nested fields using dots in VegaLite doesn't work:
/* this won't work */
encoding: {
row: {
field: key
type: nominal
}
x: {
field: orders.orders_id.buckets.orders_price.value /* `buckets` is an array - VegaLite doesn't dive into every object of the array to bring back `orders_price.value` */
type: quantitative
axis: {title: "Price"}
}
y: {
field: orders.orders_id.buckets.key /* same problem with`buckets` is an array */
type: quantitative
axis: {title: "Orders"}
}
}
Another problem is that the composite
aggregation only supports the source
aggregations to be bucket aggregations themselves. Nested aggregations require having their own child aggregations, so they don't fit the model of composite
.
{
"size": 0,
"aggs": {
"table": {
"composite": {
"sources": [
{
"time": {
"date_histogram": {
"field": "timestamp",
"interval": "month"
}
}
},
{
"label": {
"terms": {
"field": "label"
}
}
},
{
"orders": {
"nested": {
"path": "orders"
},
"aggs": {
"order_ids": {
"terms": {
"field": "order.id",
"size": 1000
}
}
}
}
}
]
}
}
}
}
# result:
{
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "invalid source type: nested",
"line": 24,
"col": 25
}
],
"type": "parsing_exception",
"reason": "[composite] failed to parse field [sources]",
"line": 24,
"col": 25,
"caused_by": {
"type": "parsing_exception",
"reason": "invalid source type: nested",
"line": 24,
"col": 25
}
},
"status": 400
}
I talked with Yuri Astrakhan on this (I don't think he is on Discuss) and he mentioned it might be possible to use your nested data with Vega as opposed to VegaLite, but that it would be harder. I had to ask him because to be honest - this was my first exploration with Vega! I thank you for the opportunity.
I'll have Yuri follow up to see if he has further ideas, and I'll subscribe to this topic to learn more about it.
Cheers,
-Tim