Kibana Vega - Multiple Buckets visualization - how?


(Vanna) #1

I can't figure out how to deal with 2 buckets to draw a multi-view graph in Vega for Kibana.

My index in Kibana contains orderbooks that aggregates orders per label bucket (e.g: USD/CAD).

How do I build a VEGA chart in Kibana that works with 2 buckets?

This is my Index Mappings

{
      "orderindex": {
        "aliases": {},
        "mappings": {
          "orderbook": {
            "properties": {
              "exchange": {
                "type": "keyword"
              },
              "id": {
                "type": "integer"
              },
              "label": {
                "type": "keyword"
              },
              "length": {
                "type": "integer"
              },
              "orders": {
                "type": "nested",
                "properties": {
                  "exchange": {
                    "type": "keyword"
                  },
                  "id": {
                    "type": "integer"
                  },
                  "label": {
                    "type": "keyword"
                  },
                  "orderbookId": {
                    "type": "integer"
                  },
                  "ordertype": {
                    "type": "keyword"
                  },
                  "pair1": {
                    "type": "keyword"
                  },
                  "pair2": {
                    "type": "keyword"
                  },
                  "price": {
                    "type": "double"
                  },
                  "quantity": {
                    "type": "double"
                  },
                  "timestamp": {
                    "type": "text",
                    "fields": {
                      "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                      }
                    }
                  },
                  "total": {
                    "type": "double"
                  }
                }
              },
              "timestamp": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          }
        },
        "settings": {
          "index": {
            "creation_date": "1526318629953",
            "number_of_shards": "5",
            "number_of_replicas": "1",
            "uuid": "7AeKJAdlQzSPf9WajEUlxA",
            "version": {
              "created": "6020499"
            },
            "provided_name": "orderindex"
          }
        }
      }
    }

This is how I try to use Vega to draw multi-view graphs

Vega query

    {
  "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
  "title": "Order Totals vs Order Ids",
  "data": {
    "url": {
      "index": "orderindex",
      "body": {
          "size": 0,
          "_source": {
            "excludes": []
          },
          "aggs": {
            "order_labels": {
              "terms": {
                "field": "label",
                "size": 12,
                "order": {
                  "_count": "desc"
                }
              },
              "aggs": {
                "orders": {
                  "nested": {
                    "path": "orders"
                  },
                  "aggs": {
                    "orders_id": {
                      "terms": {
                        "field": "orders.id",
                        "size": 40,
                        "order": {
                          "_count": "desc"
                        }
                      },
                      "aggs": {
                        "orders_price": {
                          "sum": {
                            "field": "orders.price"
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          },
          "stored_fields": ["*"],
          "script_fields": {},
          "docvalue_fields": [],
          "query": {
            "bool": {
              "must": [
                {
                  "match_all": {}
                }
              ],
              "filter": [],
              "should": [],
              "must_not": []
            }
          }
        }
        },
        "format": {
          "property": "aggregations.order_labels.buckets.orders.orders_id.buckets.orders_price"
        }
      },
    "mark": "bar",
    "encoding": {
    "row": {
      "field": "order_labels", "type": "ordinal"
    },
    "x": {
      "field": "orders_price.value", "type": "quantitative",
      "scale": {"zero": false}
    },
    "y": {
      "field": "orders_id", "type": "ordinal",
      "sort": {"field": "orders_price.value","op": "sum", "order": "descending"},
      "scale": {"rangeStep": 40}
    }
  }
}

My intention with this query is to result in something similar to the multi view graphs draft picture below based on Vega web site multi view graph example for Kibana

enter image description here

The aim is to display the sum of orders' price per bucket label as in the sample picture above.

The problem is that I am receiving an error from the vega query above that says Cannot read property 'orders_id' of undefined. I believe this is because I have multiple buckets aggregation in my query response and I don`t know how to deal with this.

How should I format my property here below in this part of Vega Query to make it work? Or, what should I do to get this 2 buckets multi aggregation query to work with multi-view charts in Vega?

"format": {
          "property": "aggregations.order_labels.buckets.orders.orders_id.buckets.orders_price"
          }

(Vanna) #2

@nyuriks I was reading one of your examples while trying to solve the problem above. After reading it I modified my query above accordingly. My chart now appears to work however values are not displayed.

Would you recommend any changes in the query below when working with multiple buckets in Vega?

{
  "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
  "title": "Order Totals vs Order Ids",
  "data": {
"url": {
  "index": "orderindex",
  "body": {
      "size": 0,
      "_source": {
        "excludes": []
      },
      "aggs": {
        "order_labels": {
          "terms": {
            "field": "label",
            "size": 140,
            "order": {
              "_count": "desc"
            }
          },
          "aggs": {
            "orders": {
              "nested": {
                "path": "orders"
              },
              "aggs": {
                "orders_id": {
                  "terms": {
                    "field": "orders.id",
                    "size": 40,
                    "order": {
                      "_count": "desc"
                    }
                  },
                  "aggs": {
                    "orders_price": {
                      "sum": {
                        "field": "orders.price"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      },
      "stored_fields": ["*"],
      "script_fields": {},
      "docvalue_fields": [],
      "query": {
        "bool": {
          "must": [
            {
              "match_all": {}
            }
          ],
          "filter": [],
          "should": [],
          "must_not": []
        }
      }
    }
    },
    "format": {
      "property": "aggregations.order_labels.buckets"
    }
     transform: [
       {calculate: "key", as: "order_label"}
       {calculate: "orders.orders_id.buckets.key", as: "order_id"}
       {calculate: "orders.orders_id.buckets.orders_price.value", as: "order_price"}
     ]
  },
"mark": "bar",
"encoding": {
"row": {
  "field": "order_label", "type": "ordinal"
},
"x": {
  "field": "order_price", "type": "quantitative",
  "scale": {"zero": false}
  "axis": {title:"Price"}
},
"y": {
  "field": "order_id", "type": "ordinal",
  "sort": {"field": "order_price","op": "sum", "order": "descending"},
  "scale": {"rangeStep": 40}
  "axis": {title:"Orders"}
}
  }
}

(Tim Sullivan) #3

Unfortunately, it's the nested data type that is causing problems.

The aggregation you wrote is solid, and gets the data summarized in a nice, nested, format. If we look at an order label bucket in Vega debug, it'll look something like:

> VEGA_DEBUG.view.data('source_0')[0]
doc_count: 288
key: "Canada"
orders:
  doc_count: 2880
  orders_id: buckets: [{...}, {...}, {...}, ...]
  doc_count_error_upper_bound: 0
  sum_other_doc_count: 2613

Easy to see what's happening there. There's a label of Canada and it has a bunch of order_id objects attributed - data buckets for each order_id. We can check out one of those order_id objects:

doc_count: 10
key: 636
orders_price:
  value: 52881

So there was an order with ID 636 and the price for that order was 52881

When the data is broken down in this nested way, it's easy for a human to understand. Unfortunately, with VegaLite (note your $schema), data really needs to come in a flattened format. In the Kibana integration with Vega, and querying Elasticsearch to get data for VegaLite, we usually want to use the composite aggregation - it makes the result data all nice and flat. You'd see this if the body looks like:

{
  "size": 0,
  "aggs": {
    "table": {
      "composite": {
        "sources": [
          {
            "time": {
              "date_histogram": {
                "field": "timestamp",
                "interval": "month"
              }
            }
          },
          {
            "label": {
              "terms": {
                "field": "label"
              }
            }
          }
        ]
      }
    }
  }
}

Composite aggregation is pretty much essential for VegaLite because it expects the data coming in to be flat, where each property of the array of data is a leaf value of data.

Note that I didn't include any of the fields under the nested type in that example. That's where the problem is: VegaLite syntax doesn't support dotted notation - accessing the nested fields using dots in VegaLite doesn't work:

 /* this won't work */
  encoding: {
    row: {
      field: key
      type: nominal
    }
    x: {
      field: orders.orders_id.buckets.orders_price.value /* `buckets` is an array - VegaLite doesn't dive into every object of the array to bring back `orders_price.value` */
      type: quantitative
      axis: {title: "Price"}
    }
    y: {
      field: orders.orders_id.buckets.key /* same problem with`buckets` is an array */
      type: quantitative
      axis: {title: "Orders"}
    }
  }

Another problem is that the composite aggregation only supports the source aggregations to be bucket aggregations themselves. Nested aggregations require having their own child aggregations, so they don't fit the model of composite.

{
  "size": 0,
  "aggs": {
    "table": {
      "composite": {
        "sources": [
          {
            "time": {
              "date_histogram": {
                "field": "timestamp",
                "interval": "month"
              }
            }
          },
          {
            "label": {
              "terms": {
                "field": "label"
              }
            }
          },
          {
            "orders": {
              "nested": {
                "path": "orders"
              },
              "aggs": {
                "order_ids": {
                  "terms": {
                    "field": "order.id",
                    "size": 1000
                  }
                }
              }
            }
          }
        ]
      }
    }
  }
}

# result:
{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "invalid source type: nested",
        "line": 24,
        "col": 25
      }
    ],
    "type": "parsing_exception",
    "reason": "[composite] failed to parse field [sources]",
    "line": 24,
    "col": 25,
    "caused_by": {
      "type": "parsing_exception",
      "reason": "invalid source type: nested",
      "line": 24,
      "col": 25
    }
  },
  "status": 400
}

I talked with Yuri Astrakhan on this (I don't think he is on Discuss) and he mentioned it might be possible to use your nested data with Vega as opposed to VegaLite, but that it would be harder. I had to ask him because to be honest - this was my first exploration with Vega! I thank you for the opportunity.

I'll have Yuri follow up to see if he has further ideas, and I'll subscribe to this topic to learn more about it.

Cheers,
-Tim


(Vanna) #4

Hey Tim,

Thank you for the reply and for the time to analyse it. I was trying some different queries and was a bit frustrated already.

Are there plans to have nested working with Vega-lite in the future? I believe it would be worth, both together can be really powerful.

And same question to nested types vs. composite query, are there plans them to work together in the future or should I just get ride of nested types?

Thanks again, cheers
Vanna


(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.