I am building a Data Table in which I could get two required columns but I need to add one additional column which is the difference between these two. Is that somehow possible?
I could get the desired results using a Bucket Script Aggregation, but how can I use this in the Data Table?
Please consider the following mock-up case for a better explanation. The case is for performance metrics recorded for different machine learning models per customer. We need to show a summary table of the average performance of each model on each measured performance metric on a customer basis.
- The index mapping template:
PUT _template/metrics-template
{
"index_patterns": [
"metrics-*"
],
"settings": {
"analysis": {
"analyzer": {
"path-analyzer": {
"tokenizer": "path_hierarchy"
}
}
}
},
"mappings": {
"dynamic": false,
"properties": {
"path": {
"type": "text",
"analyzer": "path-analyzer",
"search_analyzer": "keyword",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"time": {
"type": "date"
},
"labels": {
"type": "object",
"dynamic": true
},
"value": {
"type": "double"
}
},
"dynamic_templates": [
{
"labels_as_keywords": {
"path_match": "labels.*",
"mapping": {
"type": "keyword"
}
}
}
]
}
}
- The data:
POST _bulk
{"index": {"_index": "metrics-01"}}
{"path": "models/performance/mse", "time": "2020-04-20T08:00:00.00000Z", "value": 1, "labels": {"customer_id": "1", "model_name": "rf"}}
{"index": {"_index": "metrics-01"}}
{"path": "models/performance/r2", "time": "2020-04-20T08:00:00.00000Z", "value": 0.5, "labels": {"customer_id": "1", "model_name": "rf"}}
{"index": {"_index": "metrics-01"}}
{"path": "models/performance/mse", "time": "2020-04-20T08:00:00.00000Z", "value": 2, "labels": {"customer_id": "1", "model_name": "nn"}}
{"index": {"_index": "metrics-01"}}
{"path": "models/performance/r2", "time": "2020-04-20T08:00:00.00000Z", "value": 0.2, "labels": {"customer_id": "1", "model_name": "nn"}}
{"index": {"_index": "metrics-01"}}
{"path": "models/performance/mse", "time": "2020-04-21T08:00:00.00000Z", "value": 1.2, "labels": {"customer_id": "1", "model_name": "rf"}}
{"index": {"_index": "metrics-01"}}
{"path": "models/performance/r2", "time": "2020-04-21T08:00:00.00000Z", "value": 0.5, "labels": {"customer_id": "1", "model_name": "rf"}}
{"index": {"_index": "metrics-01"}}
{"path": "models/performance/mse", "time": "2020-04-21T08:00:00.00000Z", "value": 2.4, "labels": {"customer_id": "1", "model_name": "nn"}}
{"index": {"_index": "metrics-01"}}
{"path": "models/performance/r2", "time": "2020-04-21T08:00:00.00000Z", "value": 0.2, "labels": {"customer_id": "1", "model_name": "nn"}}
{"index": {"_index": "metrics-01"}}
{"path": "models/performance/mse", "time": "2020-04-20T08:00:00.00000Z", "value": 10, "labels": {"customer_id": "2", "model_name": "rf"}}
{"index": {"_index": "metrics-01"}}
{"path": "models/performance/r2", "time": "2020-04-20T08:00:00.00000Z", "value": 0.8, "labels": {"customer_id": "2", "model_name": "rf"}}
{"index": {"_index": "metrics-01"}}
{"path": "models/performance/mse", "time": "2020-04-20T08:00:00.00000Z", "value": 20, "labels": {"customer_id": "2", "model_name": "nn"}}
{"index": {"_index": "metrics-01"}}
{"path": "models/performance/r2", "time": "2020-04-20T08:00:00.00000Z", "value": -1, "labels": {"customer_id": "2", "model_name": "nn"}}
{"index": {"_index": "metrics-01"}}
{"path": "models/performance/mse", "time": "2020-04-21T08:00:00.00000Z", "value": 12, "labels": {"customer_id": "2", "model_name": "rf"}}
{"index": {"_index": "metrics-01"}}
{"path": "models/performance/r2", "time": "2020-04-21T08:00:00.00000Z", "value": 0.3, "labels": {"customer_id": "2", "model_name": "rf"}}
{"index": {"_index": "metrics-01"}}
{"path": "models/performance/mse", "time": "2020-04-21T08:00:00.00000Z", "value": 24, "labels": {"customer_id": "2", "model_name": "nn"}}
{"index": {"_index": "metrics-01"}}
{"path": "models/performance/r2", "time": "2020-04-21T08:00:00.00000Z", "value": -1, "labels": {"customer_id": "2", "model_name": "nn"}}
- The Data Table (what I could achieve)
- What is missing is an additional column showing the difference (division or subtraction) between the Avg MSE (Yesterday) column the Avg MSE column.
- I could get this information using a Bucket Script Aggregation ran using the Dev Tools, is there anyway to get these results (run a similar query) into the Data Table?
GET metrics-01/_search
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"customers": {
"terms": {
"field": "labels.customer_id",
"order": {
"_key": "asc"
},
"size": 500
},
"aggs": {
"models": {
"terms": {
"field": "labels.model_name",
"order": {
"_key": "asc"
},
"size": 10
},
"aggs": {
"avg_mse_all": {
"filter": {
"query_string": {
"analyze_wildcard": true,
"query": """ path.raw: "models/performance/mse" """
}
},
"aggs": {
"avg_mse": {
"avg": {
"field": "value"
}
}
}
},
"avg_mse_yesterday": {
"filter": {
"query_string": {
"analyze_wildcard": true,
"query": """ path.raw: "models/performance/mse" AND time: [now-1d/d TO now/d] """
}
},
"aggs": {
"avg_mse": {
"avg": {
"field": "value"
}
}
}
},
"avg_mse_difference": {
"bucket_script": {
"buckets_path": {
"avg_mse_all_value": "avg_mse_all>avg_mse",
"avg_mse_yesterday_value": "avg_mse_yesterday>avg_mse"
},
"script": "params.avg_mse_yesterday_value - params.avg_mse_all_value"
}
}
}
}
}
}
}
}
I know that some plugins might provide a solution, but I never used any and I am not sure which one is the best option and what are the disadvantages of using kibana plugins.
Any help or direction is appreciated.
Thank you!