I'm storing quality metrics in elastic. One of those metrics is code coverage (extracted from sonar). I want to calculate the difference in average code coverage for a project between weeks. So this week the average code coverage might be 80% and last week it was 90% for that same project. So the difference is -10%.
I've found that serial differencing aggregations are perfect for calculating this and I have a working query. (Shared query syntax below). However I need a UI to display these query results to users and I don't plan to build one right now. I've found that the only way to display serial difference query results in Kibana is with TSVB BUT that'll only work if I do it as a histogram. I want a table view like this:
Project name % Coverage %change from last week
Project 1 80% -10%
Project 2 70% 5%
A trend arrow in that % change column would be awesome. I couldn't get the trend arrows to work in TSVB either or at least they don't seem to do what I want.
So here's what I think I need as a solution although it'd be nice to hear there's an easier approach.
I need to calculate the % change in coverage from the previous week at the moment it's created. Doing so will allow me to use kibana's table visualization out of the box. If I create an ingest pipeline, I think I could then:
- Extract the @timestamp field out of the incoming metric.
- Subtract 7 days from it to figure out the date on the week prior and use it to build the time range in the serial difference query.
- Extract the project name out of the incoming metric so I'd know which project to query for in the serial difference query
- Run the serial difference aggregation inside the ingest pipeline.
- Extract the value percent_change.value field on the newest bucket if a value exists.
- Append a new field to the document called percent_change_from_last_week in the document to be written.
Is there an example ingest pipeline out there that closely matches what I'm trying to do?
Here is my query:
POST /sonarmetrics/_search
{
"size": 0,
"aggs": {
"2": {
"terms": {
"field": "key.keyword"
},
"aggs": {
"my_date_histo": {
"date_histogram": {
"field": "@timestamp",
"calendar_interval": "week"
},
"aggs": {
"avg_coverage": {
"avg": {
"field": "coverage"
}
},
"percent_change_from_last_week": {
"serial_diff": {
"buckets_path": "avg_coverage",
"lag" : 1
}
}
}
}
}
}
},
"query": {
"bool": {
"must": [],
"filter": [
{
"match_all": {}
},
{
"match_phrase": {
"name.keyword": "Project1" // Value needs to come from name field in metric
}
},
{
"range": {
"@timestamp": {
"gte": "2020-07-08T19:29:12.054Z", // Needs value from @timestamp minus 1 week
"lte": "2020-07-15T19:29:12.055Z", // Should be the value from @timestamp
"format": "strict_date_optional_time"
}
}
}
],
"should": [],
"must_not": []
}
}
}
Here is an example result:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 153,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"2" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Project1",
"doc_count" : 153,
"my_date_histo" : {
"buckets" : [
{
"key_as_string" : "2020-07-06T00:00:00.000Z",
"key" : 1593993600000,
"doc_count" : 98,
"avg_coverage" : {
"value" : 76.21224562975826
}
},
{
"key_as_string" : "2020-07-13T00:00:00.000Z",
"key" : 1594598400000,
"doc_count" : 55,
"avg_coverage" : {
"value" : 75.94545371315697
},
"Percent_Change" : {
"value" : -0.2667919166012922 // This is the value I want to extract into a field called percent_change_from_last_week
}
}
]
}
}
]
}
}
}