Hi,
I'm new to Kibana and trying to create a line visualization from arrays stored in a single document.
Let's say some document has the following arrays:
This is a very specific use case. Typically, you would store each of those pairs in a separate document. For this special case, though, we need to dive into Vega visualisations.
I've created a document containing only the fields from your description in the index test_array. Then I've created a Vega visualisation with the following specification:
{
/*
Welcome to Vega visualizations. Here you can design your own dataviz from scratch using a declarative language called Vega, or its simpler form Vega-Lite. In Vega, you have full control over what data is loaded, even from multiple sources, how that data is transformed, and what visual elements are used to show it. Use the help icon to view Vega examples, tutorials, and other docs. Use the wrench icon to reformat this text, or to remove comments.
This example graph plots the values of array A against the values of array B, both read from the documents in the test_array index.
*/
$schema: https://vega.github.io/schema/vega-lite/v4.json
title: XY representation from arrays in documents
// Define the data source
data: {
url: {
/*
An object instead of a string for the "url" param is treated as an Elasticsearch query. Anything inside this object is not part of the Vega language, but only understood by Kibana and the Elasticsearch server. This query pairs up the values of arrays A and B with a scripted_metric aggregation.
Kibana has special handling for the fields surrounded by "%". They are processed before the query is sent to Elasticsearch. This way the query becomes context aware, and can use the time range and the dashboard filters.
*/
// Apply dashboard context filters when set
%context%: true
// Filter the time picker (upper right corner) with this field (disabled because my test data didn't have any time field).
// %timefield%: @timestamp
/*
See the .search() documentation: https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/api-reference.html#api-search
*/
// Which index to search
index: test_array
// Build a list of {key, value} pairs from arrays B and A with a scripted_metric aggregation.
body: {
"size": 0,
"aggs": {
"customXY": {
"scripted_metric": {
"init_script": "state.values = []",
"map_script": "for (int i = 0; i < doc.A.size(); i++) { HashMap hash = new HashMap(); hash.key = doc.B[i]; hash.value = doc.A[i]; state.values.add(hash)}",
"combine_script": "return state.values",
"reduce_script": "List agg = []; for (values in states) { for (a in values) { agg.add(a); } } return agg;"
}
}
}
}
}
/*
Elasticsearch will return results in this format:
"aggregations" : {
"customXY" : {
"value" : [
{
"value" : 8,
"key" : 1
},
{
"value" : 55,
"key" : 2
},
{
"value" : 57,
"key" : 3
},
{
"value" : 99,
"key" : 4
}
]
}
}
For our graph, we only need the list of key/value pairs. Use format.property to discard everything else.
*/
format: {property: "aggregations.customXY.value"}
}
// "mark" is the graphics element used to show our data. Other mark values are: area, bar, circle, line, point, rect, rule, square, text, and tick. See https://vega.github.io/vega-lite/docs/mark.html
mark: line
width: container
height: container
// "encoding" tells the "mark" what data to use and in what way. See https://vega.github.io/vega-lite/docs/encoding.html
encoding: {
x: {
// "key" holds a value from array B. Use it for the X axis.
field: key
type: ordinal
axis: {title: false} // Customize X axis format
}
y: {
// "value" holds the matching value from array A. Use it for the Y axis.
field: value
type: quantitative
axis: {title: "A"}
}
}
}
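To make the four scripts of the scripted_metric aggregation easier to follow, here is a rough Python emulation of the phases. It is only a sketch: the function names mirror the script names in the spec, while `run_scripted_metric` and the two-shard sample layout are hypothetical and are not part of any Elasticsearch API. The sample arrays are already in ascending order, so ordering plays no role here.

```python
# Python emulation of the scripted_metric phases (not Painless).

def init_script():
    # Mirrors "state.values = []": one fresh state per shard.
    return {"values": []}

def map_script(state, doc):
    # Runs once per matching document: pair B[i] (key) with A[i] (value).
    for i in range(len(doc["A"])):
        state["values"].append({"key": doc["B"][i], "value": doc["A"][i]})

def combine_script(state):
    # Runs once per shard: return that shard's collected pairs.
    return state["values"]

def reduce_script(states):
    # Runs once on the coordinating node: concatenate the per-shard lists.
    agg = []
    for values in states:
        agg.extend(values)
    return agg

def run_scripted_metric(shards):
    # Hypothetical driver that wires the phases together.
    states = []
    for docs in shards:
        state = init_script()
        for doc in docs:
            map_script(state, doc)
        states.append(combine_script(state))
    return reduce_script(states)

# Hypothetical layout: two shards holding one document each.
shards = [
    [{"A": [10, 20], "B": [1, 2]}],
    [{"A": [30], "B": [3]}],
]
result = run_scripted_metric(shards)
print(result)
```

The final list is what ends up under aggregations.customXY.value, which is the property the Vega spec reads.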
NOTE: Please bear in mind that this visualisation is suboptimal because it uses a scripted_metric aggregation, which means Elasticsearch must run those Painless scripts over all the matching documents, across shards and cluster nodes. It is highly recommended to split those arrays into multiple documents at ingest time.
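As a minimal sketch of that split, assuming the arrays are available in application code before indexing, the pairs could be generated like this. The helper `split_into_pairs` and the output field names `x` and `y` are hypothetical; the sample values are the ones from the test document.

```python
def split_into_pairs(doc, x_field="B", y_field="A"):
    """Turn one document holding two parallel arrays into one small document per pair."""
    xs, ys = doc[x_field], doc[y_field]
    if len(xs) != len(ys):
        raise ValueError("arrays must have the same length")
    # Pair elements by position, keeping the original array order.
    return [{"x": x, "y": y} for x, y in zip(xs, ys)]

source = {"A": [55, 99, 57, 8], "B": [1, 2, 3, 4]}
pairs = split_into_pairs(source)
for pair in pairs:
    print(pair)
```

Each resulting document can then be indexed on its own, and a regular Kibana line chart can plot y against x without any scripting.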
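You are right! I didn't notice that! The reason is that the scripted_metric aggregation reads the doc_values representation of the data, in which Elasticsearch keeps each field's values sorted for efficiency, so the original array order (and with it the pairing between A and B) is lost.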
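The effect can be reproduced outside Elasticsearch. This is only an illustration, assuming each field's values are sorted independently in ascending order; it uses the arrays from the test document.

```python
source = {"A": [55, 99, 57, 8], "B": [1, 2, 3, 4]}

# Intended pairing: elements matched by their position in the arrays.
intended = list(zip(source["B"], source["A"]))

# What the map_script sees: each field's values sorted on their own.
doc_values = {field: sorted(values) for field, values in source.items()}
observed = list(zip(doc_values["B"], doc_values["A"]))

print(intended)
print(observed)
```

The `observed` pairs match the aggregation result shown in the spec's comment (key 1 with value 8, key 2 with value 55, and so on), not the pairing in the original document.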
Since you are after a single document, I would recommend that you:
Delete/comment the line with %context%: true
Change the body to something that looks like:
body: {
"size": 1,
"query": {
// Change this for your query
"match_all": {}
}
}
This will return the document as-is (under hits.hits[0]._source). Then, use one of the Vega transforms to reshape the document into [{value: A0, key: B0}, {value: A1, key: B1}, ...]
Amend the encoding section to match the output of your Vega transform function.
Here's an example:
{
/*
Welcome to Vega visualizations. Here you can design your own dataviz from scratch using a declarative language called Vega, or its simpler form Vega-Lite. In Vega, you have full control over what data is loaded, even from multiple sources, how that data is transformed, and what visual elements are used to show it. Use the help icon to view Vega examples, tutorials, and other docs. Use the wrench icon to reformat this text, or to remove comments.
This example graph plots the values of array A against the values of array B, both read from a single document in the test_array index.
*/
$schema: https://vega.github.io/schema/vega-lite/v4.json
title: XY representation from arrays in documents
// Define the data source
data: {
url: {
/*
An object instead of a string for the "url" param is treated as an Elasticsearch query. Anything inside this object is not part of the Vega language, but only understood by Kibana and the Elasticsearch server. This query fetches a single document together with its _source.
Kibana has special handling for the fields surrounded by "%". They are processed before the query is sent to Elasticsearch. This way the query becomes context aware, and can use the time range and the dashboard filters.
*/
// Apply dashboard context filters when set
// %context%: true
// Filter the time picker (upper right corner) with this field (disabled because my test data didn't have any time field).
// %timefield%: @timestamp
/*
See the .search() documentation: https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/api-reference.html#api-search
*/
// Which index to search
index: test_array
// Fetch a single document as-is; no aggregation is needed.
body: {
"size": 1,
"query": {
// Change this for your query
"match_all": {}
}
}
}
/*
Elasticsearch will return results in this format:
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "test_array",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"A" : [
55,
99,
57,
8
],
"B" : [
1,
2,
3,
4
]
}
}
]
},
For our graph, we only need the list of hits. Use format.property to discard everything else.
*/
format: {property: "hits.hits"}
}
// https://vega.github.io/vega-lite/docs/transform.html
"transform": [
// https://vega.github.io/vega-lite/docs/flatten.html
{"flatten": ["_source.B", "_source.A"]} // Flatten Transform
],
// "mark" is the graphics element used to show our data. Other mark values are: area, bar, circle, line, point, rect, rule, square, text, and tick. See https://vega.github.io/vega-lite/docs/mark.html
mark: line
width: container
height: container
// "encoding" tells the "mark" what data to use and in what way. See https://vega.github.io/vega-lite/docs/encoding.html
encoding: {
x: {
// "_source.B" holds one value of array B per flattened row. Use it for the X axis.
field: "_source.B"
type: ordinal
axis: {title: false} // Customize X axis format
}
y: {
// "_source.A" holds the matching value of array A. Use it for the Y axis.
field: "_source.A"
type: quantitative
axis: {title: "A"}
}
}
}
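To illustrate what the flatten transform does to the hit before the line is drawn, here is a rough Python emulation. It is a sketch of the documented behaviour (one output row per array position, with missing entries padded with null when the arrays differ in length), not Vega's actual implementation; the helpers `get_nested` and `flatten` are hypothetical.

```python
def get_nested(row, path):
    """Follow a dotted path such as "_source.B" into nested dicts."""
    value = row
    for part in path.split("."):
        value = value[part]
    return value

def flatten(rows, fields):
    """Emit one output row per array position of the given parallel array fields."""
    out = []
    for row in rows:
        arrays = [get_nested(row, field) for field in fields]
        longest = max(len(array) for array in arrays)
        for i in range(longest):
            new_row = dict(row)
            for field, array in zip(fields, arrays):
                # Shorter arrays are padded with None (null in Vega).
                new_row[field] = array[i] if i < len(array) else None
            out.append(new_row)
    return out

# The single hit from the response shown in the spec's comment.
hits = [{"_id": "1", "_source": {"A": [55, 99, 57, 8], "B": [1, 2, 3, 4]}}]
rows = flatten(hits, ["_source.B", "_source.A"])
points = [(row["_source.B"], row["_source.A"]) for row in rows]
print(points)
```

Because the arrays come from _source rather than doc_values, the pairs keep the order in which they were indexed.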