Visualization of array vs array from a single document

Hi,
I'm new to Kibana and am trying to create a line visualization from arrays stored in a single document.
Let's say a document has the following arrays:

A: [55, 99, 57, 8]
B: [1, 2, 3, 4]

I want B to be my X axis and A to be my Y axis.

The first point would be (1,55), and so on.
Is it possible?

Hi @Leonid_Pantaler,

This is a very specific use case. Typically, you would store each of those pairs in a separate document. But for this special use case, we need to dive into Vega visualisations:

I've created a document containing only the fields from your description, in the index test_array. Then I created a Vega visualisation with the following specification:

{
/*

Welcome to Vega visualizations.  Here you can design your own dataviz from scratch using a declarative language called Vega, or its simpler form Vega-Lite.  In Vega, you have the full control of what data is loaded, even from multiple sources, how that data is transformed, and what visual elements are used to show it.  Use help icon to view Vega examples, tutorials, and other docs.  Use the wrench icon to reformat this text, or to remove comments.

This example graph plots the values of array A against the values of array B for documents in the test_array index.
*/

  $schema: https://vega.github.io/schema/vega-lite/v4.json
  title: XY representation from arrays in documents

  // Define the data source
  data: {
    url: {
/*
An object instead of a string for the "url" param is treated as an Elasticsearch query. Anything inside this object is not part of the Vega language, but only understood by Kibana and the Elasticsearch server. This query builds a list of {key, value} pairs from the B and A arrays.

Kibana has special handling for the fields surrounded by "%".  They are processed before the query is sent to Elasticsearch. This way the query becomes context aware, and can use the time range and the dashboard filters.
*/

      // Apply dashboard context filters when set
      %context%: true
      // Filter the time picker (upper right corner) with this field (disabled because my test data didn't have any time field).
      // %timefield%: @timestamp 

/*
See the .search() documentation:  https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/api-reference.html#api-search
*/

      // Which index to search
      index: test_array
      // Build {key, value} pairs from B and A with a scripted_metric aggregation.
      body: {
        "size": 0,
        "aggs": {
          "customXY": {
            "scripted_metric": {
              "init_script": "state.values = []",
              "map_script": "for (int i = 0; i < doc.A.size(); i++) { HashMap hash = new HashMap(); hash.key = doc.B[i]; hash.value = doc.A[i]; state.values.add(hash)}",
              "combine_script": "return state.values",
              "reduce_script": "List agg = []; for (values in states) { for (a in values) { agg.add(a); } } return agg;"
            }
          }
        }
      }
    }
/*
Elasticsearch will return results in this format:

  "aggregations" : {
    "customXY" : {
      "value" : [
        {
          "value" : 8,
          "key" : 1
        },
        {
          "value" : 55,
          "key" : 2
        },
        {
          "value" : 57,
          "key" : 3
        },
        {
          "value" : 99,
          "key" : 4
        }
      ]
    }
  }

For our graph, we only need the list of {key, value} pairs.  Use format.property to discard everything else.
*/
    format: {property: "aggregations.customXY.value"}
  }

  // "mark" is the graphics element used to show our data.  Other mark values are: area, bar, circle, line, point, rect, rule, square, text, and tick.  See https://vega.github.io/vega-lite/docs/mark.html
  mark: line
  
  width: container
  height: container

  // "encoding" tells the "mark" what data to use and in what way.  See https://vega.github.io/vega-lite/docs/encoding.html
  encoding: {
    x: {
      // "key" holds the value taken from B.  Use it for the X axis.
      field: key
      type: ordinal
      axis: {title: false} // Customize X axis format
    }
    y: {
      // "value" holds the value taken from A.  Use it for the Y axis.
      field: value
      type: quantitative
      axis: {title: "A"}
    }
  }
}

NOTE: Please bear in mind that this visualisation is suboptimal because it uses the scripted_metric aggregation, which means Elasticsearch must run those Painless scripts against every matching document on every shard and node. It is highly recommended to split those arrays into multiple documents at ingest time.
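
As a rough illustration of the ingest-time split recommended above, here is a minimal Python sketch that turns one document with two parallel arrays into one small document per pair. The function name and the output field names "x" and "y" are made up for this example, and how the resulting documents are indexed (bulk API, ingest pipeline, Logstash, ...) is left open.

```python
def split_into_pairs(doc, x_field="B", y_field="A"):
    """Return one small document per (x, y) pair from two parallel arrays.

    The output field names "x" and "y" are placeholders for this sketch.
    """
    xs, ys = doc[x_field], doc[y_field]
    if len(xs) != len(ys):
        raise ValueError("both arrays must have the same length")
    return [{"x": x, "y": y} for x, y in zip(xs, ys)]


# The document from the question becomes four pair documents.
pairs = split_into_pairs({"A": [55, 99, 57, 8], "B": [1, 2, 3, 4]})
```

With documents shaped like this, a standard Kibana line chart can plot y against x without any scripting.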

Hi,
Thank you so much for your reply.

I'm still struggling with 2 issues:

  1. The Y axis values aren't consistent with the X values, i.e., there is no point (1,55); it's (1,8) instead.
  2. I'm trying to select one document only. I think it can be done using the timestamp, but I'm not quite sure how.

Hi,

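You are right! I didn't notice that! The reason is that the scripted_metric aggregation reads the fields through doc, which is backed by the doc_values representation of the data. Doc values keep the entries of a numeric array in sorted order for efficiency, so A and B are each sorted independently and the original pairing by position is lost.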
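
To make the effect concrete, here is a small Python sketch that mimics it with the arrays from the question. It only imitates the sorting behaviour described above; it is not how Elasticsearch is implemented.

```python
A = [55, 99, 57, 8]
B = [1, 2, 3, 4]

# Pairing by position, as stored in _source: this is what was wanted.
source_pairs = list(zip(B, A))

# Pairing after each array has been sorted on its own, which is what
# the script sees when it reads doc.A and doc.B.
doc_values_pairs = list(zip(sorted(B), sorted(A)))

print(source_pairs)      # [(1, 55), (2, 99), (3, 57), (4, 8)]
print(doc_values_pairs)  # [(1, 8), (2, 55), (3, 57), (4, 99)]
```

The second list matches the aggregation output shown in the first specification, including the unexpected point (1,8).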

Since you are after a single document, I would recommend the following:

  1. Delete or comment out the line with %context%: true
  2. Change the body to something like:
      body: {
        "size": 1,
        "query": {
          // Change this for your query
          "match_all": {}
        }
      }
  3. This will return the document as-is (under hits.hits[0]._source). Then use one of the Vega transforms to reformat the document into [{value: A0, key: B0}, {value: A1, key: B1}, ...]
  4. Amend the encoding section to match the output of your Vega transform.
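
On picking one document by timestamp: one possible request body, sketched here as a Python dict, restricts the search to a time window and returns only the newest hit. This assumes the documents have a date field, called @timestamp here, which the test data in this thread did not have. The function name and the default window are placeholders.

```python
import json


def single_document_body(timestamp_field="@timestamp", start="now-15m", end="now"):
    """Build a search body that returns the newest document in a time window.

    The field name and the window are assumptions; adjust them to your data.
    """
    return {
        "size": 1,
        "query": {"range": {timestamp_field: {"gte": start, "lte": end}}},
        "sort": [{timestamp_field: {"order": "desc"}}],
    }


body = single_document_body()
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted in place of the match_all body; any other query that matches exactly the document you want works just as well.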

Here's an example:

{
/*

Welcome to Vega visualizations.  Here you can design your own dataviz from scratch using a declarative language called Vega, or its simpler form Vega-Lite.  In Vega, you have the full control of what data is loaded, even from multiple sources, how that data is transformed, and what visual elements are used to show it.  Use help icon to view Vega examples, tutorials, and other docs.  Use the wrench icon to reformat this text, or to remove comments.

This example graph plots the values of array A against the values of array B for a single document in the test_array index.
*/

  $schema: https://vega.github.io/schema/vega-lite/v4.json
  title: XY representation from arrays in documents

  // Define the data source
  data: {
    url: {
/*
An object instead of a string for the "url" param is treated as an Elasticsearch query. Anything inside this object is not part of the Vega language, but only understood by Kibana and the Elasticsearch server. This query fetches a single document as-is.

Kibana has special handling for the fields surrounded by "%".  They are processed before the query is sent to Elasticsearch. This way the query becomes context aware, and can use the time range and the dashboard filters.
*/

      // Apply dashboard context filters when set
      // %context%: true
      // Filter the time picker (upper right corner) with this field (disabled because my test data didn't have any time field).
      // %timefield%: @timestamp 

/*
See the .search() documentation:  https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/api-reference.html#api-search
*/

      // Which index to search
      index: test_array
      // Fetch a single document as-is instead of aggregating.
      body: {
        "size": 1,
        "query": {
          // Change this for your query
          "match_all": {}
        }
      }
    }
/*
Elasticsearch will return results in this format:

  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_array",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "A" : [
            55,
            99,
            57,
            8
          ],
          "B" : [
            1,
            2,
            3,
            4
          ]
        }
      }
    ]
  },

For our graph, we only need the list of hits.  Use format.property to discard everything else.
*/
    format: {property: "hits.hits"}
  }
  
  // https://vega.github.io/vega-lite/docs/transform.html
  "transform": [
    // https://vega.github.io/vega-lite/docs/flatten.html
    {"flatten": ["_source.B", "_source.A"]} // Flatten Transform
  ],

  // "mark" is the graphics element used to show our data.  Other mark values are: area, bar, circle, line, point, rect, rule, square, text, and tick.  See https://vega.github.io/vega-lite/docs/mark.html
  mark: line
  
  width: container
  height: container

  // "encoding" tells the "mark" what data to use and in what way.  See https://vega.github.io/vega-lite/docs/encoding.html
  encoding: {
    x: {
      // After the flatten transform, "_source.B" holds one value of B per row.  Use it for the X axis.
      field: "_source.B"
      type: ordinal
      axis: {title: false} // Customize X axis format
    }
    y: {
      // After the flatten transform, "_source.A" holds one value of A per row.  Use it for the Y axis.
      field: "_source.A"
      type: quantitative
      axis: {title: "A"}
    }
  }
}
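
For readers unfamiliar with the flatten transform, here is a simplified Python emulation of what it does to the hits above. It is only an illustration of the row-per-element idea, not Vega's actual implementation, and the helper names are made up.

```python
def get_path(obj, path):
    """Follow a dotted path such as "_source.B" through nested dicts."""
    for part in path.split("."):
        obj = obj[part]
    return obj


def flatten(rows, fields):
    """Emit one output row per array index, keeping the arrays aligned.

    Shorter arrays are padded with None, mirroring the documented
    behaviour of the flatten transform for arrays of unequal length.
    """
    out = []
    for row in rows:
        arrays = [get_path(row, f) for f in fields]
        for i in range(max(len(a) for a in arrays)):
            new_row = dict(row)
            for field, array in zip(fields, arrays):
                new_row[field] = array[i] if i < len(array) else None
            out.append(new_row)
    return out


hits = [{"_id": "1", "_source": {"A": [55, 99, 57, 8], "B": [1, 2, 3, 4]}}]
rows = flatten(hits, ["_source.B", "_source.A"])
points = [(r["_source.B"], r["_source.A"]) for r in rows]
print(points)  # [(1, 55), (2, 99), (3, 57), (4, 8)]
```

Because the arrays come from _source rather than doc_values, the original order is preserved and the first point is (1,55) as intended.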

Thank you very much :)
