Count of non-unique terms

Here is my sample dataset:
image

I can get a list of the non-unique text values in a table:
image

source
{
  "aggs": {
    "2": {
      "terms": {
        "field": "text.keyword",
        "order": {
          "_count": "desc"
        },
        "size": 5,
        "min_doc_count": 2
      }
    }
  },
  "size": 0,
  "fields": [
    {
      "field": "@timestamp",
      "format": "date_time"
    }
  ],
  "script_fields": {},
  "stored_fields": [
    "*"
  ],
  "_source": {
    "excludes": []
  },
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "match_all": {}
        },
        {
          "range": {
            "@timestamp": {
              "gte": "2021-02-22T22:00:00.000Z",
              "lte": "2021-02-23T21:59:59.999Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

How can I get the count of non-unique text values in a metric visualisation?

In my case, it should look like this:
image

This requires something more than the standard Kibana visualizations. You basically have two options:

  1. Use Vega because it supports math
  2. Change the data that you store in Elasticsearch by using Data transforms and scripting

Thank you @wylie

  1. Vega does not have a visualization of type metric.
  2. I'll try, but transforms seem too cumbersome for such a task :frowning:

Vega is absolutely capable of displaying a single text field. Use mark: text.

I need to display a metric, not text.

Yes, I'm trying to tell you that those are the same thing in Vega.

To get what I need, I can make a request like this:

Request
GET /<my_index>/_search
{
  "size": 0,
  "aggs": {
    "duplicates": {
      "terms": {
        "field": "text.keyword",
        "min_doc_count": 2
      }
    },
    "duplicates_count": {
      "stats_bucket": {
        "buckets_path": "duplicates._count"
      }
    }
  }
}

Sample response:

Response
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "duplicates" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "bar",
          "doc_count" : 3
        },
        {
          "key" : "foo",
          "doc_count" : 2
        }
      ]
    },
    "duplicates_count" : {
      "count" : 2,
      "min" : 2.0,
      "max" : 3.0,
      "avg" : 2.5,
      "sum" : 5.0
    }
  }
}
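
For reference, the number that the metric should show is the `count` field of `duplicates_count`, because `stats_bucket` reports how many buckets it received from `duplicates`. The following is a minimal Python sketch of reading that value from a response shaped like the sample above. The helper name `non_unique_term_count` is hypothetical, and the dictionary is a trimmed copy of the sample response, not real cluster output. Note that the `terms` aggregation returns only the top 10 buckets unless `size` is set, so `count` cannot exceed `size`.

```python
# Trimmed copy of the sample response above; only the aggregations are kept.
response = {
    "aggregations": {
        "duplicates": {
            "buckets": [
                {"key": "bar", "doc_count": 3},
                {"key": "foo", "doc_count": 2},
            ]
        },
        "duplicates_count": {
            "count": 2,
            "min": 2.0,
            "max": 3.0,
            "avg": 2.5,
            "sum": 5.0,
        },
    }
}


def non_unique_term_count(resp):
    """Hypothetical helper: number of buckets seen by the stats_bucket aggregation."""
    return resp["aggregations"]["duplicates_count"]["count"]


# The count equals the number of buckets that passed min_doc_count.
buckets = response["aggregations"]["duplicates"]["buckets"]
print(non_unique_term_count(response), len(buckets))
```

In a Vega spec, the same path (`aggregations.duplicates_count`) is what the data source would need to expose so that a text mark can read `count`.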

But I can't figure out how to visualize the duplicates_count metric.
Can you give me an example?

I have written several tutorials explaining how to connect Elasticsearch queries to Vega output:

{
  $schema: https://vega.github.io/schema/vega-lite/v4.json
  data: {
    url: {
      %context%: true
      %timefield%: order_date
      index: kibana_sample_data_ecommerce
      body: {
        // add your aggregations here
        aggs: {}
        size: 0
      }
    }
    format: {property: "aggregations.path.to.bucket"}
  }
  mark: text
  encoding: {
    text: {
      field: path.to.count
    }
  }
}

Here is an example of the transforms that I think you are looking for, if you want to do it all in Vega. With the inline sample data, the output is 2.

{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "data": [
    {
      "name": "table",
      "values": [
        {"id": 1, "text": "foo"},
        {"id": 2, "text": "bar"},
        {"id": 3, "text": "abc"},
        {"id": 4, "text": "def"},
        {"id": 5, "text": "foo"},
        {"id": 6, "text": "ghi"},
        {"id": 7, "text": "bar"},
        {"id": 8, "text": "jkl"}
      ],
      "transform": [
        {
          "type": "aggregate",
          "groupby": ["text"],
          "fields": ["text"],
          "ops": ["valid"],
          "as": ["matched"]
        },
        {
          "type": "formula",
          "as": "matched-count",
          "expr": "datum.matched >= 2 ? 1 : 0"
        },
        {
          "type": "aggregate",
          "fields": ["matched-count"],
          "ops": ["sum"],
          "as": ["sum_match_count"]
        }
      ]
    }
  ],
  "marks": [
    {
      "type": "text",
      "from": {"data": "table"},
      "encode": {
        "enter": {
          "fill": {"value": "#333"},
          "x": {"value": 10},
          "y": {"value": 10},
          "text": {"field": "sum_match_count"}
        }
      }
    }
  ]
}
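
The three transforms in the spec above (group by `text`, flag repeated groups, sum the flags) can be checked outside Vega. The following is a rough Python sketch of the same steps on the same inline data. The function name `count_non_unique` and its parameters are hypothetical. The sketch flags groups with a count of at least 2, so that a value appearing three or more times is also treated as non-unique, in line with `min_doc_count: 2` in the Elasticsearch request.

```python
from collections import Counter

# Same inline rows as the "table" data set in the Vega spec.
rows = [
    {"id": 1, "text": "foo"},
    {"id": 2, "text": "bar"},
    {"id": 3, "text": "abc"},
    {"id": 4, "text": "def"},
    {"id": 5, "text": "foo"},
    {"id": 6, "text": "ghi"},
    {"id": 7, "text": "bar"},
    {"id": 8, "text": "jkl"},
]


def count_non_unique(rows, field="text", min_count=2):
    """Hypothetical helper mirroring the three Vega transforms."""
    # Step 1 (aggregate, groupby): occurrences of each distinct value.
    counts = Counter(row[field] for row in rows)
    # Step 2 (formula): 1 for values seen at least min_count times, else 0.
    flags = [1 if n >= min_count else 0 for n in counts.values()]
    # Step 3 (aggregate, sum): total number of non-unique values.
    return sum(flags)


print(count_non_unique(rows))  # prints 2 ("foo" and "bar" each appear twice)
```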
