Consider comma separated values in a field as separate values while aggregating

Gagan_Saluja · October 26, 2023, 11:51am

Hi,

i want to do aggregation on a field which has values like

doc1_field: "A"
doc2_field: "A, B"
doc3_field: "A, B, C"

What mappings / settings I can use so that when I aggregate on this field I should get results like:

key: A
count: 3

key: "B"
count: 2

key: "C"
count: 3

I have tried adding a comma separated analyzer on the above field but it's not giving me the desired results, I still get results like

key: "A"
count: 1

key: "A, B, C"
count: 1

which is not what I want..

Could you please help in how can I aggregate on this field??

jsanz · October 27, 2023, 2:19pm

This is more an Elasticsearch question but I think you need to ingest that fiel with actual separated values or maybe use an ingest pipeline to split it. The actual root cause is that tweaking the analizer will help you for searching but for aggregating you need to use a keyword field that contains one or more values.

Check the following code:

# Create an ingest pipeline to split and trim the incoming field
PUT _ingest/pipeline/discuss-345804
{
  "version": 1,
  "processors": [
    {
      "split": {
        "field": "field",
        "target_field": "processed_field",
        "separator": ","
      }
    },
    {
      "trim": {
        "field": "processed_field"
      }
    }
  ]
}

# Create an index with a text field and a default pipeline
PUT discuss-345804
{
  "mappings": {
    "properties": {
      "field": {"type": "text"},
      "processed_field": {"type":  "keyword"}
    }
  },
  "settings": {
    "default_pipeline": "discuss-345804"
  }
}

# Add some data
PUT discuss-345804/_bulk
{"index": {}}
{ "field": "A"}
{"index": {}}
{ "field": "A, B"}
{"index": {}}
{ "field": "A, B, C"}
{"index": {}}
{ "field": "B, C"}
{"index": {}}
{ "field": "B, C, A"}

Now you can search, aggregating by the keyword field:

# Aggregate by the keyword field
GET discuss-345804/_search
{
  "size": 0, 
  "query": { "match_all": {}},
  "aggs": {
    "counts": {
      "terms": {
        "field": "processed_field"
      }
    }
  }
}

# Result
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "counts": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "A",
          "doc_count": 8
        },
        {
          "key": "B",
          "doc_count": 8
        },
        {
          "key": "C",
          "doc_count": 6
        }
      ]
    }
  }
}

Gagan_Saluja · November 1, 2023, 4:26am

Thanks a lot @jsanz . This helps

system · November 29, 2023, 4:27am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Aggregation on comma separated text field Kibana	2	2375	March 13, 2017
Elasticsearch Java api split a field Elasticsearch	2	747	July 5, 2017
Create Aggregate field with separated by comma Elasticsearch	4	743	December 3, 2021
Aggregation in Elasticsearch Elasticsearch	1	285	March 26, 2019
Parsing key value pairs from 2 seperate fields into several fields in ingest pipeline Elasticsearch	1	211	January 5, 2023

Consider comma separated values in a field as separate values while aggregating

Related topics