Consider comma separated values in a field as separate values while aggregating

Hi,

i want to do aggregation on a field which has values like

doc1_field: "A"
doc2_field: "A, B"
doc3_field: "A, B, C"

What mappings / settings I can use so that when I aggregate on this field I should get results like:

key: A
count: 3

key: "B"
count: 2

key: "C"
count: 3

I have tried adding a comma separated analyzer on the above field but it's not giving me the desired results, I still get results like

key: "A"
count: 1

key: "A, B, C"
count: 1

which is not what I want..

Could you please help in how can I aggregate on this field??

This is more an Elasticsearch question but I think you need to ingest that fiel with actual separated values or maybe use an ingest pipeline to split it. The actual root cause is that tweaking the analizer will help you for searching but for aggregating you need to use a keyword field that contains one or more values.

Check the following code:

# Create an ingest pipeline to split and trim the incoming field
PUT _ingest/pipeline/discuss-345804
{
  "version": 1,
  "processors": [
    {
      "split": {
        "field": "field",
        "target_field": "processed_field",
        "separator": ","
      }
    },
    {
      "trim": {
        "field": "processed_field"
      }
    }
  ]
}

# Create an index with a text field and a default pipeline
PUT discuss-345804
{
  "mappings": {
    "properties": {
      "field": {"type": "text"},
      "processed_field": {"type":  "keyword"}
    }
  },
  "settings": {
    "default_pipeline": "discuss-345804"
  }
}

# Add some data
PUT discuss-345804/_bulk
{"index": {}}
{ "field": "A"}
{"index": {}}
{ "field": "A, B"}
{"index": {}}
{ "field": "A, B, C"}
{"index": {}}
{ "field": "B, C"}
{"index": {}}
{ "field": "B, C, A"}

Now you can search, aggregating by the keyword field:

# Aggregate by the keyword field
GET discuss-345804/_search
{
  "size": 0, 
  "query": { "match_all": {}},
  "aggs": {
    "counts": {
      "terms": {
        "field": "processed_field"
      }
    }
  }
}

# Result
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "counts": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "A",
          "doc_count": 8
        },
        {
          "key": "B",
          "doc_count": 8
        },
        {
          "key": "C",
          "doc_count": 6
        }
      ]
    }
  }
}
1 Like

Thanks a lot @jsanz . This helps :slight_smile:

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.