Terms aggregation for emails list

Hello all,

I have a problem with a terms aggregation. When I aggregate over a field that holds lists of email addresses, Elasticsearch splits each address at the "@" sign, so the bucket keys in the response are fragments rather than full addresses. The request is:

GET test/_search
{
  "_source": "emails",
  "query": {
    "term": {
      "companyId": {
        "value": 31953
      }
    }
  },
  "aggs": {
    "distinct_emails": {
      "terms": {
        "field": "emails"
      }
    }
  }
}

The full response is:

  {
    "took": 6,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 18,
            "relation": "eq"
        },
        "max_score": 1,
        "hits": [
            {
                "_index": "test",
                "_type": "items",
                "_id": "8290c5f279dabb08a21ed11f3515a94e2408650d",
                "_score": 1,
                "_source": {
                    "emails": [
                        "info@companyname.com",
                        "info@companyname.com",
                        "info@companyname.com",
                        "info@companyname.com"
                    ]
                }
            },
            {
                "_index": "test",
                "_type": "items",
                "_id": "64e5974d5db9e1379fd52393521e256b71b364f9",
                "_score": 1,
                "_source": {
                    "emails": [
                        "info@companyname.com",
                        "info@companyname.com",
                        "info@companyname.com",
                        "info@companyname.com",
                        "info@companyotherdomain.com",
                        "info@companyname.com",
                        "info@companyname.com"
                    ]
                }
            },
            {
                "_index": "test",
                "_type": "items",
                "_id": "67754094f58ceca963ab0933c7ecb7f7cded6077",
                "_score": 1,
                "_source": {
                    "emails": [
                        "info@companyname.com",
                        "info@companyname.com",
                        "info@companyname.com",
                        "info@companyname.com"
                    ]
                }
            },
            {
                "_index": "test",
                "_type": "items",
                "_id": "f71617eb607b3e77f03794e41ca239322d53b709",
                "_score": 1,
                "_source": {
                    "emails": [
                        "info@companyname.com",
                        "info@companyname.com",
                        "info@companyname.com",
                        "info@companyname.com"
                    ]
                }
            }
        ]
    },
    "aggregations": {
        "distinct_emails": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "companyname.com",
                    "doc_count": 13
                },
                {
                    "key": "info",
                    "doc_count": 13
                },
                {
                    "key": "companyotherdomain.com",
                    "doc_count": 1
                }
            ]
        }
    }
}

The buckets part of the response is:

"buckets": [
                {
                    "key": "companyname.com",
                    "doc_count": 13
                },
                {
                    "key": "info",
                    "doc_count": 13
                },
                {
                    "key": "companyotherdomain.com",
                    "doc_count": 1
                }
            ]

I'm expecting the below:

"buckets": [
                {
                    "key": "info@companyname.com",
                    "doc_count": 13
                },
                {
                    "key": "info@companyotherdomain.com",
                    "doc_count": 1
                }
            ]

Hi,

maybe this helps here as well:

Hi @isamzwuairi,

Can you show your mapping? You may need a multi-field so that the field is indexed both with your current analyzer and as a keyword type; you can then run your aggregations on the keyword sub-field. See the docs here:
https://www.elastic.co/guide/en/elasticsearch/reference/7.4/multi-fields.html#multi-fields
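
To check whether analysis is the cause, the _analyze API shows how an analyzer tokenizes a value. This is only a sketch, and it assumes the emails field is a text field using the default standard analyzer (an assumption until the mapping is posted):

POST _analyze
{
  "analyzer": "standard",
  "text": "info@companyname.com"
}

If the returned tokens match the bucket keys from your response, the aggregation is running on the analyzed tokens rather than on the full addresses.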

Thanks for your reply. Any suggestions on how to fix this behavior?

Hi @gabriel_tessier,

Thanks for your reply. The field mapping is:

{
    "test": {
        "mappings": {
            "emails": {
                "full_name": "emails",
                "mapping": {
                    "emails": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        },
                        "fielddata": true
                    }
                }
            }
        }
    }
}

You can make the aggregations on "emails.keyword".

Something like:

"aggs": { "distinct_emails": { "terms": { "field": "emails.keyword" } } }
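
Combined with the original request, the full search might look like the sketch below. It only swaps the aggregation field to the keyword sub-field shown in the posted mapping; everything else is copied from the first post:

GET test/_search
{
  "_source": "emails",
  "query": {
    "term": {
      "companyId": {
        "value": 31953
      }
    }
  },
  "aggs": {
    "distinct_emails": {
      "terms": {
        "field": "emails.keyword"
      }
    }
  }
}

Note that the mapping sets "ignore_above": 256 on the sub-field, so any value longer than 256 characters is not indexed there and would not appear in the buckets.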
