Terms aggregation ignoring analyzers?

randomuser · May 3, 2018, 1:12pm

Hello guys,

I'm trying to run a simple terms aggregation:

GET test3/_search
{
    "aggs" : {
        "agg_name" : {
            "terms" : { 
              "field" : "words"
            }
        }
    },
    "size" : 0
}

My goal is to get a simple doc_count for every word (token) within the field words. But I keep getting the raw value of the whole words field.

For example:
"words" : "This is a sentence"
I'm expecting to get separate tokens like ["this", "is", "a", "sentence"] and a count of the occurrences of each token. What I get instead is ["this is a sentence"], one bucket per words value, each with a doc_count of 1.

I have tried different analysers and tokenisers, but whichever combination I use, the result is the same. I'm really confused at the moment, as it seems that tokenisers have no effect on the aggregation.

This is my latest (current) index mapping configuration:

{
    "order": 0,
    "index_patterns": [
        "words-data-*"
    ],
    "settings": {
        "index": {
            "max_result_window": "200000",
            "refresh_interval": "-1",
            "analysis": {
                "analyzer": {
                    "my_analyzer": {
                        "filter": [
                            "lowercase",
                            "trim",
                            "reverse"
                        ],
                        "type": "custom",
                        "tokenizer": "standard"
                    }
                }
            },
            "number_of_shards": "1",
            "number_of_replicas": "0"
        }
    },
    "mappings": {
        "keywords": {
            "_all": {
                "enabled": false
            },
            "properties": {
                "words": {
                    "ignore_above": 256,
                    "store": true,
                    "eager_global_ordinals": true,
                    "type": "keyword",
                    "fields": {
                        "reverse": {
                            "search_analyzer": "my_analyzer",
                            "analyzer": "my_analyzer",
                            "type": "text"
                        }
                    }
                }
            }
        }
    },
    "aliases": {}
}

I would expect lots of different tokens, but it looks like there are none. I want the aggregation to run on the output of a standard tokeniser for the words field.

By the way, any other suggestions for the mapping?

polyfractal · May 3, 2018, 3:56pm

This is your issue:

You've configured the "words" field as a keyword field, which means the text "this is a sentence" will be indexed as the single token "this is a sentence". Keyword fields are not analyzed, so you'll need to change the type to text so that you can assign it an analyzer. Then you'll see the tokens you expect in the terms aggregation.
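As a rough sketch of that change, the requests below create a new index whose "words" field is a text field and then rerun the aggregation from the question. Several things here are assumptions rather than part of the original thread: the index name test3-text is hypothetical, the mapping type name "keywords" is carried over from the template above (mapping types were removed in later Elasticsearch versions), and "fielddata": true is added because text fields have fielddata disabled by default and a terms aggregation on them is rejected without it. The type of an existing field cannot be changed in place, so the documents would need to be reindexed into the new index.

```
# Hypothetical index name; "words" is now an analyzed text field
PUT test3-text
{
  "mappings": {
    "keywords": {
      "properties": {
        "words": {
          "type": "text",
          "analyzer": "standard",
          "fielddata": true
        }
      }
    }
  }
}

# Same aggregation as in the question, against the new index
GET test3-text/_search
{
  "size": 0,
  "aggs": {
    "agg_name": {
      "terms": { "field": "words" }
    }
  }
}
```

Fielddata is held on the JVM heap, so enabling it on a large, high-cardinality text field can use a lot of memory. It may be worth testing on a small sample of documents first.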

randomuser · May 4, 2018, 12:01pm

Of all the things I've gone through, this crossed my mind, but I didn't even bother trying it. Works like a charm now, thank you!

polyfractal · May 4, 2018, 1:14pm

Happy to help!

