Terms aggregation ignoring analyzers?


#1

Hello guys,

I'm trying to run a simple terms aggregation:

GET test3/_search
{
    "aggs" : {
        "agg_name" : {
            "terms" : { 
              "field" : "words"
            }
        }
    },
    "size" : 0
}

My goal is to get a simple doc_count for every word (token) within the words field, but I keep getting the raw value of the whole words field instead.

For example:
"words" : "This is a sentence"
I'm expecting to get separate tokens like ["this", "is", "a", "sentence"] and count the occurrences of each token. What I get instead is ["this is a sentence"]: one bucket per whole words value, each with a doc_count of 1.
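To make the two outcomes concrete, here is a minimal plain-Python sketch of how terms buckets come out when a value is indexed as one whole term versus split into tokens. It does not talk to Elasticsearch: the function names are hypothetical, the regex is only a rough stand-in for the standard tokenizer plus lowercasing, and the second sample document is made up for illustration.

```python
import re
from collections import Counter


def whole_value_terms(value):
    # The entire field value becomes a single term, unchanged.
    return {value}


def tokenized_terms(value):
    # Rough approximation of word tokenization plus lowercasing.
    return set(re.findall(r"\w+", value.lower()))


def terms_buckets(docs, to_terms):
    # doc_count is the number of documents containing a term,
    # so each term is counted at most once per document.
    counts = Counter()
    for value in docs:
        counts.update(to_terms(value))
    return dict(counts)


docs = ["This is a sentence", "This is another sentence"]
whole = terms_buckets(docs, whole_value_terms)
tokens = terms_buckets(docs, tokenized_terms)
print(whole)
print(tokens)
```

Note that in this sketch, as in a terms aggregation, doc_count counts documents rather than raw occurrences, so a word repeated inside one document still contributes 1.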

I have tried using different analysers and tokenisers, but whichever combination I use, the result is the same. I'm really confused at the moment, as it seems that tokenisers don't have any effect on the aggregation.

This is my latest (current) index mapping configuration:

{
    "order": 0,
    "index_patterns": [
        "words-data-*"
    ],
    "settings": {
        "index": {
            "max_result_window": "200000",
            "refresh_interval": "-1",
            "analysis": {
                "analyzer": {
                    "my_analyzer": {
                        "filter": [
                            "lowercase",
                            "trim",
                            "reverse"
                        ],
                        "type": "custom",
                        "tokenizer": "standard"
                    }
                }
            },
            "number_of_shards": "1",
            "number_of_replicas": "0"
        }
    },
    "mappings": {
        "keywords": {
            "_all": {
                "enabled": false
            },
            "properties": {
                "words": {
                    "ignore_above": 256,
                    "store": true,
                    "eager_global_ordinals": true,
                    "type": "keyword",
                    "fields": {
                        "reverse": {
                            "search_analyzer": "my_analyzer",
                            "analyzer": "my_analyzer",
                            "type": "text"
                        }
                    }
                }
            }
        }
    },
    "aliases": {}
}

I would be expecting lots of different tokens, but it looks like there are none. I want the results of a standard tokeniser for the words field when running aggregations.

By the way, any other suggestions for the mapping?


(Zachary Tong) #2

This is your issue:

You've configured the "words" field as a keyword field, which means the text "this is a sentence" will be indexed as a single token, "this is a sentence". You'll need to change the type to text so that you can assign it an analyzer. Then you'll see the tokens you expect in the Terms agg.
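As a hedged sketch of that change, the "words" property could be built like the Python dict below and serialized into the template. Several things here are assumptions rather than part of the answer: the helper names are hypothetical, the built-in standard analyzer is chosen because the question asks for standard tokenisation, and fielddata is switched on because Elasticsearch disables it by default on text fields and a terms aggregation on a text field needs it (at the cost of heap memory). ignore_above is dropped since it is a keyword parameter, and the existing "reverse" sub-field is kept as it was.

```python
import json


def words_property(analyzer="standard"):
    # Hypothetical helper: the "words" field as an analyzed text field.
    return {
        "type": "text",
        "analyzer": analyzer,
        # Assumption: needed so a terms aggregation can run on a text field.
        "fielddata": True,
        "store": True,
        "fields": {
            "reverse": {
                "type": "text",
                "analyzer": "my_analyzer",
                "search_analyzer": "my_analyzer",
            }
        },
    }


def terms_request(field="words"):
    # Same aggregation as in the question, unchanged.
    return {"size": 0, "aggs": {"agg_name": {"terms": {"field": field}}}}


mapping_fragment = {"properties": {"words": words_property()}}
print(json.dumps(mapping_fragment, indent=2))
print(json.dumps(terms_request(), indent=2))
```

The type of an existing field cannot be changed in place, so documents indexed under the old mapping would have to be reindexed into a new index before the aggregation returns per-token buckets.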


#3

Of all the things I've gone through, this one crossed my mind, but I didn't even bother trying it. Works like a charm now, thank you!


(Zachary Tong) #4

Happy to help! :)

