Shingles and terms aggregation not working as expected

zp230 · July 10, 2020, 1:33pm

Hi,

I cannot get even a simple terms aggregation to work on a shingled field. For context, I want a list of the words and phrases found in the sentences.display_text field (using shingles), along with how often each one occurs.

My index mapping is as follows:

{
    "settings": {
        "max_shingle_diff": 4,
        "analysis": {
            "analyzer": {
                "custom": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "shingle_filter"]
                }
            },
            "filter": {
                "shingle_filter": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 5,
                    "output_unigrams": "true"
                }
            }
        }
    },
    "mappings": {
        "properties": {
             ...
            "sentences": {
                "type": "nested",
                "include_in_root": "true",
                "properties": {
                    "display_text": {
                        "type": "text",
                        "fields": {
                            "shingles": {
                                "type": "text",
                                "analyzer": "custom"
                            }
                        }
                    },
                         ...
}

And here is the query I run:

GET /books/_search
{
    "size": 0,
    "query": {
        "terms": {
            "title": ["Book1"]
        }
    },
    "aggs": {

        "sentences_count": {
            "nested": {
                "path": "sentences"
            },
            "aggs": {
                "phrases": {
                    "terms": {
                        "field": "sentences.display_text.shingles",
                        "size": 100
                    }
                }
            }
        }
    }
}

However, I get an error which suggests that sentences.display_text.shingles is still being treated as a plain text field rather than as shingle tokens:

"type" : "illegal_argument_exception",
"reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [sentences.display_text.shingles] in order to load field data by uninverting the inverted index. Note that this can use significant memory."

Does anyone know what I am doing wrong? Based on previous discussions here, this seems to be the correct query structure and index mapping.

Mark_Harwood · July 14, 2020, 10:01am

See https://www.elastic.co/guide/en/elasticsearch/reference/current/fielddata.html
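
The shingle filter only changes which tokens get indexed; the shingles subfield is still of type text, and fielddata is disabled by default on text fields, which is what the error message is reporting. A minimal sketch of enabling it on the existing subfield follows. It assumes the index is named books, as in the query above, and that the field is defined exactly as posted; the elided parts of the mapping are left out here. Keep in mind that fielddata is loaded onto the heap, so a field holding shingles of 2 to 5 words can use a lot of memory.

```json
# Sketch only: assumes the index "books" and the field definitions posted above.
# The existing definition is repeated and only "fielddata": true is added.
PUT /books/_mapping
{
    "properties": {
        "sentences": {
            "type": "nested",
            "include_in_root": "true",
            "properties": {
                "display_text": {
                    "type": "text",
                    "fields": {
                        "shingles": {
                            "type": "text",
                            "analyzer": "custom",
                            "fielddata": true
                        }
                    }
                }
            }
        }
    }
}
```

With that in place, the terms aggregation on sentences.display_text.shingles from the original query should run unchanged.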

system · August 11, 2020, 10:15am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
