Hi,
I am having trouble getting even a simple terms aggregation to work on a shingled field. For context, I want to get a list of the words/phrases found in the sentences.display_text field (using shingles), along with how often each one occurs.
My index mapping is as follows:
{
  "settings": {
    "max_shingle_diff": 4,
    "analysis": {
      "analyzer": {
        "custom": {
          "tokenizer": "standard",
          "filter": ["lowercase", "shingle_filter"]
        }
      },
      "filter": {
        "shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 5,
          "output_unigrams": "true"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      ...
      "sentences": {
        "type": "nested",
        "include_in_root": "true",
        "properties": {
          "display_text": {
            "type": "text",
            "fields": {
              "shingles": {
                "type": "text",
                "analyzer": "custom"
              }
            }
          },
          ...
}
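To rule out the analyzer itself, the tokens it emits can be inspected with the _analyze API. This is only a minimal sketch: it assumes the index is named books (as in the query below), the sample sentence is made up, and the body would be sent to GET /books/_analyze:

```json
{
  "analyzer": "custom",
  "text": "the quick brown fox jumps"
}
```

If the response lists both single words and multi-word phrases, the shingle filter is being applied at analysis time, and the problem lies elsewhere.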
And here is the query I run:
GET /books/_search
{
  "size": 0,
  "query": {
    "terms": {
      "title": ["Book1"]
    }
  },
  "aggs": {
    "sentences_count": {
      "nested": {
        "path": "sentences"
      },
      "aggs": {
        "phrases": {
          "terms": {
            "field": "sentences.display_text.shingles",
            "size": 100
          }
        }
      }
    }
  }
}
However, I get an error suggesting that sentences.display_text.shingles is still being treated as a plain text field rather than as shingle tokens:
"type" : "illegal_argument_exception",
"reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [sentences.display_text.shingles] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
Does anyone know what I am doing wrong? From previous discussions here, it seems that this should be the correct query structure and index mapping.
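For reference, the change the error message points to would, as far as I understand it, mean defining display_text roughly as follows. This is only a sketch of the fielddata option named in the error; I have not confirmed that it is the recommended approach for a shingled field, and the error itself warns that it can use significant memory:

```json
{
  "type": "text",
  "fields": {
    "shingles": {
      "type": "text",
      "analyzer": "custom",
      "fielddata": true
    }
  }
}
```

Everything except the "fielddata" line is copied from the mapping above.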