I have a wonderful little index and mapping that I use to ingest documents. Unfortunately, a few of the documents have long fields, and the ingest process fails for those documents. The mapping for the field in question is
{
  "PostBody": {
    "type": "text",
    "search_analyzer": "simple",
    "analyzer": "analyzer_startswith",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 256
      },
      "ending": {
        "type": "text",
        "search_analyzer": "simple",
        "analyzer": "analyzer_endswith"
      },
      "cloud": {
        "type": "text",
        "analyzer": "my_stop_analyzer",
        "search_analyzer": "my_stop_analyzer",
        "fielddata": true
      }
    }
  }
}
and my index's settings are
{
  "index": {
    "number_of_shards": "5",
    "number_of_replicas": "1",
    "analysis": {
      "analyzer": {
        "analyzer_startswith": {
          "tokenizer": "keyword",
          "filter": "lowercase"
        },
        "analyzer_endswith": {
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "reverse"
          ]
        },
        "my_stop_analyzer": {
          "type": "stop",
          "stopwords_path": "/etc/elasticsearch/word_cloud_stopwords.txt",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
Now, the error message I am receiving is
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Document contains at least one immense term in field=\"PostBody\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[78, 117, 108, 108, 97, 109, 32, 118, 97, 114, 105, 117, 115, 46, 32, 78, 117, 108, 108, 97, 32, 102, 97, 99, 105, 108, 105, 115, 105, 46]...', original message: bytes can be at most 32766 in length; got 37887"}],"type":"illegal_argument_exception","reason":"Document contains at least one immense term in field=\"PostBody\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[78, 117, 108, 108, 97, 109, 32, 118, 97, 114, 105, 117, 115, 46, 32, 78, 117, 108, 108, 97, 32, 102, 97, 99, 105, 108, 105, 115, 105, 46]...', original message: bytes can be at most 32766 in length; got 37887","caused_by":{"type":"max_bytes_length_exceeded_exception","reason":"bytes can be at most 32766 in length; got 37887"}},"status":400}
I have tried a number of workarounds, but none of them both preserves the startswith/endswith matching and avoids the max-length exception. Do you know how I can alter the index mapping or analyzers to accommodate such large documents?
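For example, one direction I would expect to suppress the exception is adding the built-in truncate token filter to the two keyword-tokenizer analyzers. This is only a sketch: the filter name trim_long_terms and the length of 8000 characters are placeholders of my own (8000 keeps even 4-byte UTF-8 characters under the 32766-byte cap):

{
  "index": {
    "analysis": {
      "filter": {
        "trim_long_terms": {
          "type": "truncate",
          "length": 8000
        }
      },
      "analyzer": {
        "analyzer_startswith": {
          "tokenizer": "keyword",
          "filter": ["lowercase", "trim_long_terms"]
        },
        "analyzer_endswith": {
          "tokenizer": "keyword",
          "filter": ["lowercase", "reverse", "trim_long_terms"]
        }
      }
    }
  }
}

As far as I can tell, though, that only indexes the first 8000 characters (or, after the reverse filter, the last 8000), so exact startswith/endswith matching on very long bodies is lost, which is the functionality trade-off I am trying to avoid.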