I am using Elasticsearch 6.2.4 and am trying to index data into Elasticsearch.
Here is the index template I am using:
{
  "order": 0,
  "template": "logs-*",
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "ngram-msg-analyzer": {
            "filter": [
              "lowercase",
              "standard"
            ],
            "min_gram": "3",
            "type": "custom",
            "max_gram": "3",
            "tokenizer": "ngram",
            "min": "0",
            "max": "2147483647"
          }
        }
      },
      "number_of_shards": "3",
      "number_of_replicas": "1"
    }
  },
  "mappings": {
    "_doc": {
      "dynamic_templates": [
        {
          "ts": {
            "mapping": {
              "format": "epoch_millis",
              "type": "date"
            },
            "match_mapping_type": "string",
            "match": "*_ts"
          }
        },
        {
          "strings_notanalyzed": {
            "unmatch": "*_analyzed",
            "mapping": {
              "index": true,
              "type": "keyword"
            },
            "match_mapping_type": "string"
          }
        }
      ],
      "properties": {
        "server_ts": {
          "format": "strict_date_optional_time||epoch_millis",
          "type": "date"
        },
        "log_message": {
          "analyzer": "ngram-msg-analyzer",
          "index": true,
          "type": "text",
          "fields": {
            "std": {
              "analyzer": "standard",
              "type": "text"
            },
            "raw": {
              "ignore_above": 2147483647,
              "type": "keyword"
            }
          }
        },
        "message": {
          "index": true,
          "type": "text"
        }
      }
    }
  },
  "aliases": {}
}
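For context, a minimal sketch of how a template like this can be installed through the legacy _template API in 6.x; the host, the template name logs_template, and the file path logs_template.json below are placeholders, not my exact setup:

# Minimal sketch: install the template above via the legacy _template API
# (Elasticsearch 6.x). Host, template name, and file path are placeholders.
import json

import requests

with open("logs_template.json") as f:
    template_body = json.load(f)  # the JSON template shown above

resp = requests.put(
    "http://localhost:9200/_template/logs_template",
    headers={"Content-Type": "application/json"},
    json=template_body,
)
resp.raise_for_status()
print(resp.json())  # e.g. {"acknowledged": true}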
I have set ignore_above to its maximum value to avoid dropping messages, but I am still getting the following error in the Elasticsearch indexing logs for terms longer than 32766 bytes.
failed to execute bulk item (index) BulkShardRequest [[logs-2018-10-25][1]] containing [1406] requests java.lang.IllegalArgumentException: Document contains at least one immense term in field="log_message.raw" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[82, 101, 113, 117, 101, 115, 116, 32, 112, 114, 111, 99, 101, 115, 115, 105, 110, 103, 32, 101, 120, 99, 101, 112, 116, 105, 111, 110, 58, 32]...', original message: bytes can be at most 32766 in length; got 86139
Why is this happening even after setting ignore_above to such a high value?
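For reference, a single document whose log_message is longer than 32766 bytes seems to be enough to trigger this. Here is a minimal sketch of such a request; the host, index name, and payload are illustrative, not taken from my actual data:

# Minimal sketch: index one oversized document into an index matching "logs-*".
# The index is auto-created from the template on first write.
import requests

doc = {
    "server_ts": 1540425600000,  # epoch millis, matches the explicit date mapping
    "log_message": "Request processing exception: " + "x" * 90000,  # > 32766 bytes
    "message": "short message",
}

resp = requests.post(
    "http://localhost:9200/logs-2018-10-25/_doc",
    headers={"Content-Type": "application/json"},
    json=doc,
)
print(resp.status_code)  # request is rejected
print(resp.json())       # same "immense term" error on log_message.raw as above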