Elasticsearch 6.2.4 java.lang.IllegalArgumentException: Document contains at least one immense term


(Prateek Gupta) #1

I am using Elasticsearch 6.2.4 and I am trying to index data into Elasticsearch.

Here is the template that I am using:

{
   "order": 0,
   "template": "logs-*",
   "settings": {
      "index": {
         "analysis": {
            "analyzer": {
               "ngram-msg-analyzer": {
                  "filter": [
                     "lowercase",
                     "standard"
                  ],
                  "min_gram": "3",
                  "type": "custom",
                  "max_gram": "3",
                  "tokenizer": "ngram",
                  "min": "0",
                  "max": "2147483647"
               }
            }
         },
         "number_of_shards" : "3",
         "number_of_replicas" : "1"
      }
   },
   "mappings": {
      "_doc": {
         "dynamic_templates": [
            {
               "ts": {
                  "mapping": {
                     "format": "epoch_millis",
                     "type": "date"
                  },
                  "match_mapping_type": "string",
                  "match": "*_ts"
               }
            },
            {
               "strings_notanalyzed": {
                  "unmatch": "*_analyzed",
                  "mapping": {
                     "index": true,
                     "type": "keyword"
                  },
                  "match_mapping_type": "string"
               }
            }
         ],
         "properties": {
            "server_ts": {
               "format": "strict_date_optional_time||epoch_millis",
               "type": "date"
            },
            "log_message": {
               "analyzer": "ngram-msg-analyzer",
               "index": true,
               "type": "text",
               "fields": {
                  "std": {
                     "analyzer": "standard",
                     "type": "text"
                  },
                  "raw": {
                     "ignore_above": 2147483647,
                     "type": "keyword"
                  }
               }
            },
            "message": {
               "index": true,
               "type": "text"
            }
         }
      }
   },
   "aliases": {}
}

I have set the ignore_above option to its maximum value to avoid dropping messages, but I am still getting the following error in the Elasticsearch indexing logs for terms longer than 32766:

failed to execute bulk item (index) BulkShardRequest [[logs-2018-10-25][1]] containing [1406] requests java.lang.IllegalArgumentException: Document contains at least one immense term in field="log_message.raw" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[82, 101, 113, 117, 101, 115, 116, 32, 112, 114, 111, 99, 101, 115, 115, 105, 110, 103, 32, 101, 120, 99, 101, 112, 116, 105, 111, 110, 58, 32]...', original message: bytes can be at most 32766 in length; got 86139

Why is this happening even after setting ignore_above to such a high value?
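(A note on the likely cause: ignore_above is measured in characters, while Lucene's hard 32766 limit is measured in UTF-8 bytes, and ignore_above set to the maximum effectively disables skipping, so over-long terms reach Lucene and the bulk item fails. A sketch of a mapping fragment that would skip over-long values instead; 8191 is an illustrative choice, low enough that even 4-byte UTF-8 characters stay under 32766 bytes:)

```json
"raw": {
   "type": "keyword",
   "ignore_above": 8191
}
```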


(Christian Dahlqvist) #2

Are you sure you want to store the entire log_message field as a term in a keyword field? How are you going to use this? This is likely to have extremely high cardinality, which can result in very high heap usage.


(Prateek Gupta) #3

We are using a terms aggregation over the log_message field to get message trends, and because of the length limit, some long messages are missing from the trends if we restrict the length to 32766. The expectation is that all messages show up in the trends when we aggregate.
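The use case described above would look roughly like this (a sketch; the aggregation name and size are illustrative, and it assumes the log_message.raw keyword subfield from the template):

```json
POST logs-*/_search
{
   "size": 0,
   "aggs": {
      "message_trends": {
         "terms": {
            "field": "log_message.raw",
            "size": 50
         }
      }
   }
}
```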

The other thing is: if the keyword field cannot store terms longer than 32766 bytes, what is the purpose of ignore_above having such a high value by default? It gives false expectations to the user.
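(An alternative to dropping long values entirely is truncating them before indexing, via a custom normalizer using the truncate token filter. The sketch below assumes the truncate filter is permitted in normalizers on this version; the filter and normalizer names are illustrative, and the truncate length counts characters, not bytes, hence the conservative 8191:)

```json
{
   "settings": {
      "analysis": {
         "filter": {
            "max_term_length": {
               "type": "truncate",
               "length": 8191
            }
         },
         "normalizer": {
            "truncating": {
               "type": "custom",
               "filter": ["max_term_length"]
            }
         }
      }
   },
   "mappings": {
      "_doc": {
         "properties": {
            "log_message": {
               "type": "text",
               "fields": {
                  "raw": {
                     "type": "keyword",
                     "normalizer": "truncating"
                  }
               }
            }
         }
      }
   }
}
```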


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.