Elasticsearch 6.2.4 java.lang.IllegalArgumentException: Document contains at least one immense term


(Prateek Gupta) #1

I am using Elasticsearch 6.2.4 and I am trying to index data into Elasticsearch.

Here is the template that I am using:

{
   "order": 0,
   "template": "logs-*",
   "settings": {
      "index": {
         "analysis": {
            "analyzer": {
               "ngram-msg-analyzer": {
                  "filter": [
                     "lowercase",
                     "standard"
                  ],
                  "min_gram": "3",
                  "type": "custom",
                  "max_gram": "3",
                  "tokenizer": "ngram",
                  "min": "0",
                  "max": "2147483647"
               }
            }
         },
         "number_of_shards" : "3",
         "number_of_replicas" : "1"
      }
   },
   "mappings": {
      "_doc": {
         "dynamic_templates": [
            {
               "ts": {
                  "mapping": {
                     "format": "epoch_millis",
                     "type": "date"
                  },
                  "match_mapping_type": "string",
                  "match": "*_ts"
               }
            },
            {
               "strings_notanalyzed": {
                  "unmatch": "*_analyzed",
                  "mapping": {
                     "index": true,
                     "type": "keyword"
                  },
                  "match_mapping_type": "string"
               }
            }
         ],
         "properties": {
            "server_ts": {
               "format": "strict_date_optional_time||epoch_millis",
               "type": "date"
            },
            "log_message": {
               "analyzer": "ngram-msg-analyzer",
               "index": true,
               "type": "text",
               "fields": {
                  "std": {
                     "analyzer": "standard",
                     "type": "text"
                  },
                  "raw": {
                     "ignore_above": 2147483647,
                     "type": "keyword"
                  }
               }
            },
            "message": {
               "index": true,
               "type": "text"
            }
         }
      }
   },
   "aliases": {}
}

I have set the ignore_above option to its maximum value to avoid dropping messages, but I am still getting the following error in the Elasticsearch indexing logs for terms longer than 32766:

failed to execute bulk item (index) BulkShardRequest [[logs-2018-10-25][1]] containing [1406] requests java.lang.IllegalArgumentException: Document contains at least one immense term in field="log_message.raw" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[82, 101, 113, 117, 101, 115, 116, 32, 112, 114, 111, 99, 101, 115, 115, 105, 110, 103, 32, 101, 120, 99, 101, 112, 116, 105, 111, 110, 58, 32]...', original message: bytes can be at most 32766 in length; got 86139

Why is this happening even after setting ignore_above to such a high value?
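(A note on the likely cause: ignore_above is measured in characters, while Lucene's hard 32766 limit is measured in UTF-8 bytes, and ignore_above set to the maximum effectively disables skipping, so over-long terms reach Lucene and the bulk item fails. A sketch of a mapping fragment that would skip over-long values instead; 8191 is an illustrative choice, low enough that even 4-byte UTF-8 characters stay under 32766 bytes:)

```json
"raw": {
   "type": "keyword",
   "ignore_above": 8191
}
```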


(Christian Dahlqvist) #2

Are you sure you want to store the entire log_message field as a term in a keyword field? How are you going to use this? This is likely to have extremely high cardinality, which can result in very high heap usage.


(Prateek Gupta) #3

We are using a terms aggregation over the log_message field to get message trends, and because of the length limit, some long messages are missing from the trends if we restrict the length to 32766. The expectation is that all messages show up in the trends when we aggregate.
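The use case described above would look roughly like this (a sketch; the aggregation name and size are illustrative, and it assumes the log_message.raw keyword subfield from the template):

```json
POST logs-*/_search
{
   "size": 0,
   "aggs": {
      "message_trends": {
         "terms": {
            "field": "log_message.raw",
            "size": 50
         }
      }
   }
}
```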

The other thing is: if the keyword field cannot store terms longer than 32766 bytes, what is the purpose of ignore_above having such a high value by default? It gives false expectations to the user.
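(An alternative to dropping long values entirely is truncating them before indexing, via a custom normalizer using the truncate token filter. The sketch below assumes the truncate filter is permitted in normalizers on this version; the filter and normalizer names are illustrative, and the truncate length counts characters, not bytes, hence the conservative 8191:)

```json
{
   "settings": {
      "analysis": {
         "filter": {
            "max_term_length": {
               "type": "truncate",
               "length": 8191
            }
         },
         "normalizer": {
            "truncating": {
               "type": "custom",
               "filter": ["max_term_length"]
            }
         }
      }
   },
   "mappings": {
      "_doc": {
         "properties": {
            "log_message": {
               "type": "text",
               "fields": {
                  "raw": {
                     "type": "keyword",
                     "normalizer": "truncating"
                  }
               }
            }
         }
      }
   }
}
```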


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.