Document contains at least one immense term error

Hi,

I am trying to add some data to ES using the JavaScript client, and I am getting this error:

 ResponseError: illegal_argument_exception: [illegal_argument_exception] Reason: Document contains at least one immense term in field="content.en.raw" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[10, 10, 60, 100, 105, 118, 32, 99, 108, 97, 115, 115, 61, 34, 87, 111, 114, 100, 83, 101, 99, 116, 105, 111, 110, 49, 34, 62, 10, 10]...', original message: bytes can be at most 32766 in length; got 41553
    at onBody (NodeApp/node_modules/@elastic/elasticsearch/lib/Transport.js:367:23)
    at IncomingMessage.onEnd (NodeApp/node_modules/@elastic/elasticsearch/lib/Transport.js:291:11)
    at IncomingMessage.emit (node:events:532:35)
    at endReadableNT (node:internal/streams/readable:1346:12)
    at processTicksAndRejections (node:internal/process/task_queues:83:21) {
  meta: {
    body: { error: [Object], status: 400 },
    statusCode: 400,
    headers: {
      'x-elastic-product': 'Elasticsearch',
      warning: '299 Elasticsearch-7.17.0-bee86328705acaa9a6daede7140defd4d9ec56bd "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security."',
      'content-type': 'application/json; charset=UTF-8',
      'content-length': '1179'
    },
    meta: {
      context: null,
      request: [Object],
      name: 'elasticsearch-js',
      connection: [Object],
      attempts: 0,
      aborted: false
    }
  }
} 

The mapping for this particular field looks like this:

"content":{
          "properties":{
            "fr":{
              "type":"text",
              "analyzer":"french",
              "term_vector":"with_positions_offsets",
              "fields":{
                "raw":{
                  "type":"keyword",
                  "index":true
                },
                "shingles":{
                  "type":"text",
                  "analyzer":"laws_shingle_analyzer",
                  "term_vector":"with_positions_offsets"
                },
                "exact":{
                  "type":"text",
                  "analyzer":"laws_icu_normalized",
                  "term_vector":"with_positions_offsets"
                },
                "folded":{
                  "type":"text",
                  "analyzer":"laws_icu_folded",
                  "term_vector":"with_positions_offsets"
                }
              }
            },
            "en":{
              "type":"text",
              "analyzer":"english",
              "term_vector":"with_positions_offsets",
              "fields":{
                "raw":{
                  "type":"keyword",
                  "index":true
                },
                "shingles":{
                  "type":"text",
                  "analyzer":"laws_shingle_analyzer",
                  "term_vector":"with_positions_offsets"
                },
                "exact":{
                  "type":"text",
                  "analyzer":"laws_icu_normalized",
                  "term_vector":"with_positions_offsets"
                },
                "folded":{
                  "type":"text",
                  "analyzer":"laws_icu_folded",
                  "term_vector":"with_positions_offsets"
                }
              }
            }
          }
        },

How do I fix this? Some more information on the data: I am inserting HTML source code as one block of text (a string), which needs to be searchable.
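Roughly, the indexing call looks like this. The index name is a placeholder for this sketch, and the real HTML string is far larger than shown (per the error above, the immense term starts with '\n\n<div class="WordSection1">'):

const { Client } = require('@elastic/elasticsearch');

// 'laws' is a placeholder index name used only for this sketch.
const client = new Client({ node: 'http://localhost:9200' });

async function addDocument() {
  await client.index({
    index: 'laws',
    body: {
      content: {
        // The whole HTML source of a page goes in as one string; the real
        // value is tens of kilobytes, which is what exceeds the 32766-byte
        // term limit on the content.en.raw keyword subfield.
        en: '\n\n<div class="WordSection1">\n\n ...rest of the HTML source...',
        fr: '...the French HTML source...'
      }
    }
  });
}

addDocument().catch(console.error);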

The original mapping for this field in 2.14 was:

"content": {
            "properties": {
              "fr": {
                "type": "string",
                "analyzer": "french",
                "term_vector": "with_positions_offsets",
                "fields": {
                  "shingles": {
                    "type": "string",
                    "analyzer": "laws_shingle_analyzer",
                    "term_vector": "with_positions_offsets"
                  },
                  "exact": {
                    "type": "string",
                    "analyzer": "laws_icu_normalized",
                    "term_vector": "with_positions_offsets"
                  },
                  "folded": {
                    "type": "string",
                    "analyzer": "laws_icu_folded",
                    "term_vector": "with_positions_offsets"
                  }
                }
              },
              "en": {
                "type": "string",
                "analyzer": "english",
                "term_vector": "with_positions_offsets",
                "fields": {
                  "shingles": {
                    "type": "string",
                    "analyzer": "laws_shingle_analyzer",
                    "term_vector": "with_positions_offsets"
                  },
                  "exact": {
                    "type": "string",
                    "analyzer": "laws_icu_normalized",
                    "term_vector": "with_positions_offsets"
                  },
                  "folded": {
                    "type": "string",
                    "analyzer": "laws_icu_folded",
                    "term_vector": "with_positions_offsets"
                  }
                }
              }
            }
          },

Adding to the original post: I have tried changing the raw field to text with index: false, and I still get the same error.
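For reference, the variant I tried looks roughly like this. Only the content.en branch and its raw subfield are shown; content.fr and the shingles/exact/folded subfields were kept exactly as in the mapping above, and the index name is again a placeholder:

const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });

async function createIndexWithRawAsText() {
  // raw defined as text with index: false instead of keyword.
  await client.indices.create({
    index: 'laws',
    body: {
      mappings: {
        properties: {
          content: {
            properties: {
              en: {
                type: 'text',
                analyzer: 'english',
                term_vector: 'with_positions_offsets',
                fields: {
                  raw: { type: 'text', index: false }
                }
              }
            }
          }
        }
      }
    }
  });
}

createIndexWithRawAsText().catch(console.error);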

I believe it has to do with the maximum size allowed for a single term in the field.

This option is also useful for protecting against Lucene's term byte-length limit of 32766.

I understand that, but in version 2.14 of ES the same huge data exists in this field.
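For completeness, if the option being quoted is ignore_above, applying it to the raw keyword subfield would presumably look something like this (untested sketch; the index name is a placeholder), with the trade-off that any value longer than the cutoff is simply not indexed into raw:

const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });

async function capRawSubfield() {
  // Values longer than ignore_above characters are skipped for the raw
  // subfield only; the parent content.en text field is still analyzed and
  // indexed in full. 10922 ≈ 32766 / 3, which stays under Lucene's byte
  // limit even if every character takes 3 bytes in UTF-8.
  await client.indices.putMapping({
    index: 'laws',
    body: {
      properties: {
        content: {
          properties: {
            en: {
              type: 'text',
              analyzer: 'english',
              term_vector: 'with_positions_offsets',
              fields: {
                raw: { type: 'keyword', ignore_above: 10922 }
              }
            }
          }
        }
      }
    }
  });
}

capRawSubfield().catch(console.error);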
