Flattened field contains one immense field whose keyed encoding is longer than the allowed max length of 32766 bytes

Hi Team,

We are using Elasticsearch with the .NET client:

<PackageReference Include="Elastic.Clients.Elasticsearch" Version="8.17.1" />

We have an index in Elasticsearch whose mapping looks like this:

"mappings": {
      "dynamic": "false",
      "properties": {
        "FieldA": {
          "type": "keyword"
        },
        "FieldB": {
          "type": "boolean"
        },
        "FieldC": {
          "type": "text"
        },
        "FieldD": {
          "type": "float"
        },
        "FieldF": {
          "type": "flattened"
        },
        "FieldG": {
          "type": "text",
          "fields": {
            "pattern": {
              "type": "text",
              "analyzer": "pattern_analyzer"
            }
          },
          "analyzer": "standard"
        }
      }
    }

While indexing a document we are getting the error below:

"exception": "Elastic.Transport.TransportException: Request failed to execute. Call: Status code 400 from: PUT /indexA-alias/_doc/abc123?version=7486156421021565048&version_type=external. ServerError: Type: document_parsing_exception Reason: \"[1:22532] failed to parse field [FieldF] of type [flattened] in document with id 'abc123'

FieldF is of type flattened, and it looks like there is a limitation that produces the error below:

CausedBy: \"Type: illegal_argument_exception Reason: \"Flattened field [FieldF] contains one immense field whose keyed encoding is longer than the allowed max length of 32766 bytes. Key length: 5, value length: 137627 for key starting with [a20pq]\"\"\n   at Elastic.Transport.DistributedTransport`1.HandleTransportException(BoundConfiguration boundConfiguration, Exception clientException, TransportResponse response)\n   at Elastic.Transport.DistributedTransport`1.FinalizeResponse[TResponse](Endpoint endpoint, BoundConfiguration boundConfiguration, PostData postData, RequestPipeline pipeline, DateTimeOffset startedOn, Int32 attemptedNodes, Auditor auditor, List`1 seenExceptions, TResponse response)\n   at Elastic.Transport.DistributedTransport`1.RequestCoreAsync[TResponse](Boolean isAsync, EndpointPath path, PostData data, Action`1 configureActivity, IRequestConfiguration localConfiguration, CancellationToken cancellationToken)

What could be a possible solution here? We are using ES for our product's keyword searches, where we have both exact and partial matches.

Thanks,
Moni

Use the ignore_above mapping parameter to ignore leaf fields that are larger than the specified value.

Something like this:

        "FieldF": {
          "type": "flattened",
          "ignore_above": 8191
        }

The value for ignore_above is a character count, but Lucene counts bytes. If you use UTF-8 text with many non-ASCII characters, you may want to set the limit to 32766 / 4 = 8191, since a UTF-8 character may occupy at most 4 bytes.

Thanks @leandrojmp
One of our solutions involves a lot of data, and it looks like some of that data can be big. From a full-text search standpoint we can probably skip indexing data that is over a specific size.
There won't be a similar length restriction for a "text" type field, right?

"FieldG": {
          "type": "text"
}

Is the flattened field just a large string of text from the data in FieldF?
For the flattened data type itself, this is what the documentation says:

By default, each subfield in an object is mapped and indexed separately. If the names or types of the subfields are not known in advance, then they are mapped dynamically.

The flattened type provides an alternative approach, where the entire object is mapped as a single field. Given an object, the flattened mapping will parse out its leaf values and index them into one field as keywords. The object's contents can then be searched through simple queries and aggregations.

This data type can be useful for indexing objects with a large or unknown number of unique keys. Only one field mapping is created for the whole JSON object, which can help prevent a mappings explosion from having too many distinct field mappings. Since the flattened field maps an entire object with potentially many subfields as a single field, the response contains the unaltered structure from _source.

Currently, flattened object fields can be used with the following query types:
  - term, terms, and terms_set
  - prefix
  - range
  - match and multi_match
  - query_string and simple_query_string
  - exists
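
To make the quoted behavior concrete, here is a minimal sketch (index name and values are made up): each leaf value of the object is indexed as a keyword under the single flattened field, and the listed query types run against it.

PUT /flattened-demo
{
  "mappings": {
    "properties": {
      "FieldF": { "type": "flattened" }
    }
  }
}

PUT /flattened-demo/_doc/1
{
  "FieldF": { "aznde": "www.google.com", "azxcv": "ABC-90" }
}

# A prefix query on the top-level field runs against every leaf value
GET /flattened-demo/_search
{
  "query": {
    "prefix": { "FieldF": "www.goo" }
  }
}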

The _source should still be intact whether or not the value gets indexed, right? The reason for asking is that this data stored in ES will be searched from the front end, and when I look at a document whose FieldF has been truncated to 8191 bytes for indexing, what will it look like when I open that document in the UI?

Actually, to be more specific, I currently index the same field twice in two different formats to cover all full-text search scenarios:

"FieldF": {
          "type": "flattened"
        },
        "FieldG": {
          "type": "text",
          "fields": {
            "pattern": {
              "type": "text",
              "analyzer": "pattern_analyzer"
            }
          },
          "analyzer": "standard"
        }

At the moment my search query looks at FieldG with Match and MatchPhrase queries.
A sample of how the response goes to the UI:

var result = searchResponse.Hits.Select(hit => hit.Source).ToList();
return new Response { Result = result.Select(Transformer.ConvertToResult).ToList() };

Here, ConvertToResult uses Result.FieldF.
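
In Query DSL terms the search would be roughly along these lines (a sketch only; the exact query the client builds, and whether the FieldG.pattern subfield is also targeted, may differ):

GET indexA-alias/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "FieldG": "search text here" } },
        { "match_phrase": { "FieldG": "search text here" } }
      ],
      "minimum_should_match": 1
    }
  }
}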

Yes, the text field type does not have this limitation.

No. As you can see in the documentation, each leaf value of the top-level object is indexed separately, but all of them are indexed as keywords, which is where you run into the limit on the size of the keyword being indexed.

I only use Kibana as the front end, so I have no idea how you should deal with this in other tools. But just a correction: as mentioned in the documentation, a value over the ignore_above limit will not be indexed or stored, it is ignored. This is what allows the rest of the document to be indexed; without it, the entire document would be rejected.

If you want to truncate it, you need to do it before sending the data to be indexed.
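
If truncating in the indexing client is inconvenient, one alternative worth noting (purely a sketch, not something suggested above) is an ingest pipeline with a script processor; the pipeline name and the 8191 cutoff are illustrative, and the script only handles top-level string values, not nested objects:

PUT _ingest/pipeline/truncate-flattened-leaves
{
  "description": "Sketch: cap top-level string values of FieldF before indexing",
  "processors": [
    {
      "script": {
        "source": """
          if (ctx.containsKey('FieldF') && ctx['FieldF'] instanceof Map) {
            Map capped = new HashMap();
            for (def key : ctx['FieldF'].keySet()) {
              def v = ctx['FieldF'][key];
              if (v instanceof String && v.length() > 8191) {
                v = v.substring(0, 8191);
              }
              capped[key] = v;
            }
            ctx['FieldF'] = capped;
          }
        """
      }
    }
  ]
}

The pipeline would then be referenced with ?pipeline=truncate-flattened-leaves on the index request or set as index.default_pipeline. Note that, unlike ignore_above, this does change what ends up in _source.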

Everything will still be present in the _source field.
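
A quick way to verify that (hypothetical index, placeholder value): a leaf value over the ignore_above limit is not indexed, but it still comes back unchanged in _source.

PUT /flattened-ignore-test
{
  "mappings": {
    "properties": {
      "FieldF": { "type": "flattened", "ignore_above": 8191 }
    }
  }
}

PUT /flattened-ignore-test/_doc/1
{
  "FieldF": { "a20pq": "<imagine a value longer than 8191 characters here>" }
}

# The full original value is returned in _source even though it was not indexed
GET /flattened-ignore-test/_doc/1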

Thanks @leandrojmp

One basic question about flattened fields and how searches work on them.
If I have a document like the one below, where FieldF has multiple key-value pairs:

"FieldF": {
            "ae0vi": "test",
            "acbm5": 1,
            "aznde": "www.google.com",
            "aid9r": "127.1.1.1",
            "azxcv": "ABC-90"
          }

So here, for flattened, all the top-level keys, i.e. aznde, aid9r, etc., will have their values mapped as keywords, right? Which means, say I have a few docs with the URL www.google.com and a few other docs with www.amazon.com, and I now search for exact matches of "www.google.com", something like the query below should work?

GET test-index/_search
{
  "query": {
    "terms": {
      "FieldF": ["www.google.com"]
    }
  }
}

FYI: I can't do a search looking for exact keys like aznde, aid9r since these documents are very dynamic.

Thanks,
Moni

This should work.

It is one of the examples in the documentation.

Querying the top-level flattened field searches all leaf values in the object
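
For completeness, the same documentation page also describes keyed lookups on a specific leaf, in contrast to the top-level form (a sketch using the example keys above; not useful in your case since the keys are dynamic):

# Top-level: matches if any leaf value equals the term
GET test-index/_search
{
  "query": {
    "term": { "FieldF": "www.google.com" }
  }
}

# Keyed: matches only when that particular key holds the value
GET test-index/_search
{
  "query": {
    "term": { "FieldF.aznde": "www.google.com" }
  }
}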

Hi @leandrojmp

Continuing the same thread: after I added "ignore_above": 8191 to the field with the flattened mapping (FieldA here is an object with dynamic key-value pairs that can also have nested structure), things were fine. And, as I said earlier, since we do both full-text and keyword searches, we also store the field as text.

However, currently in production we are hitting the same 32766-byte limit for the text field. What is the best way to resolve this? Ignoring anything above 8191 means data loss for us. Some solutions involve a lot of data, and it looks like some of that data can be big. From a full-text search standpoint I am not sure that skipping data over a specific size is acceptable, since that would mean a bad customer experience. I think there will always be fields with a large amount of data, especially text fields.

Mapping below

"FieldA": {
          "type": "flattened",
          "ignore_above": 8191
        },
        "FieldAAsText": {
          "type": "text",
          "fields": {
            "pattern": {
              "type": "text",
              "analyzer": "pattern_analyzer"
            }
          },
          "analyzer": "standard"
        }

Analyzer details from the settings API:

"analysis": {
          "filter": {
            "snowball": {
              "type": "snowball",
              "language": "English"
            }
          },
          "analyzer": {
            "pattern_analyzer": {
              "filter": [
                "lowercase",
                "asciifolding"
              ],
              "type": "custom",
              "tokenizer": "special_chars_tokenizer"
            }
          },
          "tokenizer": {
            "special_chars_tokenizer": {
              "pattern": """[\s.,;:!?@]+""",
              "type": "pattern"
            }
          }
        }

Correct me if I am wrong: the Lucene 32,766-byte limit is known to apply to the keyed encoding (key plus value) of leaf entries in flattened fields. However, does it also apply to individual terms (tokens) produced by the analyzer when indexing text fields?

  • The main field (FieldAAsText) uses the standard analyzer.
  • The subfield (FieldAAsText.pattern) uses the custom pattern_analyzer.

So both will be used while indexing the doc. Is my understanding correct that if either analyzer produces a single term larger than 32,766 UTF-8 bytes, ES will throw an error?

Error sample:

{ "time": "2025-06-13 13:12:18.6672", "level": "ERROR", "message": "Failed to index document ID: abcd. Error: Request failed to execute. Call: Status code 400 from: PUT /indexA-alias/_doc/abcd?version=7515421116995731518&version_type=external. ServerError: Type: illegal_argument_exception Reason: \"Document contains at least one immense term in field=\"FieldAAsText.pattern\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[34, 109, 103, 97, 97, 97, 100, 122, 121, 111, 103, 56, 49, 101, 115, 102, 104, 119, 98, 101, 107, 121, 108, 105, 110, 108, 112, 98, 115, 120]...', original message: bytes can be at most 32766 in length; got 40486\" CausedBy: \"Type: max_bytes_length_exceeded_exception Reason: \"max_bytes_length_exceeded_exception: bytes can be at most 32766 in length; got 40486\"\"" }

Hi Team,

So, to rule out flattened vs. text, I created the dummy index below with just the text field, using both the standard analyzer and the custom pattern analyzer.

PUT /records-text-limit
{
  "settings": {
    "analysis": {
      "filter": {
        "snowball": {
          "type": "snowball",
          "language": "English"
        }
      },
      "analyzer": {
        "pattern_analyzer": {
          "type": "custom",
          "tokenizer": "special_chars_tokenizer",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "tokenizer": {
        "special_chars_tokenizer": {
          "type": "pattern",
          "pattern": "[\\s.,;:!?@]+"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "FieldAsText": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "pattern": {
            "type": "text",
            "analyzer": "pattern_analyzer"
          }
        }
      }
    }
  }
}

Then I indexed the same doc from the production environment that had failed with the 400 Bad Request, using POST /records-text-limit/_doc.

I am getting the same error from Kibana. Can you please help me understand the error and suggest ways to fix it without data loss?

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": """Document contains at least one immense term in field="FieldAsText.pattern" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[92, 34, 109, 103, 97, 97, 97, 100, 122, 121, 111, 103, 56, 49, 101, 115, 102, 104, 119, 98, 101, 107, 121, 108, 105, 110, 108, 112, 98, 115]...', original message: bytes can be at most 32766 in length; got 40488"""
      }
    ],
    "type": "illegal_argument_exception",
    "reason": """Document contains at least one immense term in field="FieldAsText.pattern" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[92, 34, 109, 103, 97, 97, 97, 100, 122, 121, 111, 103, 56, 49, 101, 115, 102, 104, 119, 98, 101, 107, 121, 108, 105, 110, 108, 112, 98, 115]...', original message: bytes can be at most 32766 in length; got 40488""",
    "caused_by": {
      "type": "max_bytes_length_exceeded_exception",
      "reason": "max_bytes_length_exceeded_exception: bytes can be at most 32766 in length; got 40488"
    }
  },
  "status": 400
}