Flattened field contains one immense field whose keyed encoding is longer than the allowed max length of 32766 bytes

Hi Team,

We are using Elasticsearch with the .NET client, version:

<PackageReference Include="Elastic.Clients.Elasticsearch" Version="8.17.1" />

We have an index in Elasticsearch whose mapping looks like this:

"mappings": {
  "dynamic": "false",
  "properties": {
    "FieldA": {
      "type": "keyword"
    },
    "FieldB": {
      "type": "boolean"
    },
    "FieldC": {
      "type": "text"
    },
    "FieldD": {
      "type": "float"
    },
    "FieldF": {
      "type": "flattened"
    },
    "FieldG": {
      "type": "text",
      "fields": {
        "pattern": {
          "type": "text",
          "analyzer": "pattern_analyzer"
        }
      },
      "analyzer": "standard"
    }
  }
}

While indexing a document, we get the following error:

"exception": "Elastic.Transport.TransportException: Request failed to execute. Call: Status code 400 from: PUT /indexA-alias/_doc/abc123?version=7486156421021565048&version_type=external. ServerError: Type: document_parsing_exception Reason: \"[1:22532] failed to parse field [FieldF] of type [flattened] in document with id 'abc123'

FieldF is of type flattened, and it looks like there is a limitation that produces the error below:

CausedBy: \"Type: illegal_argument_exception Reason: \"Flattened field [FieldF] contains one immense field whose keyed encoding is longer than the allowed max length of 32766 bytes. Key length: 5, value length: 137627 for key starting with [a20pq]\"\"\n   at Elastic.Transport.DistributedTransport`1.HandleTransportException(BoundConfiguration boundConfiguration, Exception clientException, TransportResponse response)\n   at Elastic.Transport.DistributedTransport`1.FinalizeResponse[TResponse](Endpoint endpoint, BoundConfiguration boundConfiguration, PostData postData, RequestPipeline pipeline, DateTimeOffset startedOn, Int32 attemptedNodes, Auditor auditor, List`1 seenExceptions, TResponse response)\n   at Elastic.Transport.DistributedTransport`1.RequestCoreAsync[TResponse](Boolean isAsync, EndpointPath path, PostData data, Action`1 configureActivity, IRequestConfiguration localConfiguration, CancellationToken cancellationToken)

What could be the possible solution here? We are using ES for our product's keyword searches where we have both exact and partial matches.

Thanks,
Moni

Use the ignore_above mapping parameter to ignore leaf fields that are larger than the specified value.

Something like this:

        "FieldF": {
          "type": "flattened",
          "ignore_above": 8191
        }

The value for ignore_above is a character count, but Lucene counts bytes. If your UTF-8 text contains many non-ASCII characters, you may want to set the limit to 32766 / 4 = 8191, since a UTF-8 character can occupy at most 4 bytes.
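To make the character-versus-byte arithmetic concrete, here is a small sketch in Python (used only as a neutral illustration language; the thread itself uses the .NET client). The helper name utf8_len and the sample strings are made up for this example, and the 32766 figure is taken from the error message above.

```python
# Byte limit quoted in the error message in this thread.
LUCENE_MAX_TERM_BYTES = 32766


def utf8_len(value: str) -> int:
    """Return the number of bytes the string occupies when encoded as UTF-8."""
    return len(value.encode("utf-8"))


# Worst case: every character needs 4 bytes in UTF-8, so divide the byte limit by 4.
safe_ignore_above = LUCENE_MAX_TERM_BYTES // 4

ascii_value = "a" * safe_ignore_above           # 1 byte per character
emoji_value = "\U0001F600" * safe_ignore_above  # 4 bytes per character

print(safe_ignore_above)
print(len(ascii_value), utf8_len(ascii_value))
print(len(emoji_value), utf8_len(emoji_value))
```

Both sample strings have the same character count, but the second needs four times as many bytes. That is why a character-based setting has to be sized for the worst case if the data may contain such characters.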

Thanks @leandrojmp
One of our solutions involves a lot of data, and it looks like some of that data can be large. From a full-text search standpoint, we can probably skip indexing data that is over a specific size.
There won't be similar length restrictions for a "text" type field, right?

"FieldG": {
  "type": "text"
}

Is the flattened field just a large string of text from the data in FieldF?
For the flattened data type itself, this is what the documentation says:

By default, each subfield in an object is mapped and indexed separately. If the names or types of the subfields are not known in advance, then they are mapped dynamically.

The flattened type provides an alternative approach, where the entire object is mapped as a single field. Given an object, the flattened mapping will parse out its leaf values and index them into one field as keywords. The object's contents can then be searched through simple queries and aggregations.

This data type can be useful for indexing objects with a large or unknown number of unique keys. Only one field mapping is created for the whole JSON object, which can help prevent a mappings explosion from having too many distinct field mappings. Since the flattened field maps an entire object with potentially many subfields as a single field, the response contains the unaltered structure from _source.

Currently, flattened object fields can be used with the following query types:
  - term, terms, and terms_set
  - prefix
  - range
  - match and multi_match
  - query_string and simple_query_string
  - exists

The _source should still be intact whether or not the value is indexed, right? I ask because the data stored in ES is searched from the front end. If a document's FieldF value is truncated and stored as 8191 bytes, what will that document look like when I open it from the UI?

To be more specific, I currently index the same data twice, in two different formats, to cover all of our full-text search scenarios:

"FieldF": {
  "type": "flattened"
},
"FieldG": {
  "type": "text",
  "fields": {
    "pattern": {
      "type": "text",
      "analyzer": "pattern_analyzer"
    }
  },
  "analyzer": "standard"
}

At the moment my search query targets FieldG with Match and MatchPhrase.
Here is how the response that goes to the UI is built:

var result = searchResponse.Hits.Select(hit => hit.Source).ToList();
return new Response { Result = result.Select(Transformer.ConvertToResult).ToList() };

Here, ConvertToResult uses Result.FieldF.

Yes, a text field does not have this limitation.

No. As you can see in the documentation, the flattened mapping parses out the leaf values of the object and indexes each of them as a keyword, and keywords are limited in the size of the value being indexed.

I only use Kibana as the front end, so I have no idea how you should deal with this in other tools. Just one correction: as mentioned in the documentation, a value longer than ignore_above is neither indexed nor stored; it is ignored. This lets the rest of the document be indexed. Without it, the entire document is rejected.

If you want to truncate it, you need to do it before sending the data to be indexed.
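A minimal sketch of that pre-truncation step, written in Python for illustration (the thread's own client is .NET). The function names are hypothetical. It assumes the "keyed encoding" in the error message is the key path, one separator byte, and the value; the server remains the authority on the exact format, so a lower limit may be the safer choice.

```python
from typing import Any

MAX_KEYED_BYTES = 32766  # limit quoted in the error message


def truncate_utf8(value: str, max_bytes: int) -> str:
    """Cut a string so its UTF-8 encoding fits in max_bytes without splitting a character."""
    encoded = value.encode("utf-8")
    if len(encoded) <= max_bytes:
        return value
    # errors="ignore" drops a partial multi-byte sequence left at the end by the slice.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


def truncate_leaves(obj: Any, path: str = "", limit: int = MAX_KEYED_BYTES) -> Any:
    """Return a copy of obj in which every string leaf fits the assumed byte budget."""
    if isinstance(obj, dict):
        return {
            key: truncate_leaves(val, f"{path}.{key}" if path else key, limit)
            for key, val in obj.items()
        }
    if isinstance(obj, list):
        return [truncate_leaves(val, path, limit) for val in obj]
    if isinstance(obj, str):
        # Assumption: budget = limit - key path bytes - 1 separator byte.
        budget = limit - len(path.encode("utf-8")) - 1
        return truncate_utf8(obj, max(budget, 0))
    return obj  # numbers, booleans and None pass through unchanged


field_f = {"a20pq": "x" * 137627, "acbm5": 1}
safe_field_f = truncate_leaves(field_f)
print(len(safe_field_f["a20pq"].encode("utf-8")))
```

Note that whatever is sent this way is also what ends up in _source, so a UI that reads FieldF from _source would show the shortened value.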

Everything will still be present in the _source field.

Thanks @leandrojmp

One basic question about flattened fields and how searches work on them.
Say I have a document like the one below, where FieldF has multiple key-value pairs:

"FieldF": {
  "ae0vi": "test",
  "acbm5": 1,
  "aznde": "www.google.com",
  "aid9r": "127.1.1.1",
  "azxcv": "ABC-90"
}

So with flattened, the values under all the top-level keys (aznde, aid9r, etc.) will be indexed like keywords, right? That means if I have a few docs with the URL www.google.com and a few others with www.amazon.com, and I search for exact matches on "www.google.com", something like the query below should work?

GET test-index/_search
{
  "query": {
    "terms": {
      "FieldF": ["www.google.com"]
    }
  }
}

FYI: I can't search on exact keys like aznde or aid9r, since the keys in these documents are very dynamic.

Thanks,
Moni

This should work.

It is one of the examples in the documentation.

Querying the top-level flattened field searches all leaf values in the object.
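As a rough mental model only (this is not how Elasticsearch is implemented, and the function names are made up), the behaviour can be imitated in a few lines of Python: collect every leaf value as an untokenized string, ignore the keys, and check whether any search term equals one of them exactly.

```python
from typing import Any, Iterator, List


def leaf_values(obj: Any) -> Iterator[str]:
    """Yield every leaf value of a nested object as a string, ignoring the keys."""
    if isinstance(obj, dict):
        for val in obj.values():
            yield from leaf_values(val)
    elif isinstance(obj, list):
        for val in obj:
            yield from leaf_values(val)
    elif obj is not None:
        # Simplification: every leaf is treated as a plain string.
        yield str(obj)


def matches_terms(field_f: dict, terms: List[str]) -> bool:
    """Toy version of a terms query on the top-level field: exact match on any leaf."""
    leaves = set(leaf_values(field_f))
    return any(term in leaves for term in terms)


google_doc = {
    "ae0vi": "test",
    "acbm5": 1,
    "aznde": "www.google.com",
    "aid9r": "127.1.1.1",
    "azxcv": "ABC-90",
}
amazon_doc = {"aznde": "www.amazon.com"}

print(matches_terms(google_doc, ["www.google.com"]))
print(matches_terms(amazon_doc, ["www.google.com"]))
print(matches_terms(google_doc, ["www.google"]))
```

In this model, only a whole leaf value counts as a match. A partial string such as "www.google" does not match, which fits the keyword-style indexing described in the documentation quoted earlier.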