UTF8 encoding is longer than the max length 32766

I have a requirement to store text larger than 64K. I don't want to index it, but while inserting into the index I still see the following exception.

IllegalArgumentException[Document contains at least one immense term in field="kvdatav1" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[86, 78, 49, 66, 55, 81, 86, 115, 74, 100, 85, 120, 98, 112, 73, 122, 67, 78, 77, 71, 116, 81, 107, 53, 65, 79, 90, 120, 119, 120]...', original message: bytes can be at most 32766 in length; got 45000]; nested: MaxBytesLengthExceededException[bytes can be at most 32766 in length; got 45000];

How do I work around the problem? I am using ES version 1.4.2.

curl -XDELETE 'http://localhost:9200/test'
curl -XPUT 'http://localhost:9200/test/' -d '
{
   "settings": {
       "number_of_shards": 5, //Default for number_of_shards is 5
       "number_of_replicas": 1, //Default for number_of_replicas is 1
       "analysis" : {
           "analyzer" : {
               "default" : {
                   "type" : "keyword"
               }
           }
       }
   },
   "mappings": {
       "data" : {
           "properties" : {
               "kvdatav1" : {"type" : "string", "index" : "no"},
               "kReq" : {"type" : "string", "index" : "no"},
               "kResp" : {"type" : "string", "index" : "no"}
           }
       }
   }
}'

You can try ignore_above: 256

From the docs:

ignore_above
The analyzer will ignore strings larger than this size. Useful for generic not_analyzed fields that should ignore long text.
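
Note that ignore_above is documented as a parameter of string fields in the mapping, not of the analyzer. A rough sketch of what that might look like on 1.x follows. The index "test", type "data" and field "kvdatav1" are taken from the question; the not_analyzed setting and the value 256 are only assumptions for illustration.

```shell
# Sketch only: index "test", type "data" and field "kvdatav1" come from the
# question; "not_analyzed" and the 256 limit are illustrative assumptions.
MAPPING='{
  "mappings": {
    "data": {
      "properties": {
        "kvdatav1": {
          "type": "string",
          "index": "not_analyzed",
          "ignore_above": 256
        }
      }
    }
  }
}'

# Wrapped in a function so nothing is sent to the cluster until it is called.
create_index() {
  curl -XPUT 'http://localhost:9200/test/' -d "$MAPPING"
}
```

Calling create_index against a local node would create the index with that mapping; values longer than the limit should then be skipped for indexing on that field.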

It looks like you just want the field kept in the _source and not searchable. If
that's true, you can just set "index": "no"; the field won't be searchable
but will still be returned in the _source.
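
If it helps to confirm which values actually trip the limit in the error message, checking the byte length of the value before sending it is usually enough. A minimal sketch, assuming the value is held in a shell variable as UTF-8; exceeds_term_limit is a hypothetical helper name, not something Elasticsearch provides.

```shell
# Limit quoted in the exception message above.
MAX_TERM_BYTES=32766

# exceeds_term_limit is a hypothetical helper, not an Elasticsearch tool.
# It succeeds (exit status 0) when $1 is longer than the limit in bytes.
exceeds_term_limit() {
  # wc -c counts bytes, which is what the limit is measured in.
  term_bytes=$(printf '%s' "$1" | wc -c)
  [ "$term_bytes" -gt "$MAX_TERM_BYTES" ]
}
```

For example, exceeds_term_limit "$value" && echo "too long for a single term" would flag a 45000-byte value like the one in the exception.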

I tried that as well, but it doesn't seem to work.

curl -XDELETE 'http://localhost:9200/test'

curl -XPUT 'http://localhost:9200/test/' -d '
{
   "settings": {
       "number_of_shards": 5, //Default for number_of_shards is 5
       "number_of_replicas": 1, //Default for number_of_replicas is 1
       "analysis" : {
           "analyzer" : {
               "default" : {
                   "type" : "keyword",
                   "ignore_above" : 256
               }
           }
       }
   },

   "mappings": {
       "data" : {
           "properties" : {
               "kvdatav1" : {"type" : "string", "ignore_above" : 256, "index" : "no"},
               "kReq" : {"type" : "string", "ignore_above" : 256, "index" : "no"},
               "kResp" : {"type" : "string", "ignore_above" : 256, "index" : "no"}
           }
       }
   }
}'