UTF8 encoding is longer than the max length 32766


#1

I need to store text larger than 64K. I don't want to index it, but when I insert it into the index I still see the following exception.

IllegalArgumentException[Document contains at least one immense term in field="kvdatav1" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[86, 78, 49, 66, 55, 81, 86, 115, 74, 100, 85, 120, 98, 112, 73, 122, 67, 78, 77, 71, 116, 81, 107, 53, 65, 79, 90, 120, 119, 120]...', original message: bytes can be at most 32766 in length; got 45000]; nested: MaxBytesLengthExceededException[bytes can be at most 32766 in length; got 45000];

How do I work around the problem? I am using ES version 1.4.2.

curl -XDELETE 'http://localhost:9200/test'

curl -XPUT 'http://localhost:9200/test/' -d '
{
   "settings": {
       "number_of_shards": 5, //Default for number_of_shards is 5
       "number_of_replicas": 1, //Default for number_of_replicas is 1
       "analysis" : {
           "analyzer" : {
               "default" : {
                   "type" : "keyword"
               }
           }
       }
   },

   "mappings": {
       "data" : {
           "properties" : {
               "kvdatav1" : {"type" : "string", "index" : "no"},
               "kReq" : {"type" : "string", "index" : "no"},
               "kResp" : {"type" : "string", "index" : "no"}
           }
       }
   }
}'
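
The limit in the exception counts UTF-8 bytes, not characters, so a value with non-ASCII characters can exceed it at a lower character count. A minimal sketch for checking a value before sending it, assuming a POSIX shell; the helper names utf8_len and exceeds_term_limit are made up for this illustration and are not part of Elasticsearch:

```shell
# Per-term byte limit quoted in the exception message above.
MAX_TERM_BYTES=32766

# utf8_len (hypothetical helper): print the number of bytes in the argument.
# wc -c counts bytes, so multi-byte UTF-8 characters count more than once.
utf8_len() {
  printf '%s' "$1" | wc -c | tr -d '[:space:]'
}

# exceeds_term_limit (hypothetical helper): succeed if the argument is
# longer than MAX_TERM_BYTES bytes, fail otherwise.
exceeds_term_limit() {
  [ "$(utf8_len "$1")" -gt "$MAX_TERM_BYTES" ]
}
```

For example, exceeds_term_limit "$value" && echo "value would be an immense term" flags a value that the keyword analyzer would turn into a single oversized term.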


(Mark Walkom) #2

You can try setting ignore_above: 256 on the field.

From the docs:

ignore_above
The analyzer will ignore strings larger than this size. Useful for generic not_analyzed fields that should ignore long text.
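
As a sketch of where this setting would go: ignore_above is a parameter of a string field in the mapping, not of an analyzer. The example below reuses the index, type, and field names from the first post and assumes a not_analyzed field, which is the case the docs excerpt describes. Note that this keeps short values searchable, which may not be what the original poster wants.

```shell
# Assumes Elasticsearch 1.x on localhost:9200 and that the index "test"
# does not exist yet (delete it first, as in the first post).
curl -XPUT 'http://localhost:9200/test/' -d '
{
   "mappings": {
       "data" : {
           "properties" : {
               "kvdatav1" : {
                   "type" : "string",
                   "index" : "not_analyzed",
                   "ignore_above" : 256
               }
           }
       }
   }
}'
```

With this mapping, values longer than 256 are skipped at index time but should still be stored in the _source.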


(Nik Everett) #3

It looks like you just want the field in the _source and not searchable. If
that's true then you can just set "index": "no" and it won't be searchable
but will still be returned in the _source.
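
One way to check whether such a mapping is actually in effect, as a sketch: read the mapping back and make sure documents are indexed into the type that carries it. This assumes the index test and type data from the first post; the document ID 1 and the field value are placeholders. A string field that the explicit mapping does not cover is mapped dynamically and goes through the default analyzer, which in this thread is keyword and therefore emits the whole value as one term.

```shell
# Show the mapping Elasticsearch actually stored for the index.
curl -XGET 'http://localhost:9200/test/_mapping?pretty'

# Index into the "data" type so that the explicit mapping applies.
# The document ID (1) and the field value are placeholders.
curl -XPUT 'http://localhost:9200/test/data/1' -d '
{
   "kvdatav1" : "..."
}'
```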


#4

I tried that as well. However, it doesn't seem to work.

curl -XDELETE 'http://localhost:9200/test'

curl -XPUT 'http://localhost:9200/test/' -d '
{
   "settings": {
       "number_of_shards": 5, //Default for number_of_shards is 5
       "number_of_replicas": 1, //Default for number_of_replicas is 1
       "analysis" : {
           "analyzer" : {
               "default" : {
                   "type" : "keyword",
                   "ignore_above" : 256
               }
           }
       }
   },

   "mappings": {
       "data" : {
           "properties" : {
               "kvdatav1" : {"type" : "string", "ignore_above" : 256, "index" : "no"},"kReq" : {"type" : "string", "ignore_above" : 256, "index" : "no"},"kResp" : {"type" : "string", "ignore_above" : 256, "index" : "no"}
           }
       }
   }
}'
