[ELASTICSEARCH] UTF8 encoding is longer than the max length 32766


(Benjamin Carriou) #1

Hello,

I'm trying to index an XML document that seems to exceed the 32766 bytes allowed for a single field.
Because of this, the following error appears:

"status"=>400, "error"=>{"type"=>"illegal_argument_exception", "reason"=>"Document contains at least one immense term in field=\"message.raw\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[109, 97, 105, 32, 50, 51, 44, 32, 50, 48, 49, 54, 32, 50, 58, 51, 51, 58, 48, 50, 32, 80, 77, 32, 87, 101, 98, 104, 97, 110]...', original message: bytes can be at most 32766 in length; got 36033", "caused_by"=>{"type"=>"max_bytes_length_exceeded_exception", "reason"=>"bytes can be at most 32766 in length; got 36033"}}}}, :level=>:warn}

I've read on some forums that you can set:
index: no

However, if I understand correctly, that field will then not be indexed and so can no longer be searched or displayed in Kibana, for example?

Do you have any ideas?

For reference, here is the mapping of my index:
{
  "mappings": {
    "_default_": {
      "_all": { "enabled": true, "norms": { "enabled": false } },
      "dynamic_templates": [
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "fields": {
                "raw": { "type": "string", "index": "not_analyzed" }
              }
            }
          }
        }
      ],
      "properties": {
        "@timestamp": { "type": "date" },
        "offset": { "type": "long", "doc_values": "true" },
        "geoip": {
          "type": "object",
          "dynamic": true,
          "properties": {
            "location": { "type": "geo_point" }
          }
        }
      }
    }
  },
  "settings": { "index.refresh_interval": "5s" },
  "template": "fnmf-*"
}
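For what it's worth, a commonly suggested alternative to `index: no` is the `ignore_above` mapping parameter on the `not_analyzed` `raw` sub-field: values longer than the limit are simply not indexed into that sub-field, while the analyzed parent field and the `_source` remain searchable and viewable in Kibana. This is only a sketch against the dynamic template above; the value 8191 is an assumption (the Lucene limit of 32766 is in UTF-8 bytes, and a character can take up to 4 bytes, so 32766 / 4 = 8191 characters is a safe bound):

"raw": {
  "type": "string",
  "index": "not_analyzed",
  "ignore_above": 8191
}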


(system) #2