Error: Document contains at least one immense term in field


(Hilal) #1

Hi,

I am trying to map and index data with ES 5.0.0 and PHP. I created the mapping in Kibana and then started indexing with my PHP code (http://192.168.1.35:8080/indexleme.php?p=index).
Some documents were indexed, but then indexing stopped.
This is the error:

  Fatal error:  Uncaught exception 
    'Elasticsearch\Common\Exceptions\BadRequest400Exception' with message 
    '{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[NchqMHF][127.0.0.1:9300][indices:data/write/index[p]]"}],"type":"illegal_argument_exception","reason":"Document
     contains at least one immense term in field=\"ParaBirim\" (whose UTF8 
    encoding is longer than the max length 32766), all of which were 
    skipped.  Please correct the analyzer to not produce such terms.  The 
    prefix of the first immense term is: '[32, 75, 71, 32, 72, -60, -80, 68,
     82, 79, 70, -60, -80, 76, 32, 80, 65, 77, 85, 75, 32, 82, 85, 76, 79, 
    32, 49, 48, 48, 48]...', original message: bytes can be at most 32766 in
     length; got 
    35053","caused_by":{"type":"max_bytes_length_exceeded_exception","reason":"bytes
     can be at most 32766 in length; got 35053"}},"status":400}' in 
    /home/admin/web/localhost.example.com/public_html/vendor/elasticsearch/elasticsearch/src/Elasticsearch/Connections/Connection.php:681
    Stack
     trace:
    #0 /home/admin/web/localho in /home/admin/web/localhost.example.com/public_html/vendor/elasticsearch/elasticsearch/src/Elasticsearch/Connections/Connection.php on line 682
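
The "prefix of the first immense term" in this message is a list of signed Java bytes, so it can be decoded to see what text is actually arriving in the field. The following is a minimal sketch in Python (not the PHP used for indexing); the function name `decode_term_prefix` is made up for illustration, and the byte list is copied from the error above.

```python
# Byte prefix copied from the error message. Java prints bytes as signed
# values, so anything above 127 shows up as a negative number.
signed_prefix = [32, 75, 71, 32, 72, -60, -80, 68, 82, 79, 70, -60, -80, 76,
                 32, 80, 65, 77, 85, 75, 32, 82, 85, 76, 79, 32, 49, 48, 48, 48]


def decode_term_prefix(signed_bytes):
    """Map signed bytes to 0..255 and decode the result as UTF-8."""
    raw = bytes(b & 0xFF for b in signed_bytes)
    # errors="replace" guards against a multi-byte character cut off at the end.
    return raw.decode("utf-8", errors="replace")


decoded_prefix = decode_term_prefix(signed_prefix)
print(repr(decoded_prefix))  # ' KG HİDROFİL PAMUK RULO 1000'
```

The decoded text looks like a product description rather than a currency code, which suggests (but does not prove) that some other column is ending up in ParaBirim for at least one document.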

The relevant part of the mapping:

PUT titub3/ihale/_mapping
{
  "properties": {
    "ParaBirim" : {
      "type" : "keyword"
    }
  }
}

Example values for the ParaBirim field:
YTL
TL,
1.035,00
...etc.

Which type should this field be mapped as: keyword, float, or something else?


(Adrien Grand) #2

Lucene doesn't allow terms longer than 32766 bytes. You could work around this issue by setting a limit on the length of your keyword fields, e.g.:

PUT titub3/ihale/_mapping
{
  "properties": {
    "ParaBirim" : {
      "type" : "keyword",
      "ignore_above": 10000
    }
  }
}

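This will make Elasticsearch skip indexing any value of this field that is longer than 10,000 characters instead of rejecting the whole document. Note that ignore_above counts characters while Lucene's limit is in bytes, so the value should be low enough that the UTF-8 encoding stays under 32766 bytes.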
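
If you would rather find the offending documents before they are sent to Elasticsearch, you can measure the UTF-8 byte length of each string field on the client side. Below is a minimal sketch in Python (the same check is easy to port to PHP); `oversized_fields`, `sample_doc` and the field name `Baslik` are hypothetical names used only for illustration, and the limit is the one quoted in the error message.

```python
MAX_TERM_BYTES = 32766  # Lucene term limit quoted in the error message


def oversized_fields(doc, limit=MAX_TERM_BYTES):
    """Return {field: utf8_byte_length} for string values over the limit."""
    report = {}
    for field, value in doc.items():
        if isinstance(value, str):
            size = len(value.encode("utf-8"))  # bytes, not characters
            if size > limit:
                report[field] = size
    return report


# Hypothetical document: one field is as large as the value in the error.
sample_doc = {"ParaBirim": "x" * 35053, "Baslik": "TL"}
report = oversized_fields(sample_doc)
print(report)  # {'ParaBirim': 35053}
```

Running a check like this over the source rows before indexing shows which records carry unexpectedly long values, so they can be fixed at the source instead of being silently skipped by ignore_above.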

