Document contains at least one immense term in field="REGIONS" (whose UTF8 e


(LDA) #1

When I build an index for my data, a not_analyzed field triggers this error. Is the field's value too long?
I tried to fix it with ignore_above, but that did not help; the error appeared again.
This is my index mapping:
.startObject("REGIONS")
.field("type","string")
.field("store","yes")
.field("index","not_analyzed")
.field("ignore_above","100000")
.endObject()
The exception looks like this:


(Joshua Rich) #2

You are hitting Lucene's term byte-length limit of 32766 for this field. Note that 32766 is a limit in bytes, while the ignore_above setting is a limit in characters. The distinction matters because, depending on your text, a single character may need multiple bytes when encoded as UTF-8.
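To illustrate the difference between the two counts, here is a minimal sketch in plain Java with no Elasticsearch dependency. The class and method names (TermByteLength, utf8Length, fitsInOneTerm, repeat) are made up for this example; only the 32766 figure comes from the limit quoted above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Lucene or Elasticsearch.
final class TermByteLength {
    // Term length limit in bytes, as quoted in this thread.
    static final int MAX_TERM_BYTES = 32766;

    // Number of bytes the value occupies once encoded as UTF-8.
    static int utf8Length(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the whole value, indexed as a single term, stays within the byte limit.
    static boolean fitsInOneTerm(String value) {
        return utf8Length(value) <= MAX_TERM_BYTES;
    }

    // Builds a string consisting of n copies of c.
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        String ascii = repeat('a', 20000);     // 1 byte per character in UTF-8
        String cjk = repeat('\u533A', 20000);  // 3 bytes per character in UTF-8

        // Same character count, very different byte counts.
        System.out.println(ascii.length() + " chars, " + utf8Length(ascii) + " bytes"); // 20000 chars, 20000 bytes
        System.out.println(cjk.length() + " chars, " + utf8Length(cjk) + " bytes");     // 20000 chars, 60000 bytes
        System.out.println(fitsInOneTerm(ascii)); // true
        System.out.println(fitsInOneTerm(cjk));   // false
    }
}
```

Both strings have 20000 characters, so a character-based limit of 100000 would let both through, but only the ASCII one stays under the byte limit.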

So you should really set ignore_above much lower; I'd suggest a realistic character count that will stop the extreme cases from being indexed. However, you may be better off just setting index: no for this field, so that it is not searchable at all (but can still be retrieved in the results).
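As a rough way to pick a value, the arithmetic can be sketched as follows. This assumes the worst case of 4 bytes per character, which is the maximum for UTF-8; the class and method names are hypothetical, and the result is a conservative upper bound rather than a recommended setting.

```java
// Hypothetical helper for illustration only; not part of Lucene or Elasticsearch.
final class IgnoreAboveBound {
    // Term length limit in bytes, as quoted in this thread.
    static final int MAX_TERM_BYTES = 32766;
    // Assumption: worst case of 4 bytes per character in UTF-8.
    static final int MAX_UTF8_BYTES_PER_CHAR = 4;

    // Largest character count that cannot exceed the byte limit under that assumption.
    static int conservativeIgnoreAbove() {
        return MAX_TERM_BYTES / MAX_UTF8_BYTES_PER_CHAR;
    }

    // Upper bound on the encoded size of a value that passes a given ignore_above.
    static long worstCaseBytes(int ignoreAbove) {
        return (long) ignoreAbove * MAX_UTF8_BYTES_PER_CHAR;
    }

    public static void main(String[] args) {
        System.out.println(conservativeIgnoreAbove());                // 8191
        // The value from the mapping in the question can far exceed the limit.
        System.out.println(worstCaseBytes(100000) <= MAX_TERM_BYTES); // false
        // A small value stays well inside it.
        System.out.println(worstCaseBytes(256) <= MAX_TERM_BYTES);    // true
    }
}
```

Anything at or below 8191 is safe under this assumption; in practice a much smaller value that matches the longest term you actually expect to search on is usually the better choice.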


(LDA) #3

Thanks, I set it to 256 and that solved the problem!

