On Thursday, May 29, 2014 10:47:37 PM UTC+2, Jeff Dupont wrote:
We’re running into a peculiar issue when updating indexes with content for the document:
"document contains at least one immense term in (whose utf8 encoding is longer than the max length 32766), all of which were skipped. please correct the analyzer to not produce such terms"
I’m hoping that there’s a simple fix or setting that can resolve this.

On Tuesday, June 3, 2014 12:18:37 PM UTC-4, Karel Minařík wrote:
All in all, it hints at some strange data, because such an "immense" term probably shouldn't be in the index in the first place.
Karel
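
For what it's worth, the failure is easy to reproduce with deliberately strange data. A rough sketch with the Python client (index and field names are made up, a 1.x-era client and mapping style are assumed):

from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a local node on localhost:9200

# A not_analyzed string field keeps the whole value as a single term,
# so one very long value becomes one very long ("immense") term.
es.indices.create(index="immense-test", body={
    "mappings": {
        "doc": {
            "properties": {
                "raw_blob": {"type": "string", "index": "not_analyzed"}
            }
        }
    }
})

huge_value = "x" * 40000  # UTF-8 encoding is well past the 32766-byte limit

# Indexing this document is what triggers the "immense term" error.
es.index(index="immense-test", doc_type="doc", body={"raw_blob": huge_value})
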
How does this MAX_LENGTH restriction impact a custom_all field where we may be copying data from different fields using some analyzer?
Is the MAX_LENGTH restriction also applicable to such a custom_all field, which would imply that in that case the cumulative length is what matters?
amish
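
In case it helps frame the question, this is roughly what such a mapping looks like (all names here are made up):

from elasticsearch import Elasticsearch

es = Elasticsearch()

# Several source fields copied into one catch-all field with its own analyzer.
es.indices.create(index="copyto-test", body={
    "mappings": {
        "doc": {
            "properties": {
                "custom_all": {"type": "string", "analyzer": "standard"},
                "title": {"type": "string", "copy_to": "custom_all"},
                "body": {"type": "string", "copy_to": "custom_all"}
            }
        }
    }
})

If Nik's "per token" description further down is right, the check applies to each token the analyzer emits for custom_all, so what should matter is the longest single token, not the cumulative length of everything copied in.
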
On Thursday, October 30, 2014 3:43:26 AM UTC-7, Rotem wrote:
+1 on this question.
If the error is generated because of a not_analyzed field, how is it
possible to instruct ES to drop these values instead of failing the request?
On Tuesday, July 1, 2014 10:22:54 PM UTC+3, Andrew Mehler wrote:
For not_analyzed fields, is there a way of capturing the old behavior? From what I can tell, you need to specify a tokenizer to have a token filter.
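
Not from this thread, but one sketch of that idea: pair the keyword tokenizer with a length token filter so over-long values are dropped rather than rejected. The analyzer, filter, index, and field names are made up, and the max is counted in characters rather than UTF-8 bytes, so it is kept well below the 32766-byte limit:

from elasticsearch import Elasticsearch

es = Elasticsearch()

# keyword tokenizer keeps the whole value as one token; the length filter
# then drops any token longer than max instead of failing the request.
es.indices.create(index="length-test", body={
    "settings": {
        "analysis": {
            "filter": {
                "drop_immense": {"type": "length", "max": 10000}
            },
            "analyzer": {
                "keyword_capped": {
                    "tokenizer": "keyword",
                    "filter": ["drop_immense"]
                }
            }
        }
    },
    "mappings": {
        "doc": {
            "properties": {
                "tag": {"type": "string", "analyzer": "keyword_capped"}
            }
        }
    }
})
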
The max length restriction is per token, so it's unlikely you'll see it unless you use not_analyzed fields. You can work around it by setting the ignore_above option on the string type. That'll just throw away the token.
Nik
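
For the not_analyzed case Nik describes, a minimal mapping sketch (index and field names are made up; ignore_above is measured in characters, so a value with some margin under 32766 bytes is used):

from elasticsearch import Elasticsearch

es = Elasticsearch()

# ignore_above is counted in characters; 10922 keeps even 3-byte UTF-8
# characters under the 32766-byte term limit. Longer values are simply
# not indexed for this field instead of failing the whole request.
es.indices.create(index="ignore-test", body={
    "mappings": {
        "doc": {
            "properties": {
                "tag": {
                    "type": "string",
                    "index": "not_analyzed",
                    "ignore_above": 10922
                }
            }
        }
    }
})
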