Ignore_above for text datatype in Elasticsearch

In Elasticsearch 5 the string datatype has been removed, as explained nicely in this blog post.

ignore_above is not supported with the text datatype. Can someone let me know how a text field is then protected against Lucene's term byte-length limit, as explained in the documentation here, especially the section below:

This option is also useful for protecting against Lucene's term byte-length limit of 32766.
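For context, this is roughly how ignore_above is used on a keyword field (the index, type, and field names here are just illustrative); my question is what the equivalent protection is for a text field:

```json
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "tag": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}
```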

Does anyone have any thoughts on this? The same question has been asked on Stack Overflow as well, with no response there either.

Maybe by using the length token filter in an analyzer config? Something along the lines of the sketch below.
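A rough sketch of what that could look like, assuming an index named my_index and a cap of 8191 characters (the length token filter counts characters, not bytes, so a conservative maximum leaves headroom for multi-byte UTF-8 under the 32766-byte term limit):

```json
PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "limit_token_length": {
          "type": "length",
          "min": 0,
          "max": 8191
        }
      },
      "analyzer": {
        "capped_text": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "limit_token_length"]
        }
      }
    }
  }
}
```

Tokens longer than the max are simply dropped at index time rather than causing an error.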

Thanks @Mark_Harwood for replying.

So do I have to specify this in the analyzer config, or does Elasticsearch do it by default for text fields?

It's not on by default.
I've not generally had a need for it because most text gets sliced up by tokenizers into smaller tokens anyway, based on whitespace, punctuation, etc.
Maybe a base64-encoded image might produce a single big token, but I've not run into content like that.
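If you do want that protection, one way (reusing the capped_text analyzer from the earlier sketch, with made-up index, type, and field names) would be to point the text field at it in the mapping:

```json
PUT my_index/_mapping/my_type
{
  "properties": {
    "body": {
      "type": "text",
      "analyzer": "capped_text"
    }
  }
}
```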
