Ignore_above for text datatype in Elasticsearch

In Elasticsearch 5 the string datatype has been removed, as explained nicely in this blog post.

ignore_above is not supported with the text datatype. Can someone let me know how a text field is then protected against Lucene's term byte-length limit, as explained in the documentation here, especially the section below:

This option is also useful for protecting against Lucene's term byte-length limit of 32766.
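For context, this is roughly how ignore_above is used on a keyword field (the index, type, and field names here are just illustrative); my question is what the equivalent protection is for a text field:

```json
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "tag": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}
```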

Does anyone have any thoughts on this? The same question has been asked on Stack Overflow as well, with no response there either.

Maybe by using the length token filter in an analyzer config? Something along the lines of the sketch below.
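A rough sketch of what that could look like, assuming an index named my_index and a cap of 8191 characters (the length token filter counts characters, not bytes, so a conservative maximum leaves headroom for multi-byte UTF-8 under the 32766-byte term limit):

```json
PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "limit_token_length": {
          "type": "length",
          "min": 0,
          "max": 8191
        }
      },
      "analyzer": {
        "capped_text": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "limit_token_length"]
        }
      }
    }
  }
}
```

Tokens longer than the max are simply dropped at index time rather than causing an error.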

Thanks @Mark_Harwood for replying.

So do I have to specify this in the analyzer config, or does Elasticsearch do it by default for text fields?

It's not on by default.
I've not generally had a need for it because most text gets sliced up by tokenizers into smaller tokens anyway, based on whitespace, punctuation, etc.
Maybe a base64-encoded image might produce a single big token, but I've not run into content like that.
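If you do want that protection, one way (reusing the capped_text analyzer from the earlier sketch, with made-up index, type, and field names) would be to point the text field at it in the mapping:

```json
PUT my_index/_mapping/my_type
{
  "properties": {
    "body": {
      "type": "text",
      "analyzer": "capped_text"
    }
  }
}
```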
