Max length allowed for "max_token_length" and how to set value


I would like to set the standard tokenizer to use a token length of 30000 for a field (I am inserting biological sequences into this field) instead of cutting up the string into 255 length tokens as is the default. Can this be done? What is the JSON command to do this?


The `standard` tokenizer does accept a `max_token_length` setting, but raising it far above the default of 255 has a massive impact on indexing and search. Also bear in mind that Lucene rejects any single term longer than 32766 bytes, so a 30000-character token sits just under that hard limit.
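For reference, the `max_token_length` setting is applied on a custom tokenizer in the index settings. A settings body along these lines should work (the `seq_tokenizer` and `seq_analyzer` names are just illustrative):

```json
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "seq_tokenizer": {
          "type": "standard",
          "max_token_length": 30000
        }
      },
      "analyzer": {
        "seq_analyzer": {
          "type": "custom",
          "tokenizer": "seq_tokenizer"
        }
      }
    }
  }
}
```

You would then set `"analyzer": "seq_analyzer"` on the sequence field's mapping.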

What are you trying to do? Search for sequence alignments? If so, there are better solutions: locality-sensitive hashing, for example, though it would have to be ported to Elasticsearch 2.3+, perhaps as a plugin.
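To illustrate the locality-sensitive-hashing idea (this is a standalone sketch, not an Elasticsearch plugin): a MinHash signature over a sequence's k-mers lets you estimate similarity between sequences cheaply, so near-matches can be found without indexing the whole sequence as one term.

```python
import hashlib

def kmers(seq, k=8):
    """Set of overlapping length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def minhash(shingles, num_hashes=64):
    """MinHash signature: for each seeded hash function, keep the
    minimum hash over all shingles. Similar sets share many minima."""
    return [
        min(int.from_bytes(hashlib.md5(f"{seed}:{s}".encode()).digest()[:8], "big")
            for s in shingles)
        for seed in range(num_hashes)
    ]

def similarity(sig_a, sig_b):
    """Fraction of matching slots approximates the Jaccard
    similarity of the underlying k-mer sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

A single substitution in a long sequence changes only the k-mers overlapping that position, so the signatures stay close, while unrelated sequences score near zero.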

Another approach to consider is to split the token yourself into smaller, meaningful chunks, and then use a phrase search.
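One way to do that split (a sketch, with an illustrative `to_tokens` helper): emit one overlapping k-mer per position and index those as the field text. Because a substring's k-mers appear consecutively in the document's k-mer stream, a phrase query built from the query fragment's own k-mers matches any occurrence of that fragment.

```python
def to_tokens(seq, k=10):
    """One overlapping k-mer per position of the sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

# Index this space-joined text instead of the raw 30000-character string:
doc = "ACGTTGCAAGGCTTACGATCGGATCCTAGCATGCAAGTCC"
indexed_text = " ".join(to_tokens(doc))

# ...and run a match_phrase query over the fragment's k-mers:
fragment = doc[5:25]
phrase = " ".join(to_tokens(fragment))
```

The trade-off is index size: a sequence of length n produces roughly n tokens of length k.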

If you really do want only exact matches, you could index it as a not_analyzed string.
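In Elasticsearch 2.x that would be a mapping along these lines (index, type, and field names are illustrative). Note that a not_analyzed value is still a single Lucene term, so it must stay under the 32766-byte limit, which a 30000-character ASCII sequence does:

```json
{
  "mappings": {
    "sequence": {
      "properties": {
        "dna": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
```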