What's the right way to implement large text fields?

In Elasticsearch 5.x, fielddata is disabled by default on text fields, and there are two common workarounds:

  1. Add a keyword sub-field, as in {"article": {"type": "text", "fields": {"keyword": {"type": "keyword"}}}}, and then use "article.keyword" for sorting, aggregations and exact-value queries (see the mapping sketch after this list)
  2. Enable fielddata on the text field
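
For reference, here is a minimal sketch of the solution-1 mapping in Elasticsearch 5.x REST syntax (the index name "articles" and the type name "doc" are my own placeholders, not from the original post):

    PUT articles
    {
      "mappings": {
        "doc": {
          "properties": {
            "article": {
              "type": "text",
              "fields": {
                "keyword": { "type": "keyword" }
              }
            }
          }
        }
      }
    }

Note that a keyword field also accepts "ignore_above", which silently skips values longer than the given number of characters instead of rejecting the whole document.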

Solution 1 runs into Lucene's 32766-byte limit on a single keyword term (the whole "article" value is indexed as one term), and solution 2 incurs a large memory footprint. Suppose the field "article" is big (imagine 500,000+ words): what is the best way to implement this field so that the memory footprint stays low, every word can be searched, and there is no maximum size limit?

Why do you assume search requires fielddata? It's only used for sorting and aggregations, and I'm not sure you need those here.
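
To illustrate (a sketch, assuming an index named "articles" with the mapping above): a plain text field stays fully searchable with fielddata disabled; only sorting or aggregating directly on it would fail.

    GET articles/_search
    {
      "query": {
        "match": {
          "article": "white whale"
        }
      }
    }
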
Also, for very large docs (e.g. a whole book) it sometimes makes sense to break them into multiple docs, e.g. one per chapter.
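
A rough sketch of that idea (the index "books", the fields "book_id"/"chapter"/"title"/"text", and the sample content are all just illustrative):

    PUT books/chapter/moby-dick-ch1
    {
      "book_id": "moby-dick",
      "chapter": 1,
      "title": "Loomings",
      "text": "Call me Ishmael. Some years ago..."
    }

    GET books/chapter/_search
    {
      "query": {
        "bool": {
          "must":   { "match": { "text": "white whale" } },
          "filter": { "term":  { "book_id.keyword": "moby-dick" } }
        }
      }
    }

Each chapter document stays a manageable size, every word remains full-text searchable, and you can filter on book_id (here via the keyword sub-field that the default dynamic mapping creates) to restrict a search to a single book.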
