What's the right way to implement large text fields?


In Elasticsearch 5.x, fielddata on text fields is disabled by default. There are two workarounds:

  1. Use a keyword sub-field, as in {"article": {"type": "text", "fields": {"keyword": {"type": "keyword"}}}}, and then use "article.keyword" in sorts, aggregations and queries
  2. Enable fielddata
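
To make option 1 concrete, here is a minimal sketch of the mapping and of a search body that sorts on the sub-field, written as plain Python dicts so no client library is needed. The helper names (build_article_mapping, build_sorted_search) and the sample query text are made up for illustration; only the "text" type, the "fields" multi-field syntax and the "keyword" type come from the mapping above.

```python
import json


def build_article_mapping(field="article"):
    """Return a properties mapping: an analyzed text field plus a keyword sub-field."""
    return {
        "properties": {
            field: {
                "type": "text",  # analyzed, used for full-text search
                "fields": {
                    # not analyzed, used for sorting and aggregations
                    "keyword": {"type": "keyword"}
                },
            }
        }
    }


def build_sorted_search(field="article", query_text="whale"):
    """Search on the text field, sort on its keyword sub-field."""
    return {
        "query": {"match": {field: query_text}},
        "sort": [{f"{field}.keyword": {"order": "asc"}}],
    }


mapping = build_article_mapping()
search_body = build_sorted_search()
print(json.dumps(mapping, indent=2))
print(json.dumps(search_body, indent=2))
```

Note that the match query targets "article" itself, while only the sort refers to "article.keyword".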

Solution 1 imposes a maximum size limit of 32766 bytes on the value of "article" (the keyword sub-field indexes the whole value as a single term), and solution 2 incurs a large memory footprint. Suppose the field "article" is big (imagine 500000+ words). What's the best way to implement this field so that it has a low memory footprint, every word can be searched, and there is no maximum size limit?

(Mark Harwood) #2

Why do you assume search requires fielddata? It is only used for sorting and aggregations, and I'm not sure you need those.
Also, for very large docs (e.g. a whole book) it sometimes makes sense to break them into multiple docs, e.g. one per chapter.
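
As a rough sketch of the splitting idea, the snippet below cuts one large text into several smaller documents and builds a plain match query, which needs neither fielddata nor a keyword sub-field. Everything specific here is an assumption for illustration: the function names, the "book_id" and "part" fields, and the fixed word-count split (a real index would more likely split on chapter boundaries).

```python
def split_into_docs(book_id, text, max_words=5000):
    """Split one large text into smaller documents of at most max_words words."""
    words = text.split()
    docs = []
    for start in range(0, len(words), max_words):
        chunk = words[start:start + max_words]
        docs.append({
            "book_id": book_id,      # hypothetical field linking parts to the source
            "part": len(docs) + 1,   # 1-based position of this part
            "article": " ".join(chunk),
        })
    return docs


def build_match_query(query_text, field="article"):
    """Full-text search body; uses the inverted index, not fielddata."""
    return {"query": {"match": {field: query_text}}}


# Tiny stand-in for a 500000+ word article.
sample_text = " ".join(f"word{i}" for i in range(12))
docs = split_into_docs("book-1", sample_text, max_words=5)
query = build_match_query("word7")
```

Each part is then indexed as its own document, and a hit can be traced back to the original text through the shared "book_id" value.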
