Hello,
ES 6.0 and up use Lucene 7, where doc values and norms have switched from a random-access API to an iterator API. What this means, as I understand it, is that if a document does not have a given field that other docs in the index have, we no longer have to pay in disk space for that field in that document.
Given the above, is the advice under the "Avoid sparsity" heading of the ES General Recommendations still relevant? It is still in the documentation for ES >= 6.0, and says things like:
In practice, this means that if an index has M documents,
norms will require M bytes of storage per field, even for fields
that only appear in a small fraction of the documents of the index.
Although slightly more complex with doc values due to the fact that
doc values have multiple ways that they can be encoded depending
on the type of field and on the actual data that the field stores,
the problem is very similar.
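To make the question concrete, here is a back-of-the-envelope sketch of the difference I have in mind. The dense figure follows the quoted rule (M bytes per field for M documents). The sparse figure is only my assumption of roughly one byte per document that actually has the field, and it ignores whatever overhead Lucene 7 needs to record which documents have a value. The function names and the example numbers are made up for illustration.

```python
def dense_norms_bytes(total_docs: int) -> int:
    """Quoted rule: one byte per document in the index, per field,
    regardless of how many documents actually contain the field."""
    return total_docs


def sparse_norms_bytes(docs_with_field: int) -> int:
    """ASSUMPTION (mine, not from the docs): about one byte only for
    documents that contain the field. Ignores the cost of tracking
    which documents have a value."""
    return docs_with_field


# Hypothetical index: 10 million docs, field present in 1% of them.
total_docs = 10_000_000
docs_with_field = 100_000

dense = dense_norms_bytes(total_docs)
sparse = sparse_norms_bytes(docs_with_field)

print(f"dense model:  {dense} bytes for this field")
print(f"sparse model: {sparse} bytes for this field")
print(f"ratio:        {dense // sparse}x")
```

If the sparse model is anywhere close to what ES >= 6.0 does on disk, the per-field cost described in the quote would no longer apply to fields that appear in only a small fraction of documents.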
Thank you,
Jan