I have been going through the Elasticsearch documentation and the Elasticsearch Definitive Guide to understand how doc values, ordinals and fielddata are used. While reading, I could not get a clear picture of the scenarios in which one is chosen over the other.
The documentation section "What are global ordinals?" says:
To support aggregations and other operations that require looking up field values on a per-document basis, Elasticsearch uses a data structure called doc values. Term-based field types such as keyword store their doc values using an ordinal mapping for a more compact representation.
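As I understand it, the "ordinal mapping" in this quote is essentially dictionary encoding within a segment: each unique term is stored once in sorted order, and each document stores only the small integer (ordinal) of its term. The minimal Python sketch below is my own illustration under simplifying assumptions (one segment, one value per document, in-memory lists); the function names are hypothetical, and it is not Lucene's actual on-disk doc values format.

```python
# Hypothetical sketch of ordinal (dictionary) encoding for keyword doc values
# in a single segment. Assumes one value per document; NOT Lucene's format.


def encode_with_ordinals(values):
    """Map each document's term to an integer ordinal.

    Returns (terms, ords): `terms` is the sorted list of unique terms (the
    ordinal -> term dictionary) and `ords[doc_id]` is that document's ordinal.
    """
    terms = sorted(set(values))                         # each unique term once, sorted
    ord_of = {term: i for i, term in enumerate(terms)}  # term -> ordinal
    ords = [ord_of[v] for v in values]                  # one small int per document
    return terms, ords


def lookup(terms, ords, doc_id):
    """Resolve a document's value by going doc_id -> ordinal -> term."""
    return terms[ords[doc_id]]


if __name__ == "__main__":
    # One keyword value per document; the list index is the doc_id.
    docs = ["green", "red", "green", "blue", "red", "green"]
    terms, ords = encode_with_ordinals(docs)
    print(terms)                   # ['blue', 'green', 'red']
    print(ords)                    # [1, 2, 1, 0, 2, 1]
    print(lookup(terms, ords, 3))  # blue
```

Because the terms are sorted, comparing ordinals gives the same order as comparing the strings, so an operation can work on integers and only resolve ordinals back to terms at the end.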
The documentation for doc_values says:
Doc values are the on-disk data structure, built at document index time, which makes this data access pattern possible.
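To check my understanding of the "data access pattern" mentioned in this quote, here is a minimal Python sketch contrasting an inverted index (term to document IDs, suited to search) with a doc-values-style column (document ID to value, suited to per-document lookups such as sorting and aggregations). The function names are hypothetical, and real Lucene structures are per-segment, on disk and compressed; the sketch only shows the two access directions.

```python
# Hypothetical sketch: inverted index (term -> doc ids) vs. a doc-values-style
# column (doc id -> value). Only the access directions are modeled.
from collections import defaultdict


def build_inverted_index(values):
    """term -> list of doc ids containing it (the search direction)."""
    index = defaultdict(list)
    for doc_id, term in enumerate(values):
        index[term].append(doc_id)
    return dict(index)


def build_doc_values(values):
    """doc id -> value, stored column-wise (the per-document direction)."""
    return list(values)


def count_by_value(doc_values, matching_docs):
    """A tiny 'terms aggregation' that does one column lookup per matching doc."""
    counts = defaultdict(int)
    for doc_id in matching_docs:
        counts[doc_values[doc_id]] += 1
    return dict(counts)


if __name__ == "__main__":
    docs = ["green", "red", "green", "blue", "red", "green"]
    inverted = build_inverted_index(docs)
    column = build_doc_values(docs)
    # Search: which documents contain "green"?
    print(inverted["green"])                     # [0, 2, 5]
    # Aggregation: what values do documents 0..3 have?
    print(count_by_value(column, [0, 1, 2, 3]))  # {'green': 2, 'red': 1, 'blue': 1}
```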
These two statements confuse me: one says doc values are the on-disk data structure, while the other says term-based field types such as keyword store their doc values using an ordinal mapping for a more compact representation.
Going by the above, my understanding is that ordinals come into the picture to create a more compact representation, mainly to save space; in other words, they are a type of compression.
- Is my understanding above correct, am I missing something, or have I completely misunderstood?
- Do global ordinals always exist for all doc values, or does Elasticsearch decide when to create them and when not to?
- Do ordinals come into the picture only for text/keyword fields, or do they exist for every data type that is aggregated on?
- Which stats should I check to monitor the usage and performance of ordinals?
The reason I am asking is that refreshes in our cluster are taking longer (around 5 seconds), and I suspect the cause is our use of parent-child relations, which rely on global ordinals. So I want to understand: is this (the join field) the only cause, or can other fields also contribute to it?
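For reference, this is a minimal Python sketch of what I understand global ordinals to be: a shard-wide mapping from each segment's own ordinals into a single ordinal space. The function name is hypothetical, and the sketch deliberately ignores memory layout, compression and how Elasticsearch decides when to build the mapping. It only shows why adding one segment can renumber every existing term, which is the kind of rebuild I suspect is slowing down our refreshes.

```python
# Hypothetical sketch of global ordinals: one shard-wide ordinal space built on
# top of per-segment ordinals. Not Elasticsearch's implementation.


def build_global_ordinals(segments):
    """segments: list of per-segment sorted unique term lists.

    Returns (global_terms, seg_to_global), where seg_to_global[s][seg_ord]
    is the global ordinal of ordinal seg_ord in segment s.
    """
    global_terms = sorted(set().union(*segments))  # every unique term on the shard
    global_ord = {t: i for i, t in enumerate(global_terms)}
    seg_to_global = [[global_ord[t] for t in terms] for terms in segments]
    return global_terms, seg_to_global


if __name__ == "__main__":
    segment_a = ["blue", "green"]
    segment_b = ["green", "red"]
    terms, mapping = build_global_ordinals([segment_a, segment_b])
    print(terms)    # ['blue', 'green', 'red']
    print(mapping)  # [[0, 1], [1, 2]]

    # A new segment with a term that sorts first shifts every global ordinal,
    # so the whole mapping has to be rebuilt, not just extended.
    segment_c = ["amber", "red"]
    terms, mapping = build_global_ordinals([segment_a, segment_b, segment_c])
    print(terms)    # ['amber', 'blue', 'green', 'red']
    print(mapping)  # [[1, 2], [2, 3], [0, 3]]
```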