I'm currently working on optimizing the size of an index. After reviewing the documentation and analyzing the field sizes, I found that a significant portion of the document size is attributed to the _source field. In an attempt to reduce the size of this index, I disabled the _source field. However, I noticed that after disabling it, none of the other fields except for _idx and some default fields are visible in Kibana.
Now I'm wondering if it's possible to disable the _source field without losing the visibility of all the other fields. Or perhaps I have misunderstood the purpose of the _source field in the context of reducing the index size.
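Disabling the _source field is not recommended, and the documentation warns about the impact: without _source you lose features such as the update, update_by_query and reindex APIs and on-the-fly highlighting. Kibana also reads field values from _source, which is most likely why the other fields no longer show up.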
Which version of Elasticsearch are you using? Depending on the version, you could consider synthetic _source, where _source is not stored on disk but is reconstructed from other stored data (such as doc values) when a document is retrieved. Just a warning that as of 8.8 it is still in technical preview for non-TSDB indices.
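As a rough illustration, here is a minimal sketch of an index-creation body that turns on synthetic _source through the mapping, which is how 8.x releases around 8.8 expose it. The field names `host` and `bytes` are hypothetical, and newer releases may configure this differently, so check the documentation for your version.

```python
import json


def synthetic_source_mapping(properties):
    """Build an index-creation body that requests synthetic _source.

    Assumes the 8.x mapping syntax: "_source": {"mode": "synthetic"}.
    """
    return {
        "mappings": {
            "_source": {"mode": "synthetic"},
            "properties": properties,
        }
    }


# Hypothetical fields, for illustration only.
body = synthetic_source_mapping(
    {
        "host": {"type": "keyword"},
        "bytes": {"type": "long"},
    }
)

# This JSON would be sent as the body of PUT /<index-name>.
print(json.dumps(body, indent=2))
```

Synthetic _source only supports certain field types and may return documents with reordered or normalised fields, so it is worth testing against your own mappings first.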
I'm not sure you will see a noticeable difference when you are talking about megabytes and hundreds of documents. You'd be better off testing with gigabytes and millions of documents.
Whenever you are comparing storage size on disk, I recommend force merging down to a single segment first, as the size on disk can vary a lot for small data volumes if the segment count and distribution differ.
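A minimal sketch of that comparison, using only the Python standard library against the force merge and index stats REST endpoints. It assumes an unsecured cluster (no authentication or TLS), and the index names and base URL in the usage note are hypothetical.

```python
import json
import urllib.request


def forcemerge_path(index):
    """Path that force merges an index down to a single segment."""
    return f"/{index}/_forcemerge?max_num_segments=1"


def store_stats_path(index):
    """Path that returns only the store (size on disk) statistics."""
    return f"/{index}/_stats/store"


def primary_store_bytes(stats):
    """Extract the primary-shard store size from an index stats response."""
    return stats["_all"]["primaries"]["store"]["size_in_bytes"]


def percent_saved(before, after):
    """Relative size reduction, as a percentage of the original size."""
    return 100.0 * (before - after) / before


def call(base_url, method, path):
    """Send a body-less request and decode the JSON response.

    Assumption: no authentication or TLS configuration is needed.
    """
    request = urllib.request.Request(base_url + path, method=method)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def merged_size(base_url, index):
    """Force merge the index, then return its primary store size in bytes."""
    call(base_url, "POST", forcemerge_path(index))
    return primary_store_bytes(call(base_url, "GET", store_stats_path(index)))
```

With two indices holding the same documents, for example a hypothetical `test-stored` and `test-synthetic`, you would call `merged_size("http://localhost:9200", name)` for each and pass the two results to `percent_saved`. Comparing primary shard sizes keeps the replica count out of the picture.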