The _source field of my Metricbeat index is using a lot of space. The index size is 15.2 GB, and the _source field accounts for 12.2 GB of that. Across 4 Kubernetes nodes, the Metricbeat indices add up to 160 GB per day. Any ideas why the Metricbeat index is so big and how to reduce it?
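To put those figures in proportion, here is a quick back-of-the-envelope sketch. It only uses the numbers quoted above; the variable names are mine, and the per-node figure assumes the four nodes contribute roughly equally, which may not hold.

```python
# Figures as reported above, in GB.
index_size_gb = 15.2
source_size_gb = 12.2
daily_total_gb = 160
node_count = 4

# Fraction of the index taken up by the _source field.
source_share = source_size_gb / index_size_gb

# Average daily volume per node (assumes an even split across nodes).
per_node_daily_gb = daily_total_gb / node_count

print(f"_source share of index: {source_share:.1%}")
print(f"average daily volume per node: {per_node_daily_gb:.0f} GB")
```

So _source is roughly four fifths of the index, which is why it is the obvious place to look for savings.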
_source is one of those things that currently just takes up space. You didn't mention which version of Elasticsearch/Metricbeat you're using, but you can take a look at the "Tune for disk usage" page of the Elasticsearch Guide [8.3] to see if there are any changes you could make to improve this.
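As one example of the kind of change that page describes, stored fields (which include _source) can be compressed more aggressively with the best_compression codec. The sketch below only builds the settings fragment as JSON. How you apply it depends on your setup and is an assumption on my part: I believe Metricbeat's setup.template.settings option in metricbeat.yml is the usual place for it. The function name is just for illustration.

```python
import json


def build_codec_settings() -> dict:
    """Return an index settings fragment selecting the best_compression codec.

    index.codec is a static setting, so it takes effect on newly created
    indices (or on a closed index), not on data that is already written.
    """
    return {"index": {"codec": "best_compression"}}


settings = build_codec_settings()
print(json.dumps(settings, indent=2))
```

Expect a trade-off here: smaller stored fields on disk in exchange for somewhat slower stored-field reads. How much you save depends on the data, so it is worth testing on one day's index first.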
Two things to note: there are two features being worked on on the Elasticsearch side that I think will greatly improve storage efficiency in the future:
- TSDB: "Add better support for metric data types (TSDB)" · Issue #74660 · elastic/elasticsearch · GitHub. From what I can tell, this will allow better optimization of how metrics (like data from Metricbeat) are stored.
- Synthetic Source: "Synthetic Source" · Issue #86603 · elastic/elasticsearch · GitHub. This seems like it would effectively remove the overhead of _source while not having all the drawbacks of fully disabling it.
  - "Synthetic source by nik9000" · Pull Request #85649 · elastic/elasticsearch · GitHub. There is a "perf numbers" section in the PR which seems to show significant reductions in disk usage.
Note: I don't think either of the above changes is GA in any current release of Elasticsearch.
I am using version 8.3.2. Thank you for the link. I will take a look.