I want to know the ratio between raw data and the ingested data that is stored in an Elastic cluster.
I know raw logs and ingested logs are not the same size.
Also, is there any way to find the volume of raw logs coming into the cluster?
I could calculate ingested log volume using index sizes.
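For anyone following along, here is a rough sketch of how the ingested volume could be summed from index sizes using the _cat/indices API with format=json and bytes=b. The cluster URL and the logs-* pattern are assumptions, not details from this thread, and authentication/TLS is left out, which a secured 8.x cluster will need.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: replace with your cluster address


def fetch_cat_indices(pattern="*"):
    """Call _cat/indices and return the parsed JSON rows (sizes in bytes)."""
    url = f"{ES_URL}/_cat/indices/{pattern}?format=json&bytes=b"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def total_primary_bytes(rows):
    """Sum pri.store.size over all rows, skipping indices that report no size."""
    total = 0
    for row in rows:
        size = row.get("pri.store.size")
        if size is not None:  # closed indices may report null
            total += int(size)
    return total
```

Something like total_primary_bytes(fetch_cat_indices("logs-*")) would then give the on-disk size of the primary shards. It uses pri.store.size rather than store.size so that replicas are not counted twice.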
There is no fixed ratio, as it depends entirely on what your data looks like and on your mapping.
You will need to test yourself with the data you are planning to index and the mapping.
This blog post is a little old, but it explains a bit about how compression works in Elasticsearch. Note that things have improved in more recent versions.
Thank you for the reply.
Actually, I don't need an exact ratio. I want to get the raw log data volume per day for on-prem Elastic.
Is there any way to get the raw log volume? That would be great.
I'm using Elasticsearch 8.1.2.
If you install the mapper-size plugin, you can aggregate on the _size field it provides, which records the size in bytes of each document's original source, and get an estimate that way.
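As a rough sketch of that approach: once the plugin is installed and _size is enabled in the index mapping ("_size": {"enabled": true}), a daily sum could be requested with a date_histogram and a sum sub-aggregation. The @timestamp field name and the helper names below are assumptions for illustration, and only documents indexed after _size was enabled will be counted.

```python
def daily_raw_size_query(timestamp_field="@timestamp"):
    """Build a search body that sums _size (source bytes) per calendar day."""
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {
            "per_day": {
                "date_histogram": {
                    "field": timestamp_field,  # assumption: your date field
                    "calendar_interval": "1d",
                },
                "aggs": {"raw_bytes": {"sum": {"field": "_size"}}},
            }
        },
    }


def raw_bytes_per_day(response):
    """Turn the aggregation response into a {day: bytes} mapping."""
    buckets = response["aggregations"]["per_day"]["buckets"]
    return {b["key_as_string"]: int(b["raw_bytes"]["value"]) for b in buckets}
```

The body from daily_raw_size_query() would be sent to the _search endpoint of the relevant indices. Keep in mind that _size reflects the source as indexed, so if an ingest pipeline or shipper adds or removes fields, the number will differ somewhat from the size of the original log line.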