Kibana version: 7.5.2
Elasticsearch version: 7.5.2
APM Server version: 7.6.0
APM Agent language and version: Java 1.12.0
Our ELK stack and APM Servers are deployed on Kubernetes using the Elastic operator and Helm charts. We use an AWS load balancer in front of the APM Servers.
We have 6 ES data instances, each with 15 vCPUs and 30 GiB of RAM, with 15 GiB of heap.
We are using EBS volumes, each with 800 GB of storage and 3000 dedicated IOPS.
Description of the problem including expected versus actual behavior:
Indexing 100 unsampled transactions per second for 1 hour results in 360,000 documents. These documents use around 50 MB of disk space.
We index around 35,000 transactions per second, so each hour we send around 126 million documents. At this moment we have 475 million documents in the index, which should be around 66 GB of data if compressed very well, but the primary index is at 180 GB. This is not scalable for us.
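The 66 GB figure appears to come from extrapolating the documentation's example ratio (360,000 documents ≈ 50 MB) to the 475 million stored documents. A minimal sketch of that arithmetic, using only the figures quoted above:

```python
# Extrapolating expected index size from the documented example ratio.
# All figures are taken from the post; nothing here queries a real cluster.
docs_per_hour = 35_000 * 3600        # ~126 million documents indexed per hour
reference_docs = 360_000             # example from the sizing documentation
reference_mb = 50                    # ~50 MB on disk for those documents
stored_docs = 475_000_000            # documents currently in the index

expected_gb = stored_docs * reference_mb / reference_docs / 1000
print(f"{docs_per_hour:,} docs/hour; expected ~{expected_gb:.0f} GB vs 180 GB observed")
```

The extrapolation yields roughly 66 GB, which is where the gap against the observed 180 GB comes from; as noted below, the documented ratio assumed highly repetitive, well-compressing data.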
I completely agree that storing all unsampled transactions is not scalable, and we are of course fully aware of that. Until recently there were technical limitations that prevented us from dropping them while still being able to provide accurate histograms.
The introduction of the histogram datatype provided the storage solution we needed. While there are still missing pieces on the query side, we have already started working on the architectural change that will enable the storage savings you are looking for by dropping unsampled transactions (this would probably be opt-in, at least until the next major release). I believe this addresses your concern exactly.
I assume what you did is simply extrapolating from
Indexing 100 unsampled transactions per second for 1 hour results in 360,000 documents.
These documents use around 50 MB of disk space.
However, you should take into consideration all the other notices on that page. For the compression part specifically, the note at the bottom is very relevant:
These examples were indexing the same data over and over with minimal variation.
Because of that, the compression ratios observed of 80-90% are somewhat optimistic.
You would not get the same compression ratio with real-world data.