I have a question about data growth in APM, as I have heard some concerns about how quickly APM data grows. Could you please answer this for us:
If I have a service handling 1k RPS with 6 spans per request on average and a 5% error rate, deployed on 5 boxes, how much data growth can I expect over a day/month with the sampling rate set to 1 versus the sampling rate set to 0?
If you set the sampling rate to 0, then APM will not index any spans. At 1k RPS with 6 spans per request, a sampling rate of 0 eliminates 6,000 span docs/s.
Error events are unaffected by sampling; APM will index these regardless of the sampling rate.
Currently APM indexes a transaction document per request (1K RPS = 1K transaction docs/s), regardless of sampling. We are working on changing the implementation to store pre-aggregated histograms, so you'll have far fewer documents stored.
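To put rough numbers on the scenario in the question, here is a back-of-the-envelope sketch of document counts. It is not an official sizing formula. It assumes that 1k RPS is the total across all 5 boxes, that each failed request produces exactly one error document, that a sampling rate between 0 and 1 scales span documents linearly, and that a month is 30 days. The function and parameter names are made up for illustration. It counts documents only; bytes on disk depend on document size, mappings, and compression, which are not covered here.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30  # assumption: 30-day month


def docs_per_second(sample_rate, rps=1000, spans_per_request=6, error_rate=0.05):
    """Estimate indexed APM documents per second (illustrative only)."""
    transactions = rps                               # one doc per request, regardless of sampling
    spans = rps * spans_per_request * sample_rate    # spans are kept only for sampled requests
    errors = rps * error_rate                        # assumption: one error doc per failed request
    return round(transactions + spans + errors)


def docs_per_day(sample_rate):
    return docs_per_second(sample_rate) * SECONDS_PER_DAY


def docs_per_month(sample_rate):
    return docs_per_day(sample_rate) * DAYS_PER_MONTH


for rate in (1.0, 0.0):
    print(
        f"sample_rate={rate}: "
        f"{docs_per_second(rate):,} docs/s, "
        f"{docs_per_day(rate):,} docs/day, "
        f"{docs_per_month(rate):,} docs/month"
    )
```

Under these assumptions, a sampling rate of 1 gives about 7,050 docs/s (roughly 609 million per day), while a sampling rate of 0 gives about 1,050 docs/s (roughly 91 million per day). The difference is entirely the 6,000 span docs/s.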