APM transaction data not compressed properly


Kibana version: 7.5.2

Elasticsearch version: 7.5.2

APM Server version: 7.6.0

APM Agent language and version: Java 1.12.0

Our ELK stack and APM Servers are deployed with the Elastic operator and Helm charts on Kubernetes.
Is there anything special in your setup? We use an AWS load balancer in front of the APM Servers.

We have six ES data instances, each with 15 vCPUs, 30 GiB of RAM, and a 15 GiB heap.

We are using EBS volumes, each with 800 GB of storage and 3,000 provisioned IOPS.

Description of the problem, including expected versus actual behavior:

Our APM transaction data doesn't seem to compress very well. According to the sizing guide (https://www.elastic.co/guide/en/apm/server/current/sizing-guide.html):

> Indexing 100 unsampled transactions per second for 1 hour results in 360,000 documents. These documents use around 50 MB of disk space.

We index around 35,000 transactions per second, which means we send about 126 million documents every hour. At this moment we have 475 million documents in the index, which by the guide's ratio should come to around 66 GB if compressed that well, yet the primary index is at 180 GB. This is not scalable for us.
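For reference, here is the arithmetic behind that estimate as a quick sketch; it assumes the sizing guide's 360,000-documents-to-50-MB ratio scales linearly to our volume:

```python
# Sketch of the extrapolation above; assumes the sizing guide's
# ratio (360,000 documents ~= 50 MB) scales linearly.
GUIDE_DOCS = 360_000
GUIDE_MB = 50

tx_per_second = 35_000
docs_per_hour = tx_per_second * 3600   # 126,000,000 documents/hour
docs_in_index = 475_000_000

expected_gb = docs_in_index / GUIDE_DOCS * GUIDE_MB / 1000
print(f"{docs_per_hour:,} docs/hour, expected ~{expected_gb:.0f} GB on disk")
```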


Please let us know what we can do to cut down on disk usage.

Thank you!

We use different indices for the different types of APM data, and our transaction sample rate is very low: 0.000005.
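For context, this is roughly how we pass that rate to the agent; a minimal sketch assuming the Java agent's standard `transaction_sample_rate` option is supplied as a JVM system property (jar path and service name below are placeholders):

```shell
# Sketch: setting the APM Java agent sample rate via system properties
# (agent jar path and service name are placeholders).
java -javaagent:/path/to/elastic-apm-agent.jar \
     -Delastic.apm.service_name=my-service \
     -Delastic.apm.transaction_sample_rate=0.000005 \
     -jar my-app.jar
```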

Hi and thanks for the question!

I completely agree that storing all unsampled transactions does not scale, and we are fully aware of that. Until recently, technical limitations prevented us from dropping them while still being able to provide accurate histograms.
The introduction of the histogram datatype gave us the storage solution we needed. While some pieces are still missing on the query side, we have already started working on the architectural change that will allow the storage savings you are looking for by dropping unsampled transactions (this will probably be opt-in, at least until the next major release). I believe this addresses your concern exactly.

I assume you simply extrapolated from this part of the guide:

> Indexing 100 unsampled transactions per second for 1 hour results in 360,000 documents. These documents use around 50 MB of disk space.

However, you should take all the other notes on that page into account as well. For the compression part specifically, the note at the bottom is very relevant:

> These examples were indexing the same data over and over with minimal variation. Because of that, the observed compression ratios of 80-90% are somewhat optimistic.

You would not get the same compression ratio with real-world data.
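To put numbers on that, a quick sketch using the figures from this thread: the observed index stores roughly 2.7x more bytes per document than a linear extrapolation from the guide would suggest.

```python
# Sketch: observed vs. extrapolated storage, using the numbers from
# this thread (475M documents, 180 GB observed, ~66 GB extrapolated).
docs = 475_000_000
observed_gb = 180
extrapolated_gb = 66

observed_bytes_per_doc = observed_gb * 1e9 / docs      # ~379 bytes/doc
expected_bytes_per_doc = extrapolated_gb * 1e9 / docs  # ~139 bytes/doc
overhead = observed_gb / extrapolated_gb               # ~2.7x
print(f"{observed_bytes_per_doc:.0f} vs {expected_bytes_per_doc:.0f} "
      f"bytes/doc ({overhead:.1f}x the optimistic estimate)")
```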

I hope this helps.


Thank you, I guess I'll wait for those features to be released.