The MuleSoft system where we had the transaction latency issue is managed by a team from another company, so, before we file a GitHub issue, we will ask them in advance for the information you may need to make sense of the situation, in order to speed things up.
We have prepared this list of items that we think would be relevant to your analysis; please feel free to add to and/or modify it as necessary, and we will immediately forward the request to the MuleSoft team on your behalf.
list of MuleSoft components installed;
for each module, the number of Virtual Machines on which it is installed (and which modules share the same VM);
for each Virtual Machine: OS version, JVM version, disk size, memory size, number of cores, etc.;
for each Virtual Machine: current resource (disk, memory, CPU) usage (after removal of the Agent);
for each Virtual Machine: resource (disk, memory, CPU) usage when the issue occurred;
On "our" side of the integration, the Elastic stack is configured as follows:
all modules are Elastic version 7.12.1 (Platinum Licence), running on the default OpenJDK 11 JVM on Debian 10 Virtual Machines on Google Cloud Platform;
standalone (legacy) APM Server binary (1 VM, shared with Heartbeat);
Heartbeat (1 VM, shared with the APM Server);
Logstash (4 VMs, shared with Kibana);
Kibana (4 VMs, shared with Logstash);
Elasticsearch (10 VMs);
The APM Server is currently monitoring 4 other external applications via the standard Java APM agent, with no issues on either side;
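For context, those external applications are instrumented the usual way, by attaching the Elastic APM Java agent at JVM startup. A typical invocation looks like the sketch below; the jar path, service name, server URL, and package name are placeholders for illustration, not our actual values:

```shell
# Illustrative only: attach the Elastic APM Java agent at JVM startup.
# The paths, service name, server URL, and package below are placeholders.
java -javaagent:/opt/elastic/elastic-apm-agent.jar \
     -Delastic.apm.service_name=example-service \
     -Delastic.apm.server_urls=http://apm-server.example.internal:8200 \
     -Delastic.apm.application_packages=com.example \
     -jar example-app.jar
```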
THANK YOU for taking the time to look into the issue and apply improvements.
The MuleSoft team sent us some information about their Production environment this morning, in case it is still of use to you.
Their application runs on MuleSoft Runtime Standalone (version 3.8.4), installed on a total of 10 CentOS Linux machines (release 7.9.2009), each equipped with 8 CPUs, 48 GB of RAM, and an 80 GB disk. The 10 nodes are organised as follows:
a Cluster with 2 nodes (Java 1.8.0_191);
a Server Group with 6 nodes (Java 1.8.0_322);
2 backup nodes (Java 1.8.0_322);
We are going to suggest that they try installing the improved v1.21.1 of your Agent, to see whether the latency issue is resolved.
However, they also added that they are preparing to migrate the whole application to the newer MuleSoft RTF Runtime version 3.9.5.
The RTF Runtime would run on Java 1.8.0_282 on a total of 18 RHEL 8.4 machines, each equipped with 2 CPUs and 16 GB of RAM. The nodes will be organised as follows:
3 Controller nodes, with 3 HD each: 80 GB (OS), 250 GB (Docker), 60 GB (etcd);
15 Worker nodes, with 2 HD each: 80 GB (OS), 250 GB (Docker);
Would your elastic-apm-mule3-agent v1.21.1 still be compatible with that setup?