APM agents storing performance data when wifi is off

Hi everyone,

Question:

  1. Is there any way for APM agents to collect performance information while the machine doesn't have access to a wifi connection?

  2. Would APM keep the performance logs with their timestamps and report them once the machine is connected to wifi again?

I appreciate any feedback/reference, thanks! :grinning:

Hi @EZprogramming!

This is not something that the agents will do, but what you could do is run the APM Server on the same host. The server has its own in-memory queue which, depending on the amount of data that would accumulate, may be sufficient. Alternatively you could:

  • use the file spool queue. Please note that this is beta functionality. The documentation I linked is for Filebeat, but this configuration is also available to APM Server (it just isn't documented there).
  • run Logstash on the same machine, point APM Server at that, and Logstash at Elasticsearch. Logstash has its own built-in queuing, which can also store to disk.

Using the above approach, the events would be queued and then sent to Elasticsearch when connected to the network. The events would have the same content as if they were sent immediately to Elasticsearch.
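
To illustrate the queuing idea described above, here is a minimal store-and-forward sketch in Python. It is not how APM Server, Beats, or Logstash implement their queues; the `OfflineBuffer` class, its method names, and the choice to turn away new events when the buffer is full are all assumptions made for illustration. Each event keeps the timestamp at which it was captured, so the content is the same whether it is sent immediately or hours later.

```python
import time
from collections import deque


class OfflineBuffer:
    """Bounded store-and-forward buffer (illustrative, not APM Server code)."""

    def __init__(self, max_events):
        self.max_events = max_events
        self._events = deque()

    def record(self, name, value, timestamp=None):
        """Queue one sample; return False if the buffer is already full."""
        if len(self._events) >= self.max_events:
            return False  # assumption: new data is turned away when full
        ts = time.time() if timestamp is None else timestamp
        self._events.append({"name": name, "value": value, "timestamp": ts})
        return True

    def flush(self, send):
        """Send queued events oldest-first; stop at the first failed send."""
        sent = 0
        while self._events:
            if not send(self._events[0]):
                break  # still offline: keep the event for the next attempt
            self._events.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._events)
```

Here `send` is any callable that returns True once an event has been delivered. While the network is down, `flush` returns 0 and the events stay queued with their original timestamps.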

May I ask what kind of application(s) you're instrumenting?

Hi @axw, I appreciate your detailed explanation, it is really helpful! :+1:

I am doing a research project. I have a computer (residing in different regions) that may or may not be connected to wifi, depending on the region and the time of day. I want to use APM to track the performance metrics for this computer even when the wifi is off for a period of time, and then receive those metrics once the wifi is back on.

I liked your approach of collecting the logs using the APM Server's in-memory queue and later uploading the data to Elasticsearch Cloud, but I forgot to mention performance metrics as well. I know performance metrics are different from logs, so I apologize for leaving that out of my first question.

I would appreciate it if you could confirm the following for me.

  1. If I am only dealing with performance metrics, how can I send them back from a computer located in a different region from where I live? Can APM Server do that?

  2. When the machine is offline (no wifi), can APM Server collect performance metrics without losing any metrics throughout the collection process?

  3. Is there any timestamp on those performance metrics? Let's say if CPU usage goes up at 11:00am, and then there is a connection at 6:00pm, can it send the CPU usage metrics for 11:00am?

  4. I know that I can increase queue.mem.events so that the APM Server queue holds more events in memory, but can APM Server collect performance data for many hours in a row, say from 8:00am to 6:00pm continuously, without failing or restarting?

It's not entirely clear to me that you need APM; maybe Metricbeat would be more appropriate here? Metricbeat can be used to periodically measure CPU/memory/disk usage, and similar things. APM agents do some of this, but APM is mostly intended for web applications.

Nevertheless, the answers are essentially the same, as Metricbeat and the APM Server share an underlying code base.

The answer for APM trace data, logs, and metrics is the same. They are all recorded in memory in the same sort of way.

That depends on whether the queue has enough capacity. Assuming it has capacity (enough memory or disk space, depending on whether you use an in-memory or on-disk queue), then you also need to consider how reliable you need it to be. For guaranteed delivery, you'll probably want some kind of on-disk queue.

Yes, reported metrics have associated timestamps, which is the time at which the metric value was captured - 11:00am in your example.

Yes, but again that depends on the capacity. If the amount of data sent by agents exceeds the queue capacity, then new data will be rejected until the server is able to connect.
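
As a rough way to reason about the capacity question, the sketch below estimates how many events pile up during an offline window and compares that with a queue size. The function names and the example figures (a 10 second collection period, 5 events per period, a queue of 4096 events) are hypothetical values chosen for illustration, not measured or documented numbers.

```python
import math


def events_needed(offline_hours, period_seconds, events_per_period):
    """Estimate how many events accumulate while the machine is offline."""
    samples = math.ceil(offline_hours * 3600 / period_seconds)
    return samples * events_per_period


def fits_in_queue(queue_events, offline_hours, period_seconds, events_per_period):
    """Return True if a queue of `queue_events` can hold the whole backlog."""
    needed = events_needed(offline_hours, period_seconds, events_per_period)
    return needed <= queue_events
```

For the 8:00am to 6:00pm case, `events_needed(10, 10, 5)` gives 18000 events, so under these assumed figures a queue limited to 4096 events would fill up well before the connection returns, and a larger or on-disk queue would be needed.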

@axw, thanks again for your detailed response! I am sure it will help a lot of people with the same problem.

Yes, you are right. Metricbeat has modules specifically for system and cloud metrics, which is what I intend to work with, so it is great! :ok_hand:

I understand there are configuration files and modules that I have to set up, but my concern is where exactly I should run Metricbeat. For example, for APM, I integrate the APM agent into my Python web app with a few lines of code at the top of my file, and that's it: I receive my performance metrics through the APM Server in the Cloud. For Metricbeat, I assume I have to run it somewhere permanently, and that's my problem. I don't have a wifi connection at all times, so if I run Metricbeat in a Docker container on EC2, it needs a wifi connection. How can I run Metricbeat with no reliable wifi for many hours?

Interesting. I realized I need a server running in order for my APM agents to send performance metrics to the APM Server in Elastic Cloud. Is there any way to run APM agents without a local or server-side web app always running? For example, just a Python file with no Flask/Django web server.

If you just want system metrics, e.g. CPU/memory/disk/network usage, then run Metricbeat on the machine itself.

The answers I gave earlier regarding queuing of data in the APM Server apply to Metricbeat as well.

If you have more questions about Metricbeat, please refer to https://www.elastic.co/guide/en/beats/metricbeat/current/index.html and/or open a new topic under https://discuss.elastic.co/c/infrastructure.

You can run the agent without a web service, but if your goal is only to gather system metrics then you'd probably be better off just installing Metricbeat. You'll get more detailed system metrics that way. APM is intended primarily for answering questions about application performance, like "how fast is my web service responding to users?"
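
For the plain Python file case, a minimal sketch could look like the following. It assumes the `elasticapm` package is installed and that an APM Server is listening on `http://localhost:8200`; the service name, the transaction name, and the `run_instrumented` helper are hypothetical and only there for illustration.

```python
def run_instrumented(client, work, name="offline-job"):
    """Run work() inside one APM transaction and report its outcome."""
    client.begin_transaction("script")
    try:
        result = work()
    except Exception:
        client.capture_exception()
        client.end_transaction(name, "failure")
        raise
    client.end_transaction(name, "success")
    return result


def main():
    # Requires the third-party elasticapm package (pip install elastic-apm).
    import elasticapm

    client = elasticapm.Client(
        service_name="my-script",            # hypothetical service name
        server_url="http://localhost:8200",  # assumed local APM Server
    )
    try:
        run_instrumented(client, lambda: sum(range(1_000_000)))
    finally:
        client.close()  # flush queued events before the script exits
```

Calling `main()` wraps the work in a single transaction with no Flask or Django involved. Pointing the agent at an APM Server on the same host matches the setup suggested earlier in this thread, where the server does the queuing while the network is down.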

@axw, thanks for your explanation. I got all the information I needed.
