APM dropping metrics

Kibana version: 6.7.0

Elasticsearch version: 6.7.0

APM Server version: 6.7.0

APM Agent language and version: Python - elastic-apm==4.2.1

Original install method (e.g. download page, yum, deb, from source, etc.) and version: YUM

Fresh install or upgraded from other version? Fresh

Is there anything special in your setup? For example, are you using the Logstash or Kafka outputs? Are you using a load balancer in front of the APM Servers? Have you changed the index pattern, generated custom templates, changed agent configuration, etc.?

One EC2 instance running Elasticsearch
One EC2 instance running both APM Server and Kibana
Two EC2 instances running the project with the agent installed

Description of the problem including expected versus actual behavior. Please include screenshots (if relevant):

We recently noticed flat lines in our metrics. Nothing changed except updating the agent from 3.0.2 to 4.2.1, and the updated agent reported properly at first. All I can offer as of now is:

[screenshot]
Here you can see a wider period of time, about a month (the update happened 2 months ago).

[screenshot]
A closer look.

[screenshot]
Here you can see the gaps that keep occurring; it's almost as if reporting stops whenever the TPM (transactions per minute) doesn't exceed some threshold.

Another thing I noticed is that we have "N/A" requests reported, and for the life of me I cannot see what those are:

[screenshot]

I'm looking for advice on what to look at, since there are multiple components that could be the issue. I'm not sure what could have changed: all the components worked, and we did not change the configuration or any other part of a system that ran for a month without any issue.

Provide logs and/or server output (if relevant):

Not sure what is relevant.

Hello,

Sorry that you are experiencing this. In order to confirm or rule out an APM Server issue, can you provide the server logs?
Do you see 4XX or 5XX HTTP errors (e.g. "queue is full")?
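
If it's useful, a quick way to confirm the server itself is up and answering: this is just a minimal sketch, assuming APM Server listens on the default port 8200 on localhost and that the requests package is available.

# Ping the APM Server root endpoint; a healthy server responds with
# HTTP 200 and some build information.
import requests

resp = requests.get("http://localhost:8200/")
print(resp.status_code)
print(resp.text)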

Juan

Hi @borko

in addition to server logs, it would be super helpful to know if the agent logs any problems. If you haven't set up logging for your Django app yet, you can add something like this to your settings.py file:

import os  # needed for the os.getenv calls below

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'verbose': {
            'format': '%(levelname)s %(asctime)s %(name)s %(process)d %(thread)d %(message)s'
        },
        'simple': {
            'format': '%(levelname)s %(message)s'
        },
    },
    'handlers': {
        # StreamHandler writes to stderr by default
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'verbose',
        },
    },
    'loggers': {
        'django': {
            'handlers': ['console'],
            'level': os.getenv('DJANGO_LOG_LEVEL', 'INFO'),
        },
        # the agent's own logger; connection problems would show up here
        'elasticapm': {
            'handlers': ['console'],
            'level': os.getenv('DJANGO_LOG_LEVEL', 'INFO'),
        },
    },
}

You should now start seeing log messages on the console (or wherever the process output is directed).
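
If INFO doesn't surface anything, you can temporarily raise just the agent's logger to DEBUG without touching the rest of the config. A minimal sketch using only the standard library:

# Bump only the "elasticapm" logger to DEBUG; everything else in the
# Django LOGGING config above stays unchanged.
import logging

logging.getLogger("elasticapm").setLevel(logging.DEBUG)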

Hi guys,

@jalvz and @beniwohli

Thanks for the help. I managed to find that the disk had filled up, and I fixed it by expanding the disk space.

The thing I did not realize was that I also had to run this (Elasticsearch sets index.blocks.read_only_allow_delete on indices once the flood-stage disk watermark is exceeded, and in 6.x that flag has to be cleared manually):

curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

That resets the read-only block so the indices accept writes again.
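
For reference, the same reset can be done from Python with the official elasticsearch-py client. This is just a sketch, assuming the client is installed and Elasticsearch runs on localhost:9200.

# Clear the read-only-allow-delete block on all indices, equivalent to
# the curl command above.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])
es.indices.put_settings(
    index="_all",
    body={"index.blocks.read_only_allow_delete": None},
)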

I think this topic can be closed and I really want to thank you for steering me in the right direction.

This topic was automatically closed 20 days after the last reply. New replies are no longer allowed.