Kibana version: 7.5.2
Elasticsearch version: 7.5.2
APM Server version: 7.5.2
APM Agent language and version: Python, elastic-apm 5.5.2, elasticsearch 7.5.1
I have two server machines: one runs the Elastic stack and the other runs my Flask application. This is how we set up the Elastic server:
version: '2.2'
services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:$ELASTIC_VERSION
    container_name: es01
    environment:
      - node.name=es01
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es02,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - esdata01:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - elastic
  es02:
    image: docker.elastic.co/elasticsearch/elasticsearch:$ELASTIC_VERSION
    container_name: es02
    environment:
      - node.name=es02
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - esdata02:/usr/share/elasticsearch/data
    networks:
      - elastic
  es03:
    image: docker.elastic.co/elasticsearch/elasticsearch:$ELASTIC_VERSION
    container_name: es03
    environment:
      - node.name=es03
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es02
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - esdata03:/usr/share/elasticsearch/data
    networks:
      - elastic
  kibana:
    container_name: kibana
    image: docker.elastic.co/kibana/kibana:$ELASTIC_VERSION
    environment:
      - ELASTICSEARCH_HOSTS=http://es01:9200
    ports:
      - "5600:5601"
    networks:
      - elastic
    depends_on:
      - es01
  apm-server:
    container_name: apm-server
    image: store/elastic/apm-server:$ELASTIC_VERSION
    user: apm-server
    ports:
      - "7200:7200"
    depends_on: ["es01", "kibana"]
    #volumes:
    #  - ./apm-conf/apm-server.yml:/usr/share/apm-server/apm-server.yml
    command: /usr/share/apm-server/apm-server -e -c /usr/share/apm-server/apm-server.yml -E apm-server.host=apm-server:7200 --strict.perms=false -E output.elasticsearch.hosts=["es01:9200"] -E setup.kibana.host="kibana:5600"
    networks:
      - elastic
volumes:
  esdata01:
    external: true
  esdata02:
    external: true
  esdata03:
    external: true
networks:
  elastic:
    driver: bridge
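Since the esdata volumes are declared external, they have to exist before the stack is brought up. Roughly, we start everything like this (volume names as declared above; the exact invocation is from memory):

```shell
# The external data volumes must be created once, before the first `up`
docker volume create esdata01
docker volume create esdata02
docker volume create esdata03

# Then bring the whole stack up in the background
ELASTIC_VERSION=7.5.2 docker-compose up -d
```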
We use it to index city names, locations, and so on. We also added elastic-apm to our Flask app so that the application's logs are indexed in Elasticsearch too.
Everything was fine until last week, when both machines were shut down for about three days because they were being relocated. When we turned the servers back on, I started the containers and everything came back to normal except apm-server. I saw this error in the log of my Flask app:
services_1 | ERROR:elasticapm.transport:Failed to submit message: 'HTTP 503: {"accepted":0,"errors":[{"message":"queue is full"}]}\n'
services_1 | Traceback (most recent call last):
services_1 | File "/usr/local/lib/python3.6/site-packages/elasticapm/transport/base.py", line 224, in _flush
services_1 | self.send(data)
services_1 | File "/usr/local/lib/python3.6/site-packages/elasticapm/transport/http.py", line 105, in send
services_1 | raise TransportException(message, data, print_trace=print_trace)
services_1 | elasticapm.transport.base.TransportException: HTTP 503: {"accepted":0,"errors":[{"message":"queue is full"}]}
I searched for this error and found https://www.elastic.co/guide/en/apm/server/master/common-problems.html#queue-full, but I didn't understand how to solve the problem from it. I added output.elasticsearch.bulk_max_size=5120 to the command section of apm-server and restarted the apm-server container. After that, a new error appeared:
services_1 | WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f31517d53c8>: Failed to establish a new connection: [Errno 111] Connection refused',)': /intake/v2/events
services_1 | ERROR:elasticapm.transport:Failed to submit message: 'Connection to APM Server timed out (url: http://192.168.49.37:7200/intake/v2/events, timeout: 5 seconds)'
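For reference, after that change the apm-server command line in the compose file read roughly as follows (the original flags from the compose file above, with the new option appended as another -E setting):

```yaml
command: /usr/share/apm-server/apm-server -e -c /usr/share/apm-server/apm-server.yml -E apm-server.host=apm-server:7200 --strict.perms=false -E output.elasticsearch.hosts=["es01:9200"] -E output.elasticsearch.bulk_max_size=5120 -E setup.kibana.host="kibana:5600"
```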
Meanwhile, I see this in the log of apm-server:
apm-server | 2020-06-07T13:06:49.514Z INFO [request] middleware/log_middleware.go:76 request accepted {"request_id": "f4e3fd78-169e-4c89-a288-d8f2f1c35ed5", "method": "POST", "URL": "/intake/v2/events", "content_length": 714, "remote_address": "192.168.49.35", "user-agent": "elasticapm-python/5.5.2", "response_code": 202}
apm-server | 2020-06-07T13:07:10.023Z INFO [request] middleware/log_middleware.go:76 request accepted {"request_id": "66791de1-f896-42ce-bc99-99a2c389afd3", "method": "POST", "URL": "/intake/v2/events", "content_length": 1253, "remote_address": "192.168.49.35", "user-agent": "elasticapm-python/5.5.2", "response_code": 202}
apm-server | 2020-06-07T13:07:30.378Z ERROR pipeline/output.go:100 Failed to connect to backoff(elasticsearch(http://es01:9200)): Connection marked as failed because the onConnect callback failed: resource 'apm-7.5.2-error' exists, but it is not an alias
apm-server | 2020-06-07T13:07:30.378Z INFO pipeline/output.go:93 Attempting to reconnect to backoff(elasticsearch(http://es01:9200)) with 136 reconnect attempt(s)
apm-server | 2020-06-07T13:07:30.379Z INFO [publisher] pipeline/retry.go:196 retryer: send unwait-signal to consumer
apm-server | 2020-06-07T13:07:30.379Z INFO [publisher] pipeline/retry.go:198 done
apm-server | 2020-06-07T13:07:30.379Z INFO [publisher] pipeline/retry.go:173 retryer: send wait signal to consumer
apm-server | 2020-06-07T13:07:30.379Z INFO [publisher] pipeline/retry.go:175 done
apm-server | 2020-06-07T13:07:30.381Z INFO elasticsearch/client.go:753 Attempting to connect to Elasticsearch version 7.5.2
apm-server | 2020-06-07T13:07:30.426Z INFO [pipelines] pipeline/register.go:53 Pipeline already registered: apm
apm-server | 2020-06-07T13:07:30.435Z INFO [pipelines] pipeline/register.go:53 Pipeline already registered: apm_user_agent
apm-server | 2020-06-07T13:07:30.437Z INFO [pipelines] pipeline/register.go:53 Pipeline already registered: apm_user_geo
apm-server | 2020-06-07T13:07:30.437Z INFO [pipelines] pipeline/register.go:56 Registered Ingest Pipelines successfully.
apm-server | 2020-06-07T13:07:30.437Z INFO [index-management] idxmgmt/manager.go:84 Overwrite ILM setup is disabled.
apm-server | 2020-06-07T13:07:30.438Z INFO [index-management] idxmgmt/manager.go:203 Set setup.template.name to 'apm-%{[observer.version]}'.
apm-server | 2020-06-07T13:07:30.438Z INFO [index-management] idxmgmt/manager.go:205 Set setup.template.pattern to 'apm-%{[observer.version]}*'.
apm-server | 2020-06-07T13:07:30.446Z INFO template/load.go:89 Template apm-7.5.2 already exists and will not be overwritten.
apm-server | 2020-06-07T13:07:30.446Z INFO [index-management] idxmgmt/manager.go:211 Finished loading index template.
apm-server | 2020-06-07T13:07:30.449Z INFO [index-management.ilm] ilm/std.go:138 do not generate ilm policy: exists=true, overwrite=false
apm-server | 2020-06-07T13:07:30.449Z INFO [index-management] idxmgmt/manager.go:240 ILM policy apm-rollover-30-days successfully loaded.
apm-server | 2020-06-07T13:07:30.456Z INFO template/load.go:89 Template apm-7.5.2-error already exists and will not be overwritten.
apm-server | 2020-06-07T13:07:30.456Z INFO [index-management] idxmgmt/manager.go:223 Finished template setup for apm-7.5.2-error.
apm-server | 2020-06-07T13:07:39.848Z ERROR [request] middleware/log_middleware.go:74 forbidden request {"request_id": "30938466-b8a9-4e9e-9edc-3074ea2bd872", "method": "POST", "URL": "/config/v1/agents", "content_length": 40, "remote_address": "192.168.49.35", "user-agent": "elasticapm-python/5.5.2", "response_code": 403, "error": "forbidden request: Agent remote configuration is disabled. Configure the `apm-server.kibana` section in apm-server.yml to enable it. If you are using a RUM agent, you also need to configure the `apm-server.rum` section. If you are not using remote configuration, you can safely ignore this error."}
apm-server | 2020-06-07T13:07:48.284Z INFO [request] middleware/log_middleware.go:76 request accepted {"request_id": "7b75214e-6ff4-401a-a2b8-4e0f415cf3a8", "method": "POST", "URL": "/intake/v2/events", "content_length": 556, "remote_address": "192.168.49.35", "user-agent": "elasticapm-python/5.5.2", "response_code": 202}
It should be noted that elasticsearch itself is fine and I can search my index without any problem:
services_1 | INFO:elasticsearch:GET http://192.168.49.37:9200/cities/_search?ignore_unavailable=true&size=5 [status:200 request:0.008s]
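In case it helps with the "exists, but it is not an alias" error above, the apm-* names can be listed directly to see which are concrete indices and which are aliases (a diagnostic sketch; 192.168.49.37 is the Elastic machine, as in the log above):

```shell
# Show concrete apm-* indices and apm-* aliases as Elasticsearch sees them
curl 'http://192.168.49.37:9200/_cat/indices/apm-*?v'
curl 'http://192.168.49.37:9200/_cat/aliases/apm-*?v'
```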
I'm new to Elastic and I don't know what to do. Do you have any idea where the problem is? What do you think I should do now?