Apm-server disconnections after upgrading to 7.10

moixcruz · December 4, 2020, 12:40pm

Kibana version: 7.5.1

Elasticsearch version: 7.5.1
APM Server version: 7.10.0
APM Agent language and version: java and rum
Original install method (e.g. download page, yum, deb, from source, etc.) and version: yum
Fresh install or upgraded from other version? APM Server upgraded from 7.5.1 to 7.10.0
Is there anything special in your setup? We use Logstash between the APM Servers and Elasticsearch for caching and for pipelines such as user agent and geolocation
Description of the problem including expected versus actual behavior:
I'm upgrading the APM stack from 7.5.1 to the latest, 7.10.0. So far I have upgraded the APM Servers and the Logstash instances; in the next iteration we will upgrade Elasticsearch and Kibana.
Everything appears to keep working fine, but since the upgrade I see that APM connections on :8200 are dropping frequently. See the screenshot showing the behavior before and after the upgrade (the upgrade was done at 8:00 on 2 Dec; a few hours later, connections started being reset every few minutes):

[Screenshot from 2020-12-04 13-29-16]

The rest is working fine and I don't see any errors indicating a problem. Is there anything I could check to find out what is causing this? Or is a continuous reset of connections normal in recent versions?

Thanks in advance!

simitt · December 4, 2020, 1:43pm

Hi @moixcruz,
the change in behavior is not expected for recent APM Server version upgrades. Could you maybe provide some more details:

Which agent versions are you using, and did you also update them around the same time? If so, from which versions?
Can the behavior be observed for connections from both the Java and the RUM agents?
Do you see any errors or logs in the agents indicating any issues?

moixcruz · December 4, 2020, 8:54pm

Thanks @simitt for your fast response

We have a wide variety of agents connected: Java (I see versions from 1.9.0 to 1.16.0), .NET (1.5.1) and JS/RUM as well (4.4.4, 4.9.1 and 5.0.0). I maintain the Elastic APM backend, and other teams are responsible for the agents in the product, so unfortunately this is something I cannot control.
I don't see anything wrong on the agent side; I checked connections and logs.
I can't see anything unusual in the agents. As I said, I have no access to them, but I asked for logs from a few random agents and found nothing in them indicating recurrent disconnections.

simitt · December 9, 2020, 8:42am

Can you please also check the APM Server logs? If there is an issue between APM Server and Logstash, the internal memory queue might fill up, resulting in a 503 response from APM Server. In that case an error is returned to the agents immediately and a new connection is created. I would expect this to show up in the agent logs as well, but it might be worth checking.
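One way to do that check is to tally response codes in the server log. The snippet below is only a minimal sketch: it assumes each request log line contains a field shaped like `"response_code": 503`, and the sample lines are made up for illustration, so adjust the pattern to whatever your log format actually looks like.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: request log lines carry a field like "response_code": 503.
# Adjust this pattern if your APM Server logs are formatted differently.
RESPONSE_CODE = re.compile(r'"response_code"\s*:\s*(\d{3})')


def count_response_codes(lines: Iterable[str]) -> Counter:
    """Count how often each HTTP response code appears in the given log lines."""
    counts: Counter = Counter()
    for line in lines:
        match = RESPONSE_CODE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Made-up sample lines, for illustration only.
SAMPLE_LINES = [
    '2020-12-09T08:00:01Z INFO [request] {"URL": "/intake/v2/events", "response_code": 202}',
    '2020-12-09T08:00:02Z ERROR [request] {"URL": "/intake/v2/events", "response_code": 503}',
    '2020-12-09T08:00:03Z INFO [request] {"URL": "/intake/v2/events", "response_code": 202}',
    '2020-12-09T08:00:04Z INFO [beater] some unrelated line without a response code',
]

counts = count_response_codes(SAMPLE_LINES)
print("503 responses:", counts[503])
```

To run it against a real log, pass an open file object instead of `SAMPLE_LINES`, since the function accepts any iterable of lines. A non-zero count for 503 would point at the full-queue scenario described above.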

moixcruz · December 22, 2020, 11:26am

Hi @simitt, sorry for not answering sooner; I had some days off without access to the servers.
I can't find any 503 in the logs. However, the behavior of the TCP connections has improved over the past few days. Looking at the last 30 days, it recovered gradually over several days after I opened this thread.

It could be caused by a network issue, but it is very odd that it started right after the upgrade.

I will ask the network team whether anything on their side could explain it. In any case, thanks a lot for your help. I'll comment back with my findings.
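For anyone wanting numbers to hand to a network team, established connections on the APM port can be counted from socket statistics. This is a rough sketch, assuming the input is the text output of `ss -tn` (columns: State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port). The function name and the sample output are hypothetical.

```python
from typing import Iterable


def count_established(ss_lines: Iterable[str], port: int = 8200) -> int:
    """Count ESTAB sockets whose local port matches `port`.

    Assumes `ss -tn` style lines: State Recv-Q Send-Q Local:Port Peer:Port.
    """
    total = 0
    for line in ss_lines:
        fields = line.split()
        # Skip the header, blank lines, and sockets in other states.
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue
        # rsplit keeps IPv6 addresses such as [::1]:8200 intact.
        local_port = fields[3].rsplit(":", 1)[-1]
        if local_port == str(port):
            total += 1
    return total


# Made-up sample output, for illustration only.
SAMPLE_SS_OUTPUT = """\
State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
ESTAB      0      0      10.0.0.5:8200       10.0.0.9:51234
ESTAB      0      0      10.0.0.5:8200       10.0.0.10:40022
ESTAB      0      0      10.0.0.5:22         10.0.0.1:50000
TIME-WAIT  0      0      10.0.0.5:8200       10.0.0.11:40100
"""

print(count_established(SAMPLE_SS_OUTPUT.splitlines()))
```

Running this on a schedule and recording the count with a timestamp gives a time series that can be lined up against upgrade times or network changes.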
