When I first saw "queue is full" in the log, I increased the queue-related settings of apm-server, but "queue is full" still appears in the log. I don't know what causes apm-server to refuse incoming requests.
I just tried deleting all apm-* indices, and after that apm-server no longer reports "queue is full". Is "queue is full" related to Elasticsearch? I don't see any obvious errors on the Elasticsearch side.
@simitt
I don't understand why I need to delete the alias. What does this have to do with "queue is full"?
And if the problem is caused by the alias, why does the alias conflict?
apm-server will then do bulk writes to what it thinks is a write alias but instead will auto_create_index a real index where the write alias used to be.
The problem you are running into is that the server is writing to an actual index instead of an alias. At some point this leads to an error similar to "Connection marked as failed because the onConnect callback failed: resource 'apm-7.6.2-metric' exists, but it is not an alias". When this happens, the server cannot write any more events to Elasticsearch and the internal memory queue starts filling up. That is when the APM Server starts responding with "queue is full" errors.
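To check whether a name such as apm-7.6.2-metric is currently a real index or an alias, one option is to look at the body of GET /<name>: for a concrete index the top-level key is the name itself, while for an alias the keys are the backing indices and the name shows up under their "aliases". A minimal sketch, assuming an unsecured cluster at a base URL you supply; the helper names classify and fetch are made up for illustration:

```python
import json
import urllib.error
import urllib.request


def classify(name, get_response):
    """Classify `name` from the parsed JSON body of GET /<name>.

    Pass None for `get_response` when Elasticsearch answered 404.
    """
    if get_response is None:
        return "missing"
    # A concrete index is returned under its own name.
    if name in get_response:
        return "index"
    # An alias resolves to its backing indices, which list it under "aliases".
    if any(name in body.get("aliases", {}) for body in get_response.values()):
        return "alias"
    return "unknown"


def fetch(base_url, name):
    """Return the parsed body of GET <base_url>/<name>, or None on 404.

    Assumption: no authentication or TLS setup is needed.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/{name}") as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
```

For example, classify("apm-7.6.2-metric", fetch("http://localhost:9200", "apm-7.6.2-metric")) returning "index" would match the situation described above.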
Unfortunately there is currently no better solution for recovering from this than described in the linked APM Server issue.
So the solution is to delete the actual index?
If you'd like to retain the collected data, you can follow step 2 in the comment on how to recover from this issue. Once the cloning has finished, you will indeed need to delete the conflicting index and ensure the APM Server creates a new alias.
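As a rough illustration of the clone-then-delete sequence (not a copy of the steps in the linked issue, which are not reproduced here), the Elasticsearch requests could look like the following. The function name recovery_plan and the backup index name are hypothetical; the write block is included because the clone API requires the source index to be read-only.

```python
def recovery_plan(conflicting, backup):
    """Return the (method, path, body) requests for clone-then-delete.

    Sketch only: `backup` is a name you choose. Wait for the clone to
    finish before sending the DELETE.
    """
    return [
        # The clone API only accepts a source index that blocks writes.
        ("PUT", f"/{conflicting}/_settings",
         {"settings": {"index.blocks.write": True}}),
        # Copy the conflicting index so the collected data is retained.
        ("POST", f"/{conflicting}/_clone/{backup}", None),
        # Remove the real index that occupies the alias name.
        ("DELETE", f"/{conflicting}", None),
    ]


if __name__ == "__main__":
    for method, path, body in recovery_plan(
        "apm-7.6.2-metric", "apm-7.6.2-metric-backup"
    ):
        print(method, path, body if body is not None else "")
```

After the delete, the APM Server still has to recreate the write alias, as described above; that step is not covered by this sketch.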
Why does this problem occur?
It can occur, for example, if the indices are deleted manually after the APM Server has taken care of the setup. Another possibility is that they were set up manually in the first place and not properly linked with ILM.