Currently we have our ELK stack set up as below:
LSF → Logstash (indexer) → Elasticsearch
Each component runs on its own server, hosted in the same data center. LSF runs alongside the application server and is set to harvest the application logs.
The application log events are mostly pretty-printed JSON and XML, so Logstash is configured with the multiline plugin. Some of these payloads can be pretty large (e.g. 15,000 lines), and I am now experiencing issues with these huge payloads.
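For reference, my multiline setup looks roughly like this (the pattern, port, and certificate paths below are placeholders, not my real values). One thing worth noting is that the multiline codec has `max_lines` (default 500) and `max_bytes` (default 10 MiB) limits, which a 15k-line event would exceed unless raised:

```
input {
  lumberjack {
    port => 5043
    ssl_certificate => "/etc/pki/tls/certs/logstash.crt"   # placeholder path
    ssl_key => "/etc/pki/tls/private/logstash.key"         # placeholder path
    codec => multiline {
      pattern => "^\s"       # example: continuation lines start with whitespace
      what => "previous"
      max_lines => 20000     # default is 500; larger events get flushed/split early
      max_bytes => "50 MiB"  # default is 10 MiB
    }
  }
}
```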
LSF can ship transactions with a few hundred lines of payload without any issue. The feed breaks when LSF needs to ship the 15k-line payload to Logstash: CircuitBreaker errors start rolling on Logstash, along with "Read error looking for ack: EOF" rolling on LSF. The errors loop indefinitely until I restart both LSF and Logstash.
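If I understand the circuit breaker correctly, it trips when the lumberjack input cannot hand events to Logstash's internal queue within a time limit, which a single huge multiline event could easily cause. Assuming this is the Logstash 1.5-era lumberjack input, it exposes a `congestion_threshold` setting (seconds before the breaker trips, default 5) that could be raised, for example:

```
input {
  lumberjack {
    port => 5043
    ssl_certificate => "/etc/pki/tls/certs/logstash.crt"  # placeholder path
    ssl_key => "/etc/pki/tls/private/logstash.key"        # placeholder path
    congestion_threshold => 60  # default is 5 seconds; give slow events more time
  }
}
```

I am not sure whether this merely delays the failure rather than fixing it, which is part of my question.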
I tried changing LSF's spool size, from as low as 10 to as high as 20,000, but none of these helped. The only time I managed to get the log events processed successfully was by having Logstash read and process the log files directly via the file input plugin.
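The working file-input workaround was along these lines (path and pattern are placeholders; the multiline settings mirror what I use on the lumberjack input):

```
input {
  file {
    path => "/var/log/app/*.log"  # placeholder path to the application logs
    codec => multiline {
      pattern => "^\s"            # example continuation-line pattern
      what => "previous"
      max_lines => 20000          # raised from the default of 500
    }
  }
}
```

This processes the 15k-line events fine, but it requires Logstash to have direct access to the log files, which defeats the purpose of LSF.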
So it appears that multiline alone is not the cause of this performance issue; the LSF + multiline combination is.
How can I overcome this bottleneck, if possible, without changing the current architecture?