Hi. I would like ideas about what Logstash functionality I can use to troubleshoot a problem I'm experiencing with the http input plugin.
My http input does not seem to receive (or process) the expected number of events, even though I am nearly certain they are being sent to it. Because there are many components in the log-sending chain, I want to determine whether Logstash is the problem. I can't tell whether Logstash:
- is refusing some network-layer connections to the input
- is too busy to accept input data
- is dropping events due to filter problems
- is unable to send some output
- is doing some combination of the above.
No errors are reported in the Logstash logs (I am not running at debug log level). The CPU, memory, and network utilization of the Logstash server look completely normal.
I'm hoping that Logstash itself can provide some information or metrics that would help me determine whether or not it is the cause of the problem.
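A minimal sketch of the kind of check I have in mind, assuming the Logstash monitoring API is enabled on its default port 9600 on the Logstash host and that the webhook traffic goes through the default `main` pipeline (both are assumptions; adjust for your `pipelines.yml`). It reads `GET /_node/stats/pipelines` and prints the pipeline-level `in`/`filtered`/`out` counters plus each plugin's counters, flagging plugins whose `in` and `out` differ. The `summarize` helper works on already-parsed JSON, so it can also be pointed at a saved snapshot.

```python
"""Summarize Logstash per-pipeline and per-plugin event counters.

Assumptions: the monitoring API listens on http://localhost:9600 (its default)
and the pipeline of interest is named "main". Adjust both for your setup.
"""
import json
import urllib.request

STATS_URL = "http://localhost:9600/_node/stats/pipelines"  # assumed host/port


def fetch_pipeline_stats(url: str = STATS_URL) -> dict:
    """Fetch and parse the JSON returned by the node stats pipelines endpoint."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize(stats: dict, pipeline: str = "main") -> list[str]:
    """Return readable lines comparing event counts for the pipeline and each plugin."""
    p = stats["pipelines"][pipeline]
    ev = p.get("events", {})
    lines = [
        f"pipeline {pipeline}: in={ev.get('in', 0)} "
        f"filtered={ev.get('filtered', 0)} out={ev.get('out', 0)} "
        f"queue_push_ms={ev.get('queue_push_duration_in_millis', 0)}"
    ]
    for kind in ("inputs", "filters", "outputs"):
        for plugin in p.get("plugins", {}).get(kind, []):
            pev = plugin.get("events", {})
            in_, out = pev.get("in"), pev.get("out")
            # Inputs only report "out"; filters and outputs report both "in" and "out".
            flag = ""
            if in_ is not None and out is not None and in_ != out:
                flag = "  <-- in != out"
            lines.append(
                f"  {kind[:-1]} {plugin.get('name', '?')} ({plugin.get('id')}): "
                f"in={in_} out={out}{flag}"
            )
    return lines
```

Running `print("\n".join(summarize(fetch_pipeline_stats())))` a few minutes apart and comparing the growth of the http input's `out` counter with the number of webhooks the service claims to have sent should at least show whether events go missing before or after they enter Logstash. A gap between `in` and `out` on a filter is not necessarily loss (a `drop` filter or events still in flight would also cause it), and the auto-generated plugin ids are much easier to read if `id =>` is set on each plugin in the config.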
I have a third-party web service that sends log events via webhooks. The service is configured to send to two destinations simultaneously, and for now I have to assume that it sends identical data to both:
- Destination1 is a third-party, cloud-hosted Elastic Stack service, ServiceX, which receives the webhook data directly.
- Destination2 is my own AWS-hosted Logstash instance with an http input. Logstash enriches the data and then forwards it on to ServiceX. This Logstash instance also receives and filters several other kinds of input from other external sources (mostly Filebeat).

Regardless of the time period, or even the data volume, the webhook data sent via Destination2 (my self-hosted Logstash) consistently yields only 1/3 to 1/2 as many documents as the identical webhook data sent directly to ServiceX. I assume that ServiceX uses a queueing service when it receives data, whereas my own Logstash does not. This is a problem because I would much prefer to put only my enriched Destination2 data into Elasticsearch.
Previously, Destination2 had an AWS Network Load Balancer (NLB) in front of it that accepted TLS connections and forwarded them to Logstash's non-TLS http input. To troubleshoot this issue, I reconfigured the NLB as a plain TCP listener that forwards connections to Logstash's TLS-enabled http input. This actually made the problem worse: the share of "lost" documents increased from roughly 1/2 to 2/3. This is what leads me to believe that Logstash is the bottleneck in my chain.
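One way to take the webhook service out of the equation would be to send a known number of test events straight at the http input (or the NLB) and tally how each request ends. A minimal sketch, assuming `TARGET_URL` is a placeholder to be replaced with the real listener address and scheme; the `test_marker` field is a hypothetical tag so the test documents can be found downstream. If I read the logstash-input-http documentation correctly, the input answers 429 when its pending-request queue (`max_pending_requests`) is full, so any 429s, resets, or timeouts here would point at Logstash refusing or being too busy to accept data, especially if the real webhook sender does not retry.

```python
"""Send a known number of test events to a Logstash http input and tally outcomes.

TARGET_URL is a placeholder, not a real host: point it at the http input or NLB.
"""
import collections
import http.client
import json
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TARGET_URL = "http://logstash.example.internal:8080/"  # placeholder address


def send_one(url: str, seq: int, timeout: float = 10.0) -> str:
    """POST one JSON test event; return the HTTP status or the error class name."""
    body = json.dumps({"test_marker": "http-input-loadtest", "seq": seq}).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return str(resp.status)
    except urllib.error.HTTPError as err:  # non-2xx reply, e.g. 429 if the input is saturated
        return str(err.code)
    except urllib.error.URLError as err:  # connection refused, DNS or TLS failure, etc.
        reason = err.reason
        return type(reason).__name__ if isinstance(reason, BaseException) else str(reason)
    except (OSError, http.client.HTTPException) as err:  # reset or timeout while reading
        return type(err).__name__


def load_test(url: str, count: int, concurrency: int = 20, timeout: float = 10.0) -> collections.Counter:
    """Send `count` events with `concurrency` parallel workers and count each outcome."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = pool.map(lambda i: send_one(url, i, timeout), range(count))
        return collections.Counter(results)
```

For example, `load_test(TARGET_URL, 1000)` returns a `Counter` of outcomes; if every entry is `"200"`, the number of documents tagged `http-input-loadtest` that reach ServiceX (and the growth of the http input's `out` counter) can then be checked against 1000 to see whether loss happens later in the pipeline.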
Any help would be most welcome.