I'm trying to parse some very large input files (>10,000 lines) with Logstash, pass them through the CSV, ruby, and mutate filters, and send them out via HTTP. My config is something like this:
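The original config isn't shown above, so as a point of reference, here is a hypothetical sketch of the kind of pipeline described (file input, csv/ruby/mutate filters, http output). All paths, column names, the ruby snippet, and the URL are placeholders, not the actual config:

```conf
input {
  file {
    # Placeholder path; the real config reads files for two systems
    path => "/var/log/system-a/*.csv"
    start_position => "beginning"
  }
}
filter {
  csv {
    # Placeholder column names
    columns => ["timestamp", "field1", "field2"]
  }
  ruby {
    # Placeholder transformation
    code => "event.set('derived', event.get('field1').to_s.strip)"
  }
  mutate {
    remove_field => ["message"]
  }
}
output {
  http {
    # Placeholder endpoint
    url => "http://example.com/ingest"
    http_method => "post"
  }
}
```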
Logstash starts parsing the first files for both systems and pops out a few thousand events via HTTP. Then it stops. The sincedb files are never created, and Logstash doesn't move on (I've waited more than 30 minutes without seeing a single additional event, even though the first few thousand events take only a minute or two to handle). The log (at "trace" level) ends with a bunch of "output received" lines followed by "Pushing flush onto pipeline"; there is no error message that I can see.
I'll see if I can get the thread dump. In the meantime, I made a couple of changes to test things: it now reads the files via a Filebeat instance on the same server, and I dropped all of my filters except a match filter (to ignore comments in the log file), a CSV filter, and a date filter. Now Logstash no longer says "Pushing flush onto pipeline"; the log just ends with an "output received" line. Filebeat is reporting I/O timeouts. The top threads are Runner, worker0, and worker1, with the latter two's call stacks ending at http.rb:141.
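For reference, a hypothetical sketch of the reduced pipeline as described (Filebeat input, a conditional drop for comment lines, then csv and date). The port, field names, comment prefix, and date pattern are placeholders, not the actual config:

```conf
input {
  beats {
    port => 5044
  }
}
filter {
  # Ignore comment lines; "#" prefix is an assumed convention
  if [message] =~ /^#/ {
    drop { }
  }
  csv {
    # Placeholder column names
    columns => ["timestamp", "field1", "field2"]
  }
  date {
    # Placeholder date pattern
    match => ["timestamp", "ISO8601"]
  }
}
```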