My tcp_test.log file is properly created when I telnet data to my logstash and everything seems to work well.
Things get complicated when I try to send real data. My firewall is configured to send data, but nothing appears. A tcpdump shows that the data reaches the Logstash server on the right port, but my file stays empty. However, when I reload the pipeline, the file is finally created with a single line containing one huge JSON document in which all the logs are concatenated into one very long message. I captured the traffic and noticed that every log is sent in its own TCP frame, without any delimiter (\n, \r\n, \0, ...) at the end of the log.
If I understand correctly how the TCP input plugin works, this explains why nothing appears in my file until my pipeline is stopped: the input waits for a delimiter, but since none ever arrives, it treats all the incoming data as a single log, whatever its size.
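To illustrate the behaviour described above, here is a minimal Python sketch of a delimiter-based decoder. It is only a simplified model written for this discussion, not Logstash's actual implementation; the class name LineBuffer and its methods are hypothetical.

```python
from typing import List


class LineBuffer:
    """Simplified model of a delimiter-based decoder (NOT Logstash's real code)."""

    def __init__(self, delimiter: bytes = b"\n") -> None:
        self.delimiter = delimiter
        self.pending = b""  # bytes received but not yet terminated by a delimiter

    def feed(self, chunk: bytes) -> List[bytes]:
        """Append a chunk and return every complete event found so far."""
        self.pending += chunk
        # Everything before the last delimiter is complete; the tail stays buffered.
        *events, self.pending = self.pending.split(self.delimiter)
        return events

    def flush(self) -> List[bytes]:
        """Emit whatever is left, as when the connection or pipeline shuts down."""
        leftover, self.pending = self.pending, b""
        return [leftover] if leftover else []


buf = LineBuffer()
# Two logs arrive in separate TCP segments, neither ending with a delimiter.
first = buf.feed(b"12 some message")
second = buf.feed(b"18 some other message")
flushed = buf.flush()
print(first, second, flushed)
```

Under this model nothing is emitted while data keeps arriving, and the flush returns all the logs glued together as one event, which matches what I see in my file after a pipeline reload.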
Is there a way to tell the input that each log should be delimited by the TCP frame only? Or maybe I am not using the right codec (I tried the default "line" codec and the "plain" codec, but the latter is ignored and replaced by the "line" one, see Logstash - wrong codec in tcp input plugin?).
Hi @Badger and thank you for your answer. I have investigated a little more since the opening of this discussion and I will add some observations about my previous statement.
The devices sending us logs follow RFC 5424 to send syslog messages. This RFC seems to allow formatting the messages as "<length> <message>" without adding delimiters between messages. As an example, sending the following:
some message
some other message
would give the following network capture:
12 some message18 some other message
An rsyslog server with a standard configuration receiving those logs through a TCP socket would write each message on its own line in a file.
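The length-prefixed framing in the capture above is, as far as I can tell, the "octet counting" scheme described in RFC 6587 for syslog over TCP. As a rough sketch of how a receiver can split such a stream back into individual messages, here is a small Python function; the name split_octet_counted is hypothetical and this is not what rsyslog or Logstash actually run.

```python
from typing import List, Tuple


def split_octet_counted(buffer: bytes) -> Tuple[List[bytes], bytes]:
    """Split a buffer made of '<length> <message>' frames.

    Returns the complete messages plus any trailing bytes that do not yet
    form a whole frame (to be kept until more data arrives).
    The length counts octets, so the buffer is handled as bytes, not text.
    """
    messages = []
    pos = 0
    while pos < len(buffer):
        space = buffer.find(b" ", pos)
        if space == -1:
            break  # length prefix not fully received yet
        length_field = buffer[pos:space]
        if not length_field.isdigit():
            raise ValueError(f"invalid length prefix: {length_field!r}")
        start = space + 1
        end = start + int(length_field)
        if end > len(buffer):
            break  # message body not fully received yet
        messages.append(buffer[start:end])
        pos = end
    return messages, buffer[pos:]


captured = b"12 some message18 some other message"
messages, remainder = split_octet_counted(captured)
for message in messages:
    print(message.decode("utf-8"))  # one message per line, as rsyslog would write them
```

With the capture from my example this yields "some message" and "some other message" as two separate events, with nothing left over in the buffer.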
A solution could be to use a syslog input with the grok_pattern option correctly set (and even this could be tricky).
But (because there is always a but) I will need to encrypt the data to ensure its confidentiality in transit and, unfortunately, the syslog input does not seem to offer options for that.
Do you know of a way to make this work without putting an rsyslog server in front of Logstash to do the translation?