I have written a client program that connects to a Logstash server instance on port 5102 and sends it log data containing multiple lines. I want to store all log data with the same trace name in its own file. In some cases, however, the log data is written to a file literally named "%{trace_name}.txt", and on debugging further I found that this is due to a grok parse failure. It looks like the data received from the TCP socket in the input plugin is not being processed line by line: whenever the received log data is terminated with a "\n", the grok filter parses the message successfully, but it fails if the message is truncated.
Can someone suggest what configuration is needed so that log data received from the TCP socket is processed by the grok filter one line at a time?
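For reference, a minimal sketch of the setup described above (the port and the trace_name field are taken from the question; the file output path is an assumption for illustration):

```
input {
  tcp {
    port  => 5102
    codec => line   # split the incoming stream on "\n" so each event is one log line
  }
}
output {
  file {
    path => "/var/log/traces/%{trace_name}.txt"
  }
}
```

Note that the tcp input already uses the line codec by default, so a partial line without a trailing "\n" is buffered until the newline arrives; making the client terminate every message with "\n" is usually the actual fix.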
Thanks @Rios. I checked the grok pattern in the debugger and found that the messages come in two different formats: one has all the fields defined in the grok pattern above, but the other is missing the "count" field. Grok parsing works fine when the message contains all the fields, so I think I need to make the "count" field optional in the pattern. Can you tell me how to make this field optional? I tried something like the below, but it didn't work.
match => {"message" => "%{SYSLOGTIMESTAMP:time} %{DATA:trace_name} %{DATA:node_name} (%{DATA:count})? %{DATA:thread_id} %{GREEDYDATA:data}"}
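One approach that often works is a non-capturing optional group that also swallows its own trailing space, so the rest of the pattern stays aligned whether or not the field is present. This is a sketch, not the thread's confirmed answer; it also assumes "count" is numeric, since %{NUMBER} is far less ambiguous than %{DATA} inside an optional group:

```
match => { "message" => "%{SYSLOGTIMESTAMP:time} %{DATA:trace_name} %{DATA:node_name} (?:%{NUMBER:count} )?%{DATA:thread_id} %{GREEDYDATA:data}" }
```

The original attempt `(%{DATA:count})?` tends to fail because %{DATA} is a lazy `.*?`, which can match the empty string, leaving the surrounding spaces unaccounted for.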
You're welcome.
Just be consistent: use either the regex syntax \s+ or the Logstash syntax %{SPACE} in the grok match. I intentionally mixed both to show that it is possible.
Summary:
" " - a single literal space; fields must be separated by exactly one space character.
\s* - zero or more whitespace characters. \s matches a space, a tab, a carriage return, a line feed, or a form feed.
\s+ - one or more whitespace characters.
%{SPACE} - the same as \s*, as defined in the grok patterns.
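The differences above can be checked quickly outside Logstash, since grok compiles down to regular expressions (a sketch in Python; Logstash itself uses Oniguruma, but \s behaves the same way for these cases):

```python
import re

# "\s" matches whitespace: space, tab, carriage return, line feed, form feed
assert re.fullmatch(r"a\s+b", "a \t b")      # \s+ : one or more whitespace chars
assert re.fullmatch(r"a\s*b", "ab")          # \s* : zero whitespace also matches
assert not re.fullmatch(r"a\s+b", "ab")      # \s+ : needs at least one
assert re.fullmatch(r"a b", "a b")           # literal space: exactly one
assert not re.fullmatch(r"a b", "a  b")      # two spaces do not match a single literal space
```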