Logstash adds extra newline characters

I am using the following configuration file:

input {
  stdin {
    codec => multiline {
      pattern => "%{APEX_TIMESTAMP}"
      negate => true
      what => "previous"
    }
  }
}
filter {
  grok {
    patterns_dir => ["/etc/logstash/patterns"]
    match => ["message", "%{APEX_TIMESTAMP:timestamp} %{LOGLEVEL:severity} %{GREEDYDATA:message}"]
  }
  date {
    timezone => "US/Pacific"
    match => ["timestamp" , "yyyy-MM-dd HH:mm:ss,SSS"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost"]
  }
}

where

APEX_TIMESTAMP %{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND},\d{3}
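For reference, the grok expression can be checked outside Logstash with a few lines of Python. The regexes below are simplified approximations of the real grok pattern definitions (the actual YEAR/SECOND/etc. definitions are slightly more permissive), so this is only an illustration:

```python
import re

# Rough Python equivalent of the custom APEX_TIMESTAMP grok pattern
TS = (r"\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01]) "
      r"(?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d,\d{3}")

# Approximation of the full line pattern from the grok filter:
# %{APEX_TIMESTAMP:timestamp} %{LOGLEVEL:severity} %{GREEDYDATA:message}
LINE = re.compile(TS + r" (?:TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\s+.*")

good = "2016-07-27 15:46:30,145 DEBUG  [defaultQuartzScheduler_Worker-21   ] ..."
bad  = "2016-07-27 15:46:30,145 D\nEBUG  [defaultQuartzScheduler_Worker-21   ] ..."

print(bool(LINE.match(good)))  # True  -- the intact line parses
print(bool(LINE.match(bad)))   # False -- the injected "\n" breaks LOGLEVEL
```

This shows why the events with `D\nEBUG` in them end up tagged `_grokparsefailure`: the timestamp still matches, but the newline lands in the middle of the LOGLEVEL token.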

As a result, 13 out of ~41,000 log messages were tagged with _grokparsefailure. This is one of them:

{
  "_index": "logstash-2016.09.16",
  "_type": "logs",
  "_id": "AVcw6Si2vjCqs-IM_UH9",
  "_score": null,
  "_source": {
    "@timestamp": "2016-09-16T02:52:49.123Z",
    "message": "2016-07-27 15:46:30,145 D\nEBUG  [defaultQuartzScheduler_Worker-21   ] [XpsMessageContext:204] (   ) [??????????] createXpsMessageTasks: { Message Task for UPDATE, table ID(34), entity(com.????????????), message key(34:70582107), entity gkey (?????????), mediator (class com.????????)}",
    "@version": "1",
    "tags": [
      "multiline",
      "_grokparsefailure"
    ],
    "host": "localhost.localdomain"
  }
}

but the original message in the log file does not contain the "\n" inside DEBUG:

[root@localhost Backup]# egrep '^2016-07-27 15:46:30,145 .*message key\(34:70582107\).*' navis-apex.log |od -c
0000000   2   0   1   6   -   0   7   -   2   7       1   5   :   4   6
0000020   :   3   0   ,   1   4   5       D   E   B   U   G           [
0000040   d   e   f   a   u   l   t   Q   u   a   r   t   z   S   c   h
0000060   e   d   u   l   e   r   _   W   o   r   k   e   r   -   2   1

Where did I make a configuration mistake?

Thank you

Hi, can you post a plain-text example from your log?
Also, for now, put the following in your output in place of the Elasticsearch output and see whether Logstash itself is fine:

  stdout {
    codec => rubydebug
  }

Sure.

[root@localhost Backup]#  /opt/logstash/bin/logstash -f /root/Backup/first-pipeline.conf < navis-apex.log > out
[root@localhost Backup]# egrep '2016-07-27T22:46:30.145Z' --color -A 10 -B 10 out
    "@timestamp" => "2016-09-16T17:43:06.219Z",
       "message" => "2016-07-27 15:46:30,145 D\nEBUG  [defaultQuartzScheduler_Worker-21   ] [XpsMessageContext:204] (   ) [DPW/DPWA/AGW/AGW] createXpsMessageTasks: { Message Task for UPDATE, table ID(34), entity(com.navis.inventory.business.units.UnitYardVisit), message key(34:70582107), entity gkey (70582107), mediator (class com.navis.apex.business.xps.mediators.XpmContainerArgo)}",
      "@version" => "1",
          "tags" => [
        [0] "multiline",
        [1] "_grokparsefailure"
    ],
          "host" => "localhost.localdomain"
}
{
    "@timestamp" => "2016-07-27T22:46:30.145Z",
       "message" => " [defaultQuartzScheduler_Worker-21   ] [XpsMessageContext:204] (   ) [DPW/DPWA/AGW/AGW] createXpsMessageTasks: { Message Task for UPDATE, table ID(34), entity(com.navis.inventory.business.units.UnitYardVisit), message key(34:70582107), entity gkey (70582107), mediator (class com.navis.apex.business.xps.mediators.XpmContainerArgo)}",
      "@version" => "1",
          "host" => "localhost.localdomain",
     "timestamp" => "2016-07-27 15:46:30,145",
      "severity" => "DEBUG"
}
{
    "@timestamp" => "2016-07-27T22:46:30.153Z",
       "message" => "  [ajp-bio-10009-exec-8               ] [GroovyApi:229] (   ) [DPWAntwerpUnitReroute] GATE_21204 : e=null",
      "@version" => "1",
[root@localhost Backup]# egrep '2016-07-27 15:46:30,145 D' navis-apex.log
2016-07-27 15:46:30,145 DEBUG  [defaultQuartzScheduler_Worker-21   ] [XpsMessageContext:204] (   ) [DPW/DPWA/AGW/AGW] createXpsMessageTasks: { Message Task for UPDATE, table ID(34), entity(com.navis.inventory.business.units.UnitYardVisit), message key(34:70582107), entity gkey (70582107), mediator (class com.navis.apex.business.xps.mediators.XpmContainerArgo)}
2016-07-27 15:46:30,145 DEBUG  [defaultQuartzScheduler_Worker-21   ] [XpsMessageContext:204] (   ) [DPW/DPWA/AGW/AGW] createXpsMessageTasks: { Message Task for UPDATE, table ID(34), entity(com.navis.inventory.business.units.UnitYardVisit), message key(34:70582107), entity gkey (70582107), mediator (class com.navis.apex.business.xps.mediators.XpmContainerArgo)}

@andreys
This is a known problem when the multiline codec is used with the stdin input.
The stdin input sends the multiline codec 16 KB chunks of text.
The multiline codec splits each chunk on newline characters, but the last piece is usually not a complete line. The codec treats it as one anyway, and since the remainder of the original line (at the start of the next 16 KB chunk) does not start with APEX_TIMESTAMP, it gets appended to the previous event with an extra newline.
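The effect can be reproduced outside Logstash with a toy Python simulation (not actual Logstash code; it uses a 25-byte chunk size instead of 16 KB so the chunk boundary lands inside the word DEBUG):

```python
# Minimal simulation of the failure mode: the input hands the codec
# fixed-size chunks, and a naive splitter that treats the tail of each
# chunk as a complete line breaks a log line wherever the boundary falls.
CHUNK = 25  # tiny stand-in for the real 16 KB chunk size

stream = "2016-07-27 15:46:30,145 DEBUG  message one\n"

def naive_lines(data, chunk_size):
    """Split each chunk on newlines, wrongly assuming every piece is a full line."""
    pieces = []
    for i in range(0, len(data), chunk_size):
        pieces.extend(data[i:i + chunk_size].split("\n"))
    return [p for p in pieces if p]

fragments = naive_lines(stream, CHUNK)
print(fragments)
# -> ['2016-07-27 15:46:30,145 D', 'EBUG  message one']

# The second fragment does not start with APEX_TIMESTAMP, so the multiline
# codec appends it to the previous event with a "\n" -- producing "D\nEBUG".
print("\n".join(fragments))
```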

There is no real solution to this while still using those two plugins together. You should try to find an alternative input to use.

@guyboertje
Thank you! I installed Beats on the remote servers and configured them to ship logs to Logstash. I will check whether this kind of problem shows up in that setup.

Given the bug you described, how can I push all of the old logs to Elasticsearch? I have at least a terabyte of data collected over the last several months.

Use filebeat with the beats input in LS to read each file.
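For example, something along these lines. This is a sketch, not a tested config: the port, paths, and the exact Filebeat multiline option names are assumptions to adapt to your Filebeat version. Doing the multiline joining in Filebeat (instead of the multiline codec) sidesteps the chunking problem entirely:

```
# filebeat side (sketch): read the old files and join multiline events there
#   filebeat.prospectors:
#     - paths: ["/path/to/old-logs/*.log"]
#       multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2} '
#       multiline.negate: true
#       multiline.match: after
#   output.logstash:
#     hosts: ["logstash-host:5044"]

# Logstash side: replace the stdin input with a beats input
input {
  beats {
    port => 5044
  }
}
```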

@andreys

For Navis N4, you can just create a new log4j appender in N4 and ship the logs directly, without going through the log files.

Unfortunately, for XPS (service and clients), you have to use the files.

:wink:

cheers.
Jordi