Logstash Multiline and line codec differences

Hello All,

We have application logs coming in from a number of different hosts (shipped with Filebeat) and have observed a mixing of data streams for one of the log types.

We changed the Logstash input config from version A to B, which appears to have resolved the issue. We are using ELK 7.10.

A)
input {
  beats {
    client_inactivity_timeout => 1200
    port => 5044
    codec => line {
      charset => "ISO-8859-1"
    }
  }
}

B)
input {
  beats {
    port => 5044
  }
}

Below is our filebeat config for reference

- input_type: log
  paths:
  - C:\bla\bla.*.exe.log
  fields: {log_type: App_log_files}
  fields_under_root: true
  multiline.pattern: '^(20[0-9]{2}(-[0-9]{2}){2} [0-9]{2}(:[0-9]{2}){2})|([0-9]{2} [JFMASOND][a-z]{2} 20[0-9]{2})'
  multiline.negate: true
  multiline.match: after
  close_removed: false
  close_eof: false
  clean_removed: true
  scan_frequency: 1h
  backoff: 5m
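For anyone wanting to sanity-check the multiline.pattern outside Filebeat, here is a small sketch in Python using the pattern verbatim (the sample log lines are hypothetical, not taken from the actual application logs). Note that because of alternation precedence, the leading `^` only anchors the first (ISO-timestamp) alternative; the day-month-year alternative can match anywhere in the line.

```python
import re

# The multiline.pattern from the filebeat config above, verbatim.
# First alternative: ISO-style "2023-04-01 12:30:45", anchored to line start.
# Second alternative: "01 Apr 2023"; the alternation means this half is
# NOT anchored by the leading ^.
pattern = re.compile(
    r'^(20[0-9]{2}(-[0-9]{2}){2} [0-9]{2}(:[0-9]{2}){2})'
    r'|([0-9]{2} [JFMASOND][a-z]{2} 20[0-9]{2})'
)

# Hypothetical sample lines for illustration only.
lines = [
    "2023-04-01 12:30:45 INFO Service started",   # new event (ISO timestamp)
    "01 Apr 2023 12:30:45 INFO Service started",  # new event (day-month-year)
    "    at Company.Application.Run()",           # continuation (stack trace)
]

for line in lines:
    # With multiline.negate: true and multiline.match: after, a line that
    # does NOT match the pattern is appended to the previous matching line.
    starts_event = pattern.search(line) is not None
    print(f"{'EVENT' if starts_event else 'CONTINUATION':12} {line}")
```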

I have read that using the multiline codec can cause this (mentioned in the KB below), but nothing is mentioned about this in the line codec KB.

Is it also possible that the line codec can cause this issue?

Also, can anyone suggest an improvement on version B for our input?

What kind of mixing? Can you share some examples?

If you are using Filebeat to send logs to Logstash, you should not use the codec from your input A. Filebeat sends events as JSON using UTF-8.

What kind of improvement? The beats input is pretty simple, normally you just set the host and the port.

Thanks for the prompt reply.

Sure, we have two different application log file types coming into separate daily Logstash indices, one for the server and one for the various services. Each has its own input entry in filebeat.yml and a separate .conf filter file.

We observed in Kibana log lines from the server log consistently appearing in one of the service log fields, though this stopped after making the changes to the Logstash input conf (version B).

There is no crossover between the two different log types or shared processes etc so it was odd seeing server log lines appear in the service logs in kibana.

OK, so if we are sending multiline application logs, the multiline aspect should be configured in filebeat.yml, and we don't need any additional conf for the Logstash input other than what is already in version B?

I guess a better question would have been: is the config we are currently running best practice? Is there a recommended Logstash input codec for a setup such as this?

Also, will we be able to fix this (remove the server log lines from the service log indices) by just reindexing the previous daily log indices?

You would need to share your entire filebeat.yml file and also your entire conf file to help understand what happened, if you do, please use the preformatted text option, the </> button.

It is pretty hard to understand configuration when it is not properly formatted.

But by default, different conf files in Logstash are not independent from each other: if you do not configure multiple pipelines, all the conf files inside /etc/logstash/conf.d will be merged into one configuration.

If you have multiline logs and are using filebeat, then the configuration should be done in Filebeat side, not Logstash.

The beats input is pretty simple; in most cases you just need the port and that's it.

It depends on your case; you didn't provide enough context. Beats always sends data as JSON with the UTF-8 charset, so there is no need to change the charset. If you want, you can use the json codec in the input to parse the data directly there, but you can also do the same thing with a json filter in the filter block.

I prefer to leave the codec as the default, which is the plain codec, and do the parsing in the filter block.
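As an illustration of parsing in the filter block rather than the input codec, here is a hedged sketch. The grok pattern, field names, and date format are assumptions for a typical timestamped application log line, not taken from the thread; they would need to be adjusted to the actual log layout.

```conf
filter {
  # Hypothetical sketch: extract a timestamp, level, and message from
  # plain log lines. The field names and layout are assumptions.
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:log_time} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
  # Use the extracted timestamp as the event's @timestamp.
  date {
    match => ["log_time", "yyyy-MM-dd HH:mm:ss"]
  }
}
```

Keeping the input codec as plain and doing this in the filter block makes the parsing visible and testable in one place.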

Hopefully this is easier on the eyes, this is our filebeat.yml from the host servers.

filebeat.registry.path: ./filebeat

logging.to_files: true
logging.files:
 path: C:\Data\Logs\Monitoring
 name: filebeat.log
 rotateeverybytes: 10485760
 keepfiles: 2
logging.level: info

filebeat.spool_size: 102400
filebeat.idle_timeout: 15s

filebeat.inputs:

- input_type: log
  paths:
  - C:\Data\Logs\Company.Application.Server.exe.log
  fields: {log_type: Server_log_files}
  fields_under_root: true
  multiline.pattern: '^(20[0-9]{2}(-[0-9]{2}){2} [0-9]{2}(:[0-9]{2}){2})|([0-9]{2} [JFMASOND][a-z]{2} 20[0-9]{2})'
  multiline.negate: true
  multiline.match: after
  close_removed: false
  close_eof: false
  clean_removed: true
  scan_frequency: 1h
  backoff: 5m

- input_type: log
  paths:
  - C:\Data\Logs\Company.Application.DataImport.*.exe.log
  fields: {log_type: Ovalis_DI_log_files}
  fields_under_root: true
  multiline.pattern: '^(20[0-9]{2}(-[0-9]{2}){2} [0-9]{2}(:[0-9]{2}){2})|([0-9]{2} [JFMASOND][a-z]{2} 20[0-9]{2})'
  multiline.negate: true
  multiline.match: after
  close_removed: false
  close_eof: false
  clean_removed: true
  scan_frequency: 1h
  backoff: 5m

output.logstash:
  hosts: ["Monitoring.server.com:5044"]
  bulk_max_size: 8192
  worker: 4
  compression_level: 0
  pipelining: 5

Can you explain why multiple pipelines are better than having just a main/default pipeline? Is it for redundancy, performance, or both?

I am also a little confused how to set this up, is the setup below correct?

pipelines.conf

- pipeline.id: data-import-pipeline
  path.config: "C:\logstash\conf.d\DI_log_files.conf"
- pipeline.id: ovalis-server-pipeline
  path.config: "C:\logstash\conf.d\Server_log_files.conf"

input.conf

input {
  beats {
    port => 5044
  }
}

output.conf

output {

  if [log_type] == "Server_log_files" {
    if "elastic" in [tags] {
      elasticsearch {
        hosts => ["localhost:9200"]
        index => "logstash-srv-%{+YYYY.MM.dd}"
        document_id => "%{fingerprint}"
      }
    }
    stdout { codec => rubydebug }
  }

  if [log_type] == "DI_log_files" {
    if "elastic" in [tags] {
      elasticsearch {
        hosts => ["localhost:9200"]
        index => "logstash-di-%{+YYYY.MM.dd}"
        document_id => "%{fingerprint}"
      }
    }
    stdout { codec => rubydebug }
  }

}
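Since the outputs above use document_id => "%{fingerprint}", something in the filter block must populate that field first. A hedged sketch of a typical fingerprint filter (the source field, method, and key are assumptions, not taken from the thread):

```conf
filter {
  # Assumption: hash the raw message to produce a stable document_id,
  # so re-processing the same line overwrites rather than duplicates.
  fingerprint {
    source => ["message"]
    target => "fingerprint"
    method => "SHA256"
    key => "some-static-key"
  }
}
```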

When you say it's the default, do I have to define codec => plain (or similar) in the input.conf, or does just defining beats and the port cover this?

Thanks in advance

Multiple pipelines are used when you want to avoid the risk of data from two or more different inputs mixing up, since multiple pipelines run independently from each other.

If you do not use multiple pipelines Logstash will merge all the configuration files in the configuration path.

But in your case this is not the issue, your Filebeat is configured to send logs to Logstash on port 5044, and you can have only one input listening on this port.
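For illustration only: if you ever did split the two log types into separate pipelines, each pipeline would need its own beats input on its own port, and Filebeat would need to target the right port per input. A hypothetical pipelines.yml sketch (file names and the second port 5045 are assumptions):

```yaml
# Hypothetical sketch: one pipeline per log type, each with its own
# beats input listening on a distinct port.
- pipeline.id: server-logs
  path.config: "C:/logstash/conf.d/Server_log_files.conf"   # beats input on 5044
- pipeline.id: di-logs
  path.config: "C:/logstash/conf.d/DI_log_files.conf"       # beats input on 5045
```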

Also, you have conditionals in your configuration, I don't see how your logs would be mixed in the output.

You are correctly adding a field for each input in filebeat and using this field to filter the logs in your Logstash, this is already the right approach.

Are you still experiencing mixed logs?

For the beats input you just need to set the port, you do not need to set the codec as plain because this is already the default codec.

You would however only need a json filter (or the json codec) if the log lines themselves contain JSON; the beats input already decodes the events Filebeat sends.
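For the case where the application itself writes JSON log lines, a hedged sketch of a json filter (the target field name is an assumption):

```conf
filter {
  # Only needed when the log line itself is JSON; the beats input already
  # decodes Filebeat's own envelope. Field names are illustrative.
  json {
    source => "message"
    target => "app"
  }
}
```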

Apologies for the delay in my reply.

Just to clarify, our issue was resolved after making the changes to our Logstash input.conf (removing the line codec and charset). So all good on that front.

Is it only possible to have multiple Logstash pipelines if Logstash is pulling the files directly from disk or a network share?

How would I apply this to our current setup, and is it necessary? It looks as though the logs are coming in as hoped.

Do I just have to adjust output.conf?

I am also curious about Logstash's own logs: if I want to see any errors, or just how things are going in general as each log is read, ingested, and sent to Elasticsearch, how do I see this?

Looking through logstash-plain.log only shows me Logstash's initial connection to Elasticsearch and that's about it.

Also thanks again

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.