Logstash trims part of log line

I have a problem with how Logstash processes our log files (I think).

Sometimes we get documents tagged with "_grokparsefailure".
Here is an example (screenshot of one such document):

Note that there is nothing in our pipeline that would remove the first part of the message, yet a large chunk of its beginning is missing. In the log file the line looks like the following:

2019-03-05 17:34:54.653 prod Azure L_______________n Info 1 L_______________n logged lead [string:messageType:"leadLogged"] [string:leadChannel:"MarketPlace"] [string:leadSource:"p________________e"] [string:leadType:"Phone"]

  • Please disregard the underscores; I've replaced some actual data with them.

I've checked the log files and nowhere could I find a line starting with "s" as shown in the screenshot of the document.

I'm wondering what is wrong with Logstash, and how I could fix it, if that is possible at all.

  • Over the last 7 days we have had only 2 such cases, so the error is rare yet persistent.

What does the input in your configuration look like?

Badger, sure. Here is the pipeline config for Logstash:

input {
    stdin {}
    beats { port => "5044" }
    file {
        path => [
            "/tdr/logs/*.log",
            "/tdr/logs/*/*.log",
            "/tdr/logs/*/*/*.log"
        ]
    }
}

filter {
    ruby {
        path => "/etc/logstash/conf.d/fields_extractor.rb"
    }
    grok {
        match => { "message" => "^(?<a_timestamp_raw>[\d|-]+\s[\d|:|\.]+) (?<a_environment>[^\s]+) (?<a_host>[^\s]+) (?<a_app>[^\s]+) (?<a_level>[^\s]+) (?<a_loggerVersion>[^\s]+).*$" }
    }
    date {
        match => [ "a_timestamp_raw", "yyyy-MM-dd HH:mm:ss.SSS" ]
        timezone => "UTC"
        target => "@timestamp"
    }
}

filter {
    mutate {
        copy => { "a_environment" => "a_environment_lower" }
        copy => { "a_app" => "a_app_lower" }
    }
}

filter {
    mutate {
        lowercase => [ "a_environment_lower", "a_app_lower" ]
        remove_field => [ "a_timestamp_raw" ]
    }
}

output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "logstash-%{a_environment_lower}-%{a_app_lower}-%{+YYYY.MM.dd}"
    }
    stdout { codec => rubydebug }
}
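
For what it's worth, the grok pattern above uses plain named captures, so it can be sanity-checked outside Logstash with ordinary Ruby. Here is a minimal sketch using the pattern and sample line copied from above, plus a hypothetical truncated fragment like the one in the screenshot (the fragment itself is my assumption, not taken from a real document):

pattern = /^(?<a_timestamp_raw>[\d|-]+\s[\d|:|\.]+) (?<a_environment>[^\s]+) (?<a_host>[^\s]+) (?<a_app>[^\s]+) (?<a_level>[^\s]+) (?<a_loggerVersion>[^\s]+).*$/

full_line = '2019-03-05 17:34:54.653 prod Azure L_______________n Info 1 L_______________n logged lead [string:messageType:"leadLogged"] [string:leadChannel:"MarketPlace"] [string:leadSource:"p________________e"] [string:leadType:"Phone"]'
truncated = 'source:"p________________e"] [string:leadType:"Phone"]'

# the intact line matches and yields the expected fields
m = pattern.match(full_line)
puts m[:a_timestamp_raw]   # => 2019-03-05 17:34:54.653
puts m[:a_environment]     # => prod
puts m[:a_level]           # => Info

# a line with its beginning cut off does not match at all,
# which is exactly what produces the _grokparsefailure tag
puts pattern.match(truncated).inspect   # => nil

So the pattern itself behaves as expected; the failure only appears when the message arrives already truncated.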

Here is the referenced field-extractor Ruby script:

# register is called once at pipeline startup; params holds the
# script_params from the pipeline configuration (none are used here)
def register(params)
    #@drop_percentage = params["percentage"]
end

# filter is called once per event and must return an array of events
def filter(event)
    # to access a param here: @drop_percentage

    _message = event.get("message")
    # collect every [type:name:"value"] token embedded in the message
    _customFields = _message.scan(/\[\w*\:[\w\d]*:\"(?:[^\"]|\\.)*\"\]/)

    _customFields.each { |match|
        begin
            _typeNameValueString = match
            #cut out matched square brackets
            _typeNameValueString = _typeNameValueString[1...-1]

            _typeNameRestArray = _typeNameValueString.split(':')

            _fieldType = _typeNameRestArray[0]
            _fieldTypeLowered = _fieldType.downcase
            _fieldName = _typeNameRestArray[1]
            _typeNameRestArray.shift
            _typeNameRestArray.shift
            _valueArray = _typeNameRestArray
            # get the value without quotes
            _value = _valueArray.join('')[1...-1]

            case _fieldTypeLowered
            when "string"
                # unescape embedded quotes (\" -> ")
                _value = _value.gsub('\\"', '"')

                event.set(_fieldName, _value)
            when "bool"
                event.set(_fieldName, _value.downcase == "true")
            when "number"
                if _value.index(',') != nil
                    _value[","] = "."
                end

                if _value.index('.') != nil
                    event.set(_fieldName, _value.to_f)
                else
                    event.set(_fieldName, _value.to_i)
                end
            else
                event.set("error_cant_process_field", "Unknown type: " + _fieldType)
            end
        rescue
            event.set("error_cant_process_field", "Cought an exception processing " + match)
        end
    }

    return [event]
end
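
For completeness, here is a minimal standalone sketch of the same split logic the script applies to each bracketed token, run on one token taken from the sample line above (plain Ruby, no Logstash event involved):

token = '[string:leadChannel:"MarketPlace"]'

parts = token[1...-1].split(':')   # strip the [ ] and split on ':'
field_type = parts.shift           # => "string"
field_name = parts.shift           # => "leadChannel"
value = parts.join('')[1...-1]     # rejoin the rest and drop the surrounding quotes

puts "#{field_type} #{field_name} = #{value}"   # => string leadChannel = MarketPlace

One side note on the design: because the remainder is reassembled with join(''), any ':' characters inside a value would be dropped, although that is unrelated to the truncation problem described above.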

Please let me know if you need any other details.
Regards,

I cannot think of any way that could happen.

Those log files are sometimes written to several times per second.
A file that has literally just been updated might be updated again within milliseconds after that.

Might such high write activity on log files cause some sort of concurrency issue?

No, I would not expect write volume to be an issue.

OK. Then can I configure Logstash in some way to obtain more details that would help identify the root cause of the issue?
