Logstash HTTP input plugin accepts gzip files, but how can I accept only text content?

I am using the configuration below for the Logstash HTTP input plugin. I send a .gz file and pass Content-Encoding: gzip in the request header. This works, but I am only interested in text content inside the gzip, not any other format. How can I change the configuration so that it accepts only .gz files that contain text, and rejects other formats such as images, movies, PDFs, etc.?

    input {
      http {
        host => "0.0.0.0"
        port => 8443
        max_pending_requests => 500
        ssl => "false"
        ssl_verify_mode => "none"
        threads => "20"
      }
    }

    output {
        file {
            path => "../../logstash-client-logs/%{[headers][application_name]}/myapplication-logstash-client-%{+yyyy-MM-dd}.log"
            codec =>  line { format => "%{[message]}"}
        }
    }

Images are likely to contain characters that text files will not. You might be able to test this using something like

    if [message] !~ /^[[[:alnum:]][[:space:]][[:punct:]]]*$/ { drop {} }

That is, if the message contains anything other than alphanumeric characters, punctuation or whitespace, discard it.

Alternatively, check for a file signature and drop anything you do not want to keep.
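A minimal sketch of the file-signature approach follows. It assumes the decompressed body ends up in `[message]`, as in the configs in this thread. Each signature is anchored with `\A` (start of string), so a text log that merely mentions "PNG" or "%PDF-" is not dropped. The signatures shown are examples only and do not form an exhaustive deny list. The PNG pattern assumes that the non-ASCII first byte (0x89) arrives as a single, possibly replaced, character after decoding.

```
filter {
  # Deny-list sketch: drop events whose body starts with a known binary file signature.
  #   %PDF-         -> PDF
  #   GIF87a/GIF89a -> GIF
  #   <byte>PNG     -> PNG (first byte 0x89; ASSUMED to decode as one character)
  if [message] =~ /\A%PDF-/ or [message] =~ /\AGIF8[79]a/ or [message] =~ /\A.PNG/ {
    drop { }
  }
}
```

Every additional format still needs its own entry in the condition.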

Hi Badger,

Thanks for checking and responding to it.
It's not working with

    if [message] !~ /^[[[:alnum:]][[:space:]][[:punct:]]]*$/ { drop {} }

but it does work with the file signature. The only disadvantage I see with the file signature approach is that I have to keep adding every unwanted file signature to the if condition manually. My requirement is that I don't want anything other than text in the logs. Is there any workaround to accept only text and reject every other format?

Also, instead of dropping the message, is there any way to notify the client with an error such as 415 Unsupported Media Type?

    input {
      http {
        host => "0.0.0.0"
        port => 8443
        max_pending_requests => 500
        ssl => "false"
        ssl_verify_mode => "none"
      }
    }

    filter {
        # if [message] !~ /^[[[:alnum:]][[:space:]][[:punct:]]]*$/ { drop {} }
        if "PNG" in [message] or "%PDF-" in [message] { drop { } }
    }

    output {
        file {
            path => "../../logstash-client-logs/%{[headers][application_name]}/myapplication-logstash-client-%{+yyyy-MM-dd}.log"
            codec =>  line { format => "%{[message]}"}
        }

    }

Regards,
Agniv

No, you cannot return an error to the client.

That said, if you can write a codec (and codecs can be quite simple), you might be able to use the additional_codecs option on the http input to run your codec for text/plain, tag the event in the codec, and then drop everything that is not tagged. I am not certain it would work, but I think it is possible.
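Here is a rough, untested sketch of that idea. `additional_codecs` is an http input option that maps a request's Content-Type to a codec name. However, `text_tagger` below is a hypothetical custom codec that you would have to write and install yourself. It is assumed to add the tag `plain_text` to every event it decodes. Requests with any other content type go through the default codec, are not tagged, and are dropped.

```
input {
  http {
    host => "0.0.0.0"
    port => 8443
    # Requests with Content-Type: text/plain are decoded by the HYPOTHETICAL text_tagger codec,
    # which is assumed to add the tag "plain_text" to each event.
    additional_codecs => { "text/plain" => "text_tagger" }
  }
}

filter {
  # Keep only events tagged by text_tagger; everything else is discarded.
  if "plain_text" not in [tags] {
    drop { }
  }
}
```

This relies on the client sending an honest Content-Type header. It does not inspect the content itself. Because the drop happens in the filter stage, the client still receives the input's normal response code.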

The following worked for these files. It anchors the pattern at the beginning and end of the whole text instead of at the beginning and end of each line:

    if [message] !~ /\A[[[:alnum:]][[:space:]][[:punct:]]]*\z/ { drop { } }
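For context, here is that check placed in the filter section of the pipeline from the question. The likely reason the first pattern failed is that Logstash conditionals use Ruby regular expressions, where `^` and `$` match at the start and end of every line. The pattern therefore only has to match one line (even an empty one), and a binary payload containing such a line is not dropped. `\A` and `\z` anchor to the start and end of the entire string instead.

```
filter {
  # Drop the event unless the WHOLE message (\A = start of string, \z = end of string)
  # consists only of letters/digits, whitespace and punctuation.
  # With ^ and $ instead, a single matching line would be enough to keep the event.
  if [message] !~ /\A[[[:alnum:]][[:space:]][[:punct:]]]*\z/ {
    drop { }
  }
}
```

Because the character class is repeated with `*`, an empty message also matches and is therefore kept.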

Reference: https://www.elastic.co/guide/en/beats/filebeat/current/regexp-support.html
