Migrating from "file input" to "beats input" ==> filter/matching problem

Hello,
I need to send my (modified) Squid access.log to an Elasticsearch cluster.

I asked our system team for some log files and used them as input to my cluster (file => Logstash => Elasticsearch) in order to prepare the dissect/grok filter.
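That test setup looked roughly like this (a minimal sketch; the path and the file-input options are placeholders, not the real config), with the same filter and elasticsearch output as in the full .conf further down:

input {
  file {
    path => "/tmp/sample-access.log"   # placeholder: copy of the log files from the system team
    start_position => "beginning"
    sincedb_path => "/dev/null"        # re-read the sample file on every test run
  }
}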

Everything worked as expected, so we decided to push the logs from the source servers to the Logstash server with Filebeat.

My filter seems unable to parse the data correctly when it comes from Filebeat, and I get lots of warnings in the Logstash plain log:

[2019-09-18T16:14:00,816][WARN ][org.logstash.dissect.Dissector] Dissector datatype conversion, value cannot be coerced, field: srv_port, value: 10.170.226.63
[2019-09-18T16:14:00,816][WARN ][org.logstash.dissect.Dissector] Dissector datatype conversion, value cannot be coerced, field: status_code, value: TCP_MISS
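These warnings come from the convert_datatype step: dissect has put a non-numeric token into a field that it is then asked to convert to an integer. A minimal sketch that reproduces the same class of warning (the values are made up, only the shape matches the ones above):

input {
  generator {
    lines => ["TCP_MISS 10.170.226.63"]   # made-up tokens, same shape as in the warnings
    count => 1
  }
}
filter {
  dissect {
    mapping => { "message" => "%{status_code} %{srv_port}" }
    convert_datatype => {
      "status_code" => "int"   # "TCP_MISS" cannot be coerced to an integer -> warning
      "srv_port" => "int"      # "10.170.226.63" cannot be coerced either
    }
  }
}
output { stdout { codec => rubydebug } }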

My Elasticsearch index is now a mess, with fields filled with wrong values (values that should be in other fields).

Here is an example line from the source log file:

Dec 29 17:15:33 dvgospxi1 (squid-1): 29/Dec/2016:17:15:33 +0100 - TCP_DENIED 403 10.117.56.12:42878 3529 GET http://intranoo.francetelecom.fr/ text/html 3428 HIER_NONE - 10.170.226.63 3128 0

And here is my Logstash .conf file:

input {
  beats {
    port => "1762"
  }
}
filter {
  dissect {
    mapping => {
      "message" => "%{} %{} %{} %{host} %{} %{timestamp->} %{} %{user_id} %{req_status} %{status_code} %{user_ip} %{user_req_size} %{method} %{url} %{mime_type} %{reply_size} %{hierarchy} %{fwd_ip} %{srv_ip} %{srv_port} %{}"
    }
    convert_datatype => {
      "status_code" => "int"
      "user_req_size" => "int"
      "reply_size" => "int"
      "srv_port" => "int"
    }
  }
  grok {
    match => {
      "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp}.*%{NUMBER:duration:int}"
    }
    remove_field => ["message"]
  }
  date {
    match => [ "syslog_timestamp", "MMM dd HH:mm:ss" ]
    timezone => "Europe/Paris"
  }
}
output {
  elasticsearch {
    hosts => ["10.118.123.226:1761", "10.118.123.227:1761", "10.118.123.229:1761"]
    index => "squid-%{+YYYY.MM.dd}"
    manage_template => true
    template => "/etc/logstash/conf.d/squid_access_log_mapping.json"
    template_name => "squid_template"
  }
}
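To see which token ends up in which field, the same filter section can be run in isolation against the sample line, with stdin/stdout instead of beats/elasticsearch (a debugging sketch, not part of the real pipeline):

input { stdin { } }
filter {
  # paste the dissect / grok / date blocks from the config above, unchanged
}
output { stdout { codec => rubydebug } }   # prints every parsed field of each event

Pasting a sample access.log line into the console then shows what lands in fields like status_code and srv_port.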

I tried removing the filter part and just using a "file" output plugin with this instruction:

codec => line { format => "%{message}"}

With this codec, the file output is correct.
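That test pipeline looked roughly like this (a sketch; the output path is a placeholder):

input {
  beats {
    port => "1762"
  }
}
output {
  file {
    path => "/tmp/beats-debug.log"             # placeholder path for the debug file
    codec => line { format => "%{message}" }   # write only the original log line
  }
}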

But without this instruction in the file output section (i.e. with the plugin's default json_lines codec), I get something like the raw Filebeat events:

{"agent":{"hostname":"dvgospxi1","ephemeral_id":"b0f9a110-eaac-4d89-b347-325431f6a167","type":"filebeat","id":"faf86c1c-4101-4c28-bae2-ca7c85d1f11b","version":"7.2.0"},"tags":["beats_input_codec_plain_applied"],"@version":"1","@timestamp":"2019-09-18T14:27:15.576Z","log":{"offset":177273604,"file":{"path":"/var/opt/data/flat/squid/log/daily-access.log"}},"message":"Jan 21 20:23:19 dvgospxi1 (squid-1): 21/Jan/2017:20:23:19 +0100 - TCP_DENIED 403 10.117.56.12:58014 3529 GET http://intranoo.francetelecom.fr/ text/html 3428 HIER_NONE - 10.170.226.63 3128  0","input":{"type":"log"},"logtype":"squid_access_log_dev","host":{"hostname":"dvgospxi1","os":{"kernel":"2.6.32-696.3.2.el6.x86_64","codename":"Santiago","platform":"redhat","family":"redhat","version":"6.8 (Santiago)","name":"Red"},"containerized":false,"architecture":"x86_64","name":"dvgospxi1"},"ecs":{"version":"1.0.0"}}
{"input":{"type":"log"},"tags":["beats_input_codec_plain_applied"],"@version":"1","@timestamp":"2019-09-18T14:27:15.576Z","log":{"file":{"path":"/var/opt/data/flat/squid/log/daily-access.log"},"offset":177273796},"message":"Jan 21 20:23:42 dvgospxi1 (squid-1): 21/Jan/2017:20:23:42 +0100 - TCP_MISS 200 10.170.182.250:40019 3397 CONNECT subscription.es.bluecoat.com:443 - 3278 FIRSTUP_PARENT 10.170.226.243 10.170.226.63 3128 301617","host":{"hostname":"dvgospxi1","os":{"kernel":"2.6.32-696.3.2.el6.x86_64","codename":"Santiago","family":"redhat","platform":"redhat","version":"6.8 (Santiago)","name":"Red"},"containerized":false,"architecture":"x86_64","name":"dvgospxi1"},"logtype":"squid_access_log_dev","agent":{"hostname":"dvgospxi1","ephemeral_id":"b0f9a110-eaac-4d89-b347-325431f6a167","type":"filebeat","id":"faf86c1c-4101-4c28-bae2-ca7c85d1f11b","version":"7.2.0"},"ecs":{"version":"1.0.0"}}  

What do I have to change or do in order to get my data correctly parsed and indexed on the Elasticsearch side?

Any help appreciated
Thanks
