Apache access logs with Filebeat -> Logstash -> Elasticsearch -> Kibana: errors in the default dashboard

Hello Everyone

While trying to deep-dive into ELK, I followed a video course that is unfortunately outdated...

My ELK version is 8.3.3

However, the lab goal is this:
Use Filebeat to parse some Apache access logs and send them to Logstash, use Logstash to learn everything about filtering and ECS, and from there ship the data to Elasticsearch.
Then, use Filebeat's default dashboards in Kibana for some basic visualizations of the Apache access logs.

Doing so WITHOUT Logstash works fine... but with Logstash, I get a couple of shard errors when clicking on the default dashboard.

Error Message -> Title:
1 of 3 shards failed
The data you are seeing might be incomplete or wrong.

Error Message Reason:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 2,
    "skipped": 2,
    "failed": 1,
    "failures": [
      {
        "shard": 0,
        "index": "filebeat-8.3.3-2017-09-20",
        "node": "eToI6phpReyc43TQRqyOpg",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [user_agent.name] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
        }
      }
    ]
  },
  "hits": {
    "total": 0,
    "max_score": 0,
    "hits": []
  }
}

There are multiple pop-ups, all complaining about the same problem, just for different fields.

What I did:
filebeat setup, including all the steps from the manual (multiple times).
Except this one (because it won't work):

filebeat setup --pipelines 
throws:
Exiting: module apache is configured but has no enabled filesets
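
For context, that error usually means the apache module file has no fileset with `enabled: true`. A minimal sketch of what `modules.d/apache.yml` would need (the `var.paths` values are assumptions; adjust them to your actual log locations):

```yaml
# modules.d/apache.yml - enable at least one fileset
- module: apache
  access:
    enabled: true
    var.paths: ["/var/log/apache2/access.log*"]
  error:
    enabled: true
    var.paths: ["/var/log/apache2/error.log*"]
```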

and I did not use the @metadata condition that selects the ingest pipeline, i.e. this output:

output {
  if [@metadata][pipeline] {
    elasticsearch {
      hosts => "https://061ab24010a2482e9d64729fdb0fd93a.us-east-1.aws.found.io:9243"
      manage_template => false
      index => "%{[@metadata][beat]}-%{[@metadata][version]}" 
      action => "create" 
      pipeline => "%{[@metadata][pipeline]}" 
      user => "elastic"
      password => "secret"
    }
  } else {
    elasticsearch {
      hosts => "https://061ab24010a2482e9d64729fdb0fd93a.us-east-1.aws.found.io:9243"
      manage_template => false
      index => "%{[@metadata][beat]}-%{[@metadata][version]}" 
      action => "create"
      user => "elastic"
      password => "secret"
    }
  }
}

Deleting and re-creating (automatically) the indices.

What I guess:
I guess the mapping of the Filebeat ingest pipeline and the dashboard fields do not match when using my Logstash filter (which is based on an outdated video course...).
As far as I understand, Filebeat on its own sets everything up, even the index and mapping part, but Logstash ships the data with different datatypes (like text only).

Question:
Is there a way to fix it? Perhaps with a working Logstash filter?

input {
    beats {
        port => 5044
        host => "0.0.0.0"
    }
}

filter {
    if [event][dataset] != "apache.access" {
        drop { }
    }

    grok {
        match => { "[event][original]" => '%{HTTPD_COMBINEDLOG}' }
    }

    date {
        match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }

    grok {
        match => {
            "[source][address]" => "^(%{IP:[source][ip]}|%{HOSTNAME:[source][domain]})"
        }
    }

    #if "_grokparsefailure" in [tags] {
    #    drop { }
    #}

    mutate {
        remove_field => [ "log", "input", "service", "host", "ecs", "@version"]
    }

    mutate {
        add_field => { "[event][created]" => "%{@timestamp}" }
    }

    useragent {
        source => "[user_agent][original]"
        target => "[user_agent]"
    }

    geoip {
        source => "[source][ip]"
        target => "[source][geo]"
    }
}

output {
    elasticsearch {
        hosts => "localhost"
        manage_template => false
        index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY-MM-dd}"
    }

    stdout {
        codec => rubydebug {
            metadata => false
        }
    }
}

Update:
I have this issue now with the filebeat system module too, which sends its stream to Logstash.

The pipeline configuration is plain simple:

input {
    beats {
        port => 5044
        host => "0.0.0.0"
    }
}

output {
    if [@metadata][pipeline] {
        elasticsearch {
            hosts => "localhost:9200"
            manage_template => false
            index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY-MM-dd}"
            action => "create"
            pipeline => "%{[@metadata][pipeline]}"
        }

        stdout {
            codec => rubydebug {
                metadata => true
            }
        }
    }
}

And again, the default dashboards throw errors:

"reason": {
          "type": "illegal_argument_exception",
          "reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [host.hostname] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
        }

What must I do to fix this issue? I doubt that the solution is to edit the default dashboards.
Shouldn't the filebeat pipeline be "fire and forget"?
The mappings of the filebeat index for the hostname looks like this:

  "hostname": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
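
With a mapping like that, an aggregation on the top-level `text` field triggers exactly the fielddata error from the dashboards, while the `keyword` subfield aggregates fine. A quick check in Dev Tools (the index pattern here is an assumption):

```
GET filebeat-*/_search
{
  "size": 0,
  "aggs": {
    "hosts": {
      "terms": { "field": "host.hostname.keyword" }
    }
  }
}
```

Swapping the field for plain `host.hostname` reproduces the illegal_argument_exception.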

The easiest fix is: don't use Logstash. Can you just let Filebeat send directly to Elasticsearch?

If not, what I've done is set up a "sham" filebeat that appears to send to elasticsearch (I just use my workstation). Its only purpose is to create the template, data streams and Kibana dashboards. You will need to do this for each version of beats you're going to use.

Then your logstash logic to call the appropriate ingest pipeline should work. In a brief read, it looks similar to mine.

Thanks for your quick response @ruegenl

I'm not sure what you mean with a "sham" filebeat.

It is a lab environment and it is mandatory to use filebeat AND logstash for various learning reasons. On the VM is the whole ELK stack.

Without logstash, the dashboards work as intended. When I edit the various dashboard widgets by hand and choose the "right" field like this
host.name --> host.name.keyword ... the error message for this particular field is gone.

However... I strongly suspect that this is a bug. Even the manual (correct me if I'm wrong) does not mention anything in this regard.

I found a Chinese blog that had this problem too, but that author had his own index. Perhaps it explains the topic a bit better (I used a translator plugin in my browser to translate it into English).

Update to the ELK manual:

However... the manual explains the problem, but nowhere do I find a quick workaround for the dozens of default dashboards.

Another Update:
winlogbeat had this issue in ELK 7.10.x ...looks like a bug to me:

Yep, a bug --> see this.

I will test the recommended solution and give my feedback.

@SirStephanikus

Nope, that is not your issue in 8.3.3.

I just ran setup with

setup.ilm.check_exists: false

./filebeat setup -e

and the mappings are correct

GET _cat/indices?v
health status index                                uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   .ds-filebeat-8.3.3-2022.09.06-000001 u3YyG9i_Sh2XduYtAaVloQ   1   1          0            0       225b           225b

GET .ds-filebeat-8.3.3-2022.09.06-000001/
{
  ".ds-filebeat-8.3.3-2022.09.06-000001": {
    "aliases": {},
    "mappings": {
      "_meta": {
        "beat": "filebeat",
        "version": "8.3.3"
      },
      "_data_stream_timestamp": {
        "enabled": true
      },
........

        "host": {
          "properties": {
            "architecture": {
              "type": "keyword",
              "ignore_above": 1024
            },
....
            "hostname": {
              "type": "keyword",  <!----- This is Correct
              "ignore_above": 1024
            },
            "id": {
              "type": "keyword",
              "ignore_above": 1024
            },

Generally this means: Filebeat to process, and Logstash to collect and forward as a passthrough.

If you want this to work you need to use this form of Logstash pipeline, see here...

Filebeat(Module) -> Logstash (Collect / Forward) -> Elasticsearch is very common and works great...

Because you are not properly calling the ingest pipeline, the data fields are not being properly set.

input {
  beats {
    port => 5044
  }
}

output {
  if [@metadata][pipeline] {
    elasticsearch {
      hosts => "https://061ab24010a2482e9d64729fdb0fd93a.us-east-1.aws.found.io:9243"
      manage_template => false
      index => "%{[@metadata][beat]}-%{[@metadata][version]}" 
      action => "create" 
      pipeline => "%{[@metadata][pipeline]}" 
      user => "elastic"
      password => "secret"
    }
  } else {
    elasticsearch {
      hosts => "https://061ab24010a2482e9d64729fdb0fd93a.us-east-1.aws.found.io:9243"
      manage_template => false
      index => "%{[@metadata][beat]}-%{[@metadata][version]}" 
      action => "create"
      user => "elastic"
      password => "secret"
    }
  }
}

Glad you are digging in deep, but you are missing some basic concepts: correct mappings, writing data, setup, Logstash as a passthrough, ingest pipelines...

Of course you can do all the parsing in Logstash yourself, but you would need to duplicate ALL the logic of the ingest pipeline and make sure all the proper fields are set if you want the dashboards to work...

I just ran

GET _ingest/pipeline/filebeat-8.3.3-apache-access-pipeline

Filebeat (Apache Module) -> Logstash (passthrough) -> Elasticsearch works with the default dashboards, no problem... works great.

If you want to do that it is pretty easy

  • Clean Up everything
  • Point filebeat at elasticsearch and run setup
  • Then create a Logstash conf like the one I referenced above for working with modules.
  • Start logstash
  • Point Filebeat to Logstash
  • Start Filebeat
  • Go Look at Dashboards
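
A sketch of those steps as commands (the binary paths and the filebeat.yml edits are assumptions for a local lab VM; adjust hosts and credentials to your environment):

```
# 1. Clean up old filebeat indices / data streams first (Kibana or the delete APIs)

# 2. With output.elasticsearch configured in filebeat.yml, load the index
#    template, ingest pipelines and dashboards:
./filebeat setup -e

# 3. Start Logstash with the passthrough pipeline conf shown above:
bin/logstash -f beats-passthrough.conf

# 4. Switch filebeat.yml to output.logstash (hosts: ["localhost:5044"]),
#    then start Filebeat:
./filebeat -e

# 5. Open the Filebeat Apache dashboards in Kibana
```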

I wrote about this many times..

Just make sure you use the correct logstash conf above.

Hi stephenb

I'm not sure perhaps I missed something.
My latest Logstash config looks like yours, and I really played around to see the @metadata field and then to use this field for a condition:

input {
    beats {
        port => 5044
        host => "0.0.0.0"
    }
}

output {
    if [@metadata][pipeline] {
        elasticsearch {
            hosts => "localhost:9200"
            manage_template => false
            index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY-MM-dd}"
            action => "create"
            pipeline => "%{[@metadata][pipeline]}"
        }

        stdout {
            codec => rubydebug {
                metadata => true
            }
        }
    }
}

Nope... they are not the same... details matter...

Correct is without the -%{+YYYY-MM-dd}. When you add that, you are not writing to the correct data stream, and thus it does not work! Data streams do not have the date component in their name; it is handled behind the scenes.

    elasticsearch {
      hosts => "http://localhost:9200"
      manage_template => false
      index => "%{[@metadata][beat]}-%{[@metadata][version]}"  <!-- No -%{+YYYY-MM-dd}
      action => "create"
      pipeline => "%{[@metadata][pipeline]}"
      user => "elastic"
      password => "secret"
    }
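To confirm where writes actually land, the data stream and its dated backing indices can be inspected in Dev Tools (the stream name here is assumed from the 8.3.3 setup above):

```
GET _data_stream/filebeat-8.3.3
```

The response lists backing indices like .ds-filebeat-8.3.3-2022.09.06-000001; you write to the stream name, never to a dated index of your own.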

Hi stephenb

Indeed...that works :face_with_monocle: :smiley_cat:

Curiously, in the video course I did, the author added the timestamp to the index name explicitly (but for an older ELK version), and he never mentioned the [@metadata][pipeline] field either. And yes, he rewrote every Filebeat module option in Logstash. I've no intention of finger-pointing at the course.

I'm more than grateful and glad that you pushed me to this little detail with enormous impact
:+1: :vulcan_salute: :+1:


That is why I rarely recommend 3rd-party videos / training... as they often fall behind... Elastic changes rapidly.

Glad you got it working!