Hello Everyone
While trying to deep-dive into ELK, I followed a video course that is unfortunately outdated...
My ELK version is 8.3.3
However, the lab goal is this:
Use Filebeat to parse some Apache access logs and send them to Logstash... use Logstash to learn everything about filtering + ECS, and from there ship the data to Elasticsearch.
Then, use Filebeat's default dashboards in Kibana for some basic visualizations of the Apache access logs.
Doing so WITHOUT Logstash works fine... but with Logstash, I get a couple of shard errors when opening the default dashboard.
Error Message -> Title:
1 of 3 shards failed
The data you are seeing might be incomplete or wrong.
Error Message Reason:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 2,
    "skipped": 2,
    "failed": 1,
    "failures": [
      {
        "shard": 0,
        "index": "filebeat-8.3.3-2017-09-20",
        "node": "eToI6phpReyc43TQRqyOpg",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "**Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [user_agent.name] in order to load field data by uninverting the inverted index.** Note that this can use significant memory."
        }
      }
    ]
  },
  "hits": {
    "total": 0,
    "max_score": 0,
    "hits": []
  }
}
There are multiple pop-ups which all complain about the same problem, just for different fields.
What I did:
filebeat setup, yes, including all the steps from the manual (multiple times).
Except (because it won't work):
filebeat setup --pipelines
throws:
Exiting: module apache is configured but has no enabled filesets
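For completeness, the apache module is enabled on my side (via `filebeat modules enable apache`), and my modules.d/apache.yml looks roughly like this (a sketch; the log path is a placeholder, not my real path):

```yaml
# modules.d/apache.yml (sketch)
- module: apache
  # Access logs — the fileset the dashboards are built on
  access:
    enabled: true
    var.paths: ["/path/to/apache/access.log*"]
  # Error logs — not used in this lab
  error:
    enabled: false
```

So as far as I can tell, the access fileset IS enabled, which makes the "no enabled filesets" message even more confusing.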
And I did not use the metadata condition from the docs that selects the Filebeat ingest pipeline based on event metadata:
output {
  if [@metadata][pipeline] {
    elasticsearch {
      hosts => "https://061ab24010a2482e9d64729fdb0fd93a.us-east-1.aws.found.io:9243"
      manage_template => false
      index => "%{[@metadata][beat]}-%{[@metadata][version]}"
      action => "create"
      pipeline => "%{[@metadata][pipeline]}"
      user => "elastic"
      password => "secret"
    }
  } else {
    elasticsearch {
      hosts => "https://061ab24010a2482e9d64729fdb0fd93a.us-east-1.aws.found.io:9243"
      manage_template => false
      index => "%{[@metadata][beat]}-%{[@metadata][version]}"
      action => "create"
      user => "elastic"
      password => "secret"
    }
  }
}
Deleting and re-creating (automatically) the indices.
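(For the record, deleting was just a Dev Tools call like the following, using the index name from the error above; the index then gets recreated automatically on the next event:)

```
DELETE filebeat-8.3.3-2017-09-20
```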
What I guess:
I guess the mapping of the Filebeat ingest pipeline and the dashboard fields do not match when using my Logstash filter (which is based on an outdated video course...).
As far as I understand, Filebeat on its own sets everything up, even the index and mapping part, but Logstash ships the data with different data types (like text only).
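To check that guess, the field mapping can be inspected directly (a sketch for Kibana Dev Tools, using the index and field name from the error above):

```
GET filebeat-8.3.3-2017-09-20/_mapping/field/user_agent.name
```

Given the error, I expect this to show "type": "text" for the index written via Logstash, while an index set up by Filebeat itself should map user_agent.name as keyword.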
Question:
Is there a way to fix it? Perhaps a working Logstash filter?
input {
  beats {
    port => 5044
    host => "0.0.0.0"
  }
}
filter {
  # Keep only Apache access log events
  if [event][dataset] != "apache.access" {
    drop { }
  }
  # Parse the raw combined log line
  grok {
    match => { "[event][original]" => '%{HTTPD_COMBINEDLOG}' }
  }
  # Use the request timestamp as @timestamp
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  # Split source.address into source.ip or source.domain
  grok {
    match => {
      "[source][address]" => "^(%{IP:[source][ip]}|%{HOSTNAME:[source][domain]})"
    }
  }
  #if "_grokparsefailure" in [tags] {
  #  drop { }
  #}
  mutate {
    remove_field => [ "log", "input", "service", "host", "ecs", "@version" ]
  }
  mutate {
    add_field => { "[event][created]" => "%{@timestamp}" }
  }
  # Enrich with parsed user agent and GeoIP data
  useragent {
    source => "[user_agent][original]"
    target => "[user_agent]"
  }
  geoip {
    source => "[source][ip]"
    target => "[source][geo]"
  }
}
output {
  elasticsearch {
    hosts => "localhost"
    manage_template => false
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY-MM-dd}"
  }
  stdout {
    codec => rubydebug {
      metadata => false
    }
  }
}