Help with reducing mapping

Hi All.

I'm new to the ELK stack and hoping I'm not trying to reinvent the wheel. It's a steep learning curve at the moment, and I'm just after some help and guidance.

I'm using Filebeat to send logs out from our Kubernetes cluster. I was originally outputting the logs directly into Elasticsearch, but found that having everything go into one single index wasn't really working how we wanted.

After some research, and the following post which I found really useful, I now send the logs through Logstash rather than directly to Elasticsearch, and this is working nicely.

But... the mappings for the indexes in Elasticsearch are way more heavyweight than what we need.

I've followed this guide on changing a mapping, but when I repoint the alias using the "atomic" step, although the mapping that I've created is correct, none of the documents from the original index are present in the new index with the simpler mapping.

When I use the reindex API to move the documents in, the mapping in the new index is ignored and all the original fields are copied over.

Basically, all I really need to achieve is that when we use the "Discover" link in Kibana, we don't see all the Kubernetes fields and other metadata. We just want to see info about the source of the request and the "message" fields from the mapping.

I believe that what I'm trying to do may be achievable quite simply by using a Logstash filter. My logstash.conf looks like this:

input {
    beats {
        port => 5044
    }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }

  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

output {
    elasticsearch {
        hosts => "elasticsearch:9200"
        manage_template => false
        index => "%{[kubernetes][namespace]}"
        document_type => "%{[@metadata][type]}"
    }
}

But the mappings of the resulting indexes in Elasticsearch are still enormous, like:

{
        "mappings": {
            "properties": {
                "@timestamp": {
                    "type": "date"
                },
                "@version": {
                    "fields": {
                        "keyword": {
                            "ignore_above": 256,
                            "type": "keyword"
                        }
                    },
                    "type": "text"
                },
                "agent": {
                    "properties": {
                        "ephemeral_id": {
                            "fields": {
                                "keyword": {
                                    "ignore_above": 256,
                                    "type": "keyword"
                                }
                            },
                            "type": "text"
                        },
                        "hostname": {
                            "fields": {
                                "keyword": {
                                    "ignore_above": 256,
                                    "type": "keyword"
                                }
                            },
                            "type": "text"
                        },
                        "id": {
                            "fields": {
                                "keyword": {
                                    "ignore_above": 256,
                                    "type": "keyword"
                                }
                            },
                            "type": "text"
                        },
                        "name": {
                            "fields": {
                                "keyword": {
                                    "ignore_above": 256,
                                    "type": "keyword"
                                }
                            },
                            "type": "text"
                        },
                        "type": {
                            "fields": {
                                "keyword": {
                                    "ignore_above": 256,
                                    "type": "keyword"
                                }
                            },
                            "type": "text"
                        },
                        "version": {
                            "fields": {
                                "keyword": {
                                    "ignore_above": 256,
                                    "type": "keyword"
                                }
                            },
                            "type": "text"
                        }
                    }
                },
                "cloud": {
                    "properties": {
                        "account": {
                            "properties": {
                                "id": {
                                    "fields": {
                                        "keyword": {
                                            "ignore_above": 256,
                                            "type": "keyword"
                                        }
                                    },
                                    "type": "text"
                                }
                            }
                        },
                        "availability_zone": {
                            "fields": {
                                "keyword": {
                                    "ignore_above": 256,
                                    "type": "keyword"
                                }
                            },
                            "type": "text"
                        },
                        "image": {
                            "properties": {
                                "id": {
                                    "fields": {
                                        "keyword": {
                                            "ignore_above": 256,
                                            "type": "keyword"
                                        }
                                    },
                                    "type": "text"
                                }
                            }
                        },
                        "instance": {
                            "properties": {
                                "id": {
                                    "fields": {
                                        "keyword": {
                                            "ignore_above": 256,
                                            "type": "keyword"
                                        }
                                    },
                                    "type": "text"
                                }
                            }
                        },
                        "machine": {
                            "properties": {
                                "type": {
                                    "fields": {
                                        "keyword": {
                                            "ignore_above": 256,
                                            "type": "keyword"
                                        }
                                    },
                                    "type": "text"
                                }
                            }
                        },

etc. etc.

If filebeat is adding fields you do not want then you may be able to disable the processors that are adding them.

If there are specific top-level fields you want to remove then a prune filter may help. If you blacklist a top-level object ([agent] for example) then all of the fields within it will also be removed from the event.
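As a rough sketch of the prune approach, assuming the unwanted top-level objects are [agent] and [cloud] (both appear in the mapping posted above; adjust the list to whatever you actually want to drop):

```
filter {
  prune {
    # blacklist_names entries are regular expressions matched against
    # top-level field names, so anchor them to avoid partial matches.
    # "agent" and "cloud" are taken from the mapping in the question.
    blacklist_names => [ "^agent$", "^cloud$" ]
  }
}
```

Removing [agent] this way also removes [agent][hostname], [agent][id] and the rest, since they only exist inside that object.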

If there is a specific set of fields you want to keep then you also have the option of creating a mapping that defines them and turning off dynamic mapping.

Hi @Badger,

Thanks for the reply. You're right there were processors defined in filebeat.yml that I did not need.

I'm also looking at the prune filter, thanks for the tip.

As I mentioned, I have already decided what the mapping should look like. So I think removing the unwanted processors to reduce the overall size of the payload from Filebeat, combined with turning off dynamic mapping and defining my own mappings as I have already tried to do, should get me to where I want to be.

I've had a quick look around to try to find out how to disable dynamic mapping, but am still unsure how to achieve this exactly?

Thanks,
David

Read through this thread.

Hi @Badger,

Again, thank you for the reply. I've hit a bit of a brick wall with this. I'm unable to blacklist the "kubernetes" name, as this breaks the indexing because I'm using index => "%{[kubernetes][namespace]}" in my output, and looking at the plugin docs it only supports excluding top-level fields.

Fine, but then I can't seem to exclude any additional fields within the "kubernetes" property other than namespace. I've read the Stack Overflow article you linked to numerous times. Setting dynamic to "strict" means no documents are indexed, for obvious reasons (they contain fields that are not declared in the mapping), and setting it to false doesn't seem to work: new fields are still created dynamically within the mapping???

You should ask a question about this in the elasticsearch forum.

@Badger I've done some more testing and I was incorrect about the mapping changing. Setting dynamic to false does actually prevent the mapping from changing, but the data seen in Kibana appears to be the full _source of the data sent from Logstash to Elasticsearch. The values are still stored in Elasticsearch even though they aren't indexed and no changes are made to the mapping.

So I'm back to needing to find a way to reduce the payload from Logstash to Elasticsearch. I need to remove subkeys from the kubernetes top-level field. As discussed previously, this isn't possible with the prune filter, but I'm thinking it should be possible, maybe using ruby, like you recommended here:

Turned out this was me not understanding how elasticsearch actually works.

Link here for reference -> Dynamic mapping setting not honoured

The solution I settled on was to extract the required properties from the subkeys, set them as top-level fields on the event (using a ruby filter), and then drop the bulky data I didn't need by blacklisting its top-level key in the prune filter, so it never makes it to Elasticsearch.
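For anyone landing here later, a minimal sketch of that approach. The new field names k8s_namespace and k8s_pod are made up for illustration, and [kubernetes][pod][name] is only an assumed example of a subkey worth keeping; use whichever subkeys you need.

```
filter {
  ruby {
    # Copy the values we want to keep up to top-level fields.
    # "k8s_namespace" and "k8s_pod" are hypothetical names.
    code => '
      namespace = event.get("[kubernetes][namespace]")
      event.set("k8s_namespace", namespace) unless namespace.nil?

      pod = event.get("[kubernetes][pod][name]")
      event.set("k8s_pod", pod) unless pod.nil?
    '
  }

  prune {
    # Drop the whole kubernetes object now that the needed values
    # have been copied out of it.
    blacklist_names => [ "^kubernetes$" ]
  }
}

output {
  elasticsearch {
    hosts => "elasticsearch:9200"
    # The index has to reference the new top-level field, because
    # [kubernetes][namespace] no longer exists at this point.
    index => "%{k8s_namespace}"
  }
}
```

The order matters: the ruby filter has to run before prune, otherwise the values are gone before they can be copied. If all you need is to delete a few known subkeys rather than keep a few, the mutate filter's remove_field option with a nested reference such as "[kubernetes][labels]" may be enough on its own.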