I use Filebeat to ship messages to Logstash, which parses the messages and ingests them into the Elasticsearch cluster.
Each message/logline corresponds logically to a tab-separated row in a csv file. The messages are processed in Logstash after they have been consumed by the Logstash beats input plugin. I have created an index template, an ILM policy, an ingest pipeline and custom field mappings for the "columns" in the messages. Logstash parses the messages in a csv filter section, chooses the custom pipeline and sets the target index. This all works as expected.
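For reference, the relevant parts of my Logstash pipeline look roughly like this (column names, index name and pipeline name are placeholders):

```conf
filter {
  csv {
    # literal tab character; "\t" is only interpreted if config.support_escapes is enabled
    separator => "	"
    columns => ["col_a", "col_b", "col_c"]
  }
}

output {
  elasticsearch {
    hosts    => ["https://es-node:9200"]
    index    => "my-custom-index"     # target index managed by the template/ILM policy
    pipeline => "my-ingest-pipeline"  # custom ingest pipeline in Elasticsearch
  }
}
```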
However, since Filebeat is used, the original message/row is also stored within each document in Elasticsearch. This is pure overhead, since the content of each message is already split up and stored in separate fields.
The plan is to use this design (Filebeat, loglines, Logstash and custom indices) several times in the application.
My question is simple. What would be best practice for removing the message field? Should I skip the csv filter entirely and split the messages myself, or should I use the csv filter and remove the message field as the last action in the filter section?
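To make the second option concrete, this is the kind of thing I have in mind. As far as I understand, `remove_field` is also available as a common option on the csv filter itself, in which case it is only applied when parsing succeeds, so failed lines keep their original message (column names are placeholders):

```conf
filter {
  csv {
    separator => "	"  # literal tab character
    columns => ["col_a", "col_b", "col_c"]
    # Variant A: common filter option, applied only on successful parse
    remove_field => ["message"]
  }

  # Variant B: explicit removal as the last action in the filter section
  # mutate {
  #   remove_field => ["message"]
  # }
}
```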