I have a Logstash 6.x pipeline which processes Apache access logs and stores each line as a document in Elasticsearch 6.x (using the index template from Filebeat). I am now upgrading both Logstash and Elasticsearch to 7.x. I took the Filebeat ingest pipeline for Elasticsearch and ported it over to Logstash as shown here.
For my 7.x pipeline, I have made the appropriate changes, but I am facing one last issue: when I try to send events to Elasticsearch, I get the following error:
[2019-05-02T20:29:28,594][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"filebeat-7.0.0-2019-05-02", :_type=>"_doc", :routing=>nil}, #<LogStash::Event:0x44454d13>], :response=>{"index"=>{"_index"=>"filebeat-7.0.0-2019-05-02", "_type"=>"_doc", "_id"=>"De3OeWoBjOl-V72K-bE9", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"object mapping for [user_agent.device] tried to parse field [device] as object, but found a concrete value"}}}}
The mapping for the user agent fields appears to have changed in 7.x to support ECS, and the index template that I added through Filebeat expects ECS-formatted data. The problem is that the useragent filter in my pipeline (a rough sketch of the relevant part is at the end of this post) adds an object to the event that is not compatible with ECS. There are a handful of differences, in fact. So my question is: is the useragent filter plugin supposed to generate data in a format that is compatible with ECS, or should I perform the numerous field renames/removals myself? Essentially, is this a bug, or is this the expected behavior?
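For reference, the filter block in question looks roughly like this (the source field name comes from my grok stage and may differ in other pipelines):

    filter {
      useragent {
        # "agent" holds the raw user agent string captured by grok from
        # the combined log format; the useragent filter parses it into
        # browser/OS/device details under [user_agent].
        source => "agent"
        target => "user_agent"
      }
    }

With this configuration the plugin writes flat fields such as [user_agent][device], [user_agent][os_name] and [user_agent][major], whereas the ECS-based template expects nested fields like user_agent.device.name and user_agent.os.name. As far as I can tell, that is exactly why Elasticsearch complains that it "tried to parse field [device] as object, but found a concrete value".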
We have not addressed this yet. There will be a group of users who do not use ECS (or Elasticsearch) at all, or who have a mixture of non-ECS events for other outputs and ECS events for Elasticsearch.
At the moment you will need to use the mutate filter's rename option as your last filter before the outputs to convert your fields, and any auto-added fields, to ECS; a rough sketch follows.
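Something along these lines, as a sketch only (the source field names assume the useragent filter's legacy layout under [user_agent]; adjust the paths to match your events):

    filter {
      # [user_agent][os] and [user_agent][device] are flat strings in the
      # legacy layout, but ECS expects objects at those paths, so park
      # them in @metadata before re-nesting them.
      mutate {
        rename => {
          "[user_agent][os]"     => "[@metadata][ua_os_full]"
          "[user_agent][device]" => "[@metadata][ua_device]"
        }
      }

      mutate {
        # Move the parked values and the remaining flat fields onto their
        # ECS equivalents.
        rename => {
          "[@metadata][ua_os_full]" => "[user_agent][os][full]"
          "[@metadata][ua_device]"  => "[user_agent][device][name]"
          "[user_agent][os_name]"   => "[user_agent][os][name]"
        }
        # major/minor/patch/build and os_major/os_minor have no direct ECS
        # counterpart (ECS keeps a single user_agent.version and
        # user_agent.os.version), so drop them here or stitch them together
        # with a ruby filter first.
        remove_field => [
          "[user_agent][major]", "[user_agent][minor]",
          "[user_agent][patch]", "[user_agent][build]",
          "[user_agent][os_major]", "[user_agent][os_minor]"
        ]
      }
    }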
Absolutely, that makes sense. I do understand the issue with backwards compatibility, especially for people who use the plugin with outputs other than Elasticsearch. I have converted my fields to be ECS-compatible for the time being. Perhaps a boolean option named "ecs" with a default value of false (as in the sketch below) could be a way to go for a grace period. Thanks for the reply!
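Just to illustrate what I mean, something like this (the ecs option is hypothetical and does not exist today):

    filter {
      useragent {
        source => "agent"
        target => "user_agent"
        # Hypothetical opt-in: emit ECS field names (user_agent.device.name,
        # user_agent.os.name, ...) instead of the legacy flat fields.
        ecs    => true
      }
    }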
That boolean flag is what we are discussing - but in the Elasticsearch output.
We need feedback on how users would prefer to modify their configs for ECS (ideally with the least effort).
If we make a mistake, or ECS changes/improves, then you must wait for us to release a new Elasticsearch output; if you do it yourself you have control, but you also have to keep track of ECS changes yourself. There are pros and cons either way.
Restructuring data to ECS within the elasticsearch output is an interesting idea. That would keep the useragent plugin (and other plugins) independent of Elasticsearch, which seems like a decent goal for plugins. My personal opinion is that I would rather have this handled by the elasticsearch output plugin than bloat my pipeline with code that reformats data to ECS, which also makes it harder to maintain. Keeping the output up to date seems like something that could be tested automatically, since the ECS fields are described in JSON/YAML.
Either way, I think the ease of use outweighs the negatives, especially if it's something that one can disable/configure for the use cases where the behavior is not desired. I think it's "in the spirit" of the Elastic Stack to make things as plug and play as possible; for example, having Filebeat send data to Elasticsearch also relies on an ingest pipeline that is maintained by you guys (which likewise breaks if ECS changes are not reflected there).
Anyway, that is just my opinion; I hope it's useful.