We are developing a custom Beat for a monitoring tool that captures metrics such as CPU, memory, and disk. The data is sent through the Logstash output, which is configured in the beat.yml file.
One of the fields we receive from the monitoring tool is test_serial_id. I want to use test_serial_id as the "_id" of the Elasticsearch document in order to eliminate duplicates.
How can I achieve this in Beats? Any suggestions would be very helpful.
When publishing to elasticsearch, the id value will be used for _id. When publishing to logstash/redis/kafka, beats add a @metadata field to the event. This field will contain the id. You can configure the elasticsearch output in Logstash to use [@metadata][id].
When indexing, Elasticsearch accepts an operation type. If you set it to 'create', an event whose id already exists is detected as a duplicate and rejected rather than overwriting the existing document. If you don't set it to 'create', the old entry is marked as deleted and the new event is indexed in its place.
Can I send the document id through output.logstash?
Yes. Please read this paragraph again:
When publishing to logstash/redis/kafka, beats add a @metadata field to the event. This field will contain the id. You can configure the elasticsearch output in Logstash to use [@metadata][id].
This will print all events, including the @metadata section, which should contain the id field. If it's missing, then either you need to update libbeat or you didn't add the id to your event in the Beat.
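A debug output for checking whether the id arrives in Logstash could look like the sketch below. It assumes the stdout output with the rubydebug codec; the codec's metadata option is what makes the otherwise hidden @metadata section visible.

```
output {
  stdout {
    # rubydebug hides @metadata by default; metadata => true prints it
    codec => rubydebug { metadata => true }
  }
}
```

If the printed event shows an id under @metadata, the Beat side is working and only the Elasticsearch output remains to be configured.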
But when I send the same data to the Beats port 5044 in Logstash, the _id value is not picked up as above. An auto-generated ID is created for the document instead.
Beats just forward some info via @metadata. Beats don't force you to make use of this information. You have to configure the Elasticsearch output in Logstash to use the id. Check out the Logstash elasticsearch output docs. Settings you might be interested in: action, document_id. Configure action => "create" and document_id => "%{[@metadata][id]}" (document_id takes a field reference in %{...} form; a bare "[@metadata][id]" would be used as a literal string).
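Put together, a minimal Logstash pipeline for this could look like the sketch below. The port 5044 comes from the post above; the hosts value is only an assumption for a local setup and has to be replaced with your own cluster address.

```
input {
  beats {
    port => 5044
  }
}

output {
  elasticsearch {
    # assumed address, replace with your Elasticsearch host(s)
    hosts => ["localhost:9200"]
    # 'create' rejects an event whose id already exists instead of overwriting it
    action => "create"
    # take the document id from the metadata forwarded by the Beat
    document_id => "%{[@metadata][id]}"
  }
}
```

If the id is not available in @metadata, the event field itself should work the same way, for example document_id => "%{test_serial_id}", assuming test_serial_id is a top-level field in the event.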