Is there a way to check in Logstash whether a value already exists in the database and avoid sending it again? Or could I use the fingerprint plugin to generate a unique _id from the value of the field? If I receive the same information in that field, it would generate the same ID, so it wouldn't be saved again.
I also checked whether it's possible to create unique fields in Elasticsearch, but I see it's not.
This one is correct. I think what you are seeing in the results is all the hits, which are the records returned by the query. If you scroll to the bottom of the response you should see the aggregation. Most of the time when doing an aggregation you don't need the hits, so you can remove them by setting "size" to 0 in the request, and it will only return the aggs.
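To make the idea concrete, here is a minimal Python sketch of a request body that asks for a terms aggregation and no hits, plus a helper that reads the bucket keys back out. The field name `app_name.keyword`, the aggregation name `unique_values`, and the sample response are assumptions for illustration, not taken from the original query.

```python
import json


def unique_values_query(field, max_buckets=1000):
    """Build a search body that returns only a terms aggregation, no hits."""
    return {
        "size": 0,  # suppress the hits array; only aggregations come back
        "aggs": {
            "unique_values": {  # hypothetical aggregation name
                "terms": {"field": field, "size": max_buckets}
            }
        },
    }


def bucket_keys(response, agg_name="unique_values"):
    """Pull the distinct values (bucket keys) out of a search response."""
    return [b["key"] for b in response["aggregations"][agg_name]["buckets"]]


# Assumed field name; adjust to the keyword field in your own mapping.
body = unique_values_query("app_name.keyword")
print(json.dumps(body, indent=2))

# Hand-written response in the shape Elasticsearch returns, for illustration only.
sample_response = {
    "hits": {"hits": []},
    "aggregations": {
        "unique_values": {
            "buckets": [
                {"key": "billing", "doc_count": 12},
                {"key": "checkout", "doc_count": 7},
            ]
        }
    },
}
print(bucket_keys(sample_response))
```

Each bucket carries only a key and a document count, not the full documents, so the unique values are the bucket keys themselves.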
In this case I see that there are some values in the buckets list inside the aggregations, but unfortunately not the data.
In the end I created another index with unique values based on the string, using the fingerprint plugin. I'm not sure it's the best option, but I need to extract a lot of information and it was taking a long time.
I'll share what I did; maybe it can be helpful to someone else who wants to do something similar.
Is there a way to receive unique values?
I've used the fingerprint plugin in this case. I've generated a unique ID based on the string, e.g. if I receive the same app_name it will always generate the same _id, so it won't be repeated in Elasticsearch. I added the fingerprint config to the filter section of the logstash.conf pipeline.
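The original pipeline config is not reproduced here, so as a rough illustration of why this works, here is a small Python sketch of the same idea: hash the string to get a deterministic ID, then use that ID as the document key. This is not the Logstash fingerprint plugin itself. The SHA-1 choice, the `fingerprint_id` helper, and the dict standing in for an index are all assumptions.

```python
import hashlib


def fingerprint_id(value):
    """Return a deterministic hex ID for a string (SHA-1 chosen for illustration)."""
    return hashlib.sha1(value.encode("utf-8")).hexdigest()


# A dict keyed by ID stands in for an Elasticsearch index keyed by _id.
index = {}
for app_name in ["billing", "checkout", "billing"]:
    doc_id = fingerprint_id(app_name)
    # Writing to an existing key overwrites it instead of adding a duplicate.
    index[doc_id] = {"app_name": app_name}

print(len(index))  # 2: the repeated "billing" maps to the same ID
```

Elasticsearch behaves similarly when a document is indexed with an _id that already exists: the existing document is overwritten rather than a second copy being stored.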