I have multiple copies of individual log-file entries in Elasticsearch. For example, in my original log file there is a single log entry X at timestamp 12:55:03:234. In Elasticsearch, that timestamp has several copies of X, and the only difference between them is the _id. The number of duplicates varies from one log entry to another, and some entries have no duplicates at all.
Can anyone help me understand why this is happening?
How can I remove the duplicate log entries from Elasticsearch?