Hello!
I use ELK to parse log files and CSV files, but yesterday a small problem appeared: all the data in my CSV file were indexed twice, so a 1000-line document is now a 2000-line document. You could simply divide every result by two, but my dashboards are shared on the local network, so it's annoying for the other users.
First of all, I tried to locate the problem, so I changed the output of my Logstash config file to stdout{} and there was no problem with the output. Therefore I think the problem is between Elasticsearch and Logstash.
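To be clear, by "changed the output to stdout{}" I mean a temporary debug output roughly like this (a minimal sketch; the rubydebug codec is just one readable option):

output {
  # temporary debug output: print each event to the console instead of indexing it
  stdout { codec => rubydebug }
}

With this output, each CSV line appeared only once, which is why I suspect the problem is downstream of Logstash.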
I checked Elasticsearch but didn't find anything. I use the same Logstash config file to parse both log and CSV files, so I don't understand why only the CSV files are affected.
Here is my Logstash config file:
input {
  beats {
    port => "5044"
  }
}
filter {
  if "Log" in [tags] {
    ...
  }
  if "Csv" in [tags] {
    ...
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "squid-%{File_Type}"
  }
}
To make sure this does not happen, you should use one of the columns of your CSV file as the _id of the document. That way, if for whatever reason the file gets parsed again, you will just overwrite the existing values instead of creating duplicates.
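As a sketch, assuming your CSV filter already extracts a column that uniquely identifies each line (here a hypothetical field called row_id), you can pass it to the elasticsearch output via the document_id option:

output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "squid-%{File_Type}"
    # use the unique column as the Elasticsearch _id, so a re-parse
    # updates the existing document instead of adding a new one
    document_id => "%{row_id}"
  }
}

With the same _id, a second pass over the file results in updates rather than new documents, so the counts in your dashboards stay correct.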
I'm sorry for this delayed reply, but I was on vacation.
Thanks for your answer. I read a similar answer a week ago, but I don't know how to do it. Can you explain it to me, please?
In fact, should I use the line number as the _id, or add an id column to all my CSV files and define it as the _id?