Hey, can anyone make one thing clear? I was running the ELK stack, with Logstash reading log files from a folder on my local machine. I had to stop Logstash for some reason and then restarted it, and everything is going fine. But since I restarted it, will it start reading the files from the beginning? If it does, that would mean duplicates of the same documents under different document IDs, because Logstash might have read them twice. Or will duplication be prevented? (I know ES replicates each shard; I am not talking about that kind of replication.)
Logstash was reading log files from a folder on my local machine. I had to stop Logstash for some reason and then restarted it, and everything is going fine. But since I restarted it, will it start reading the files from the beginning?
It won't if it's correctly configured. There are ways to screw this up but the default configuration is safe in this regard.
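For reference, the reason the default is safe is that the file input keeps a per-file read offset in a "sincedb" file, so a restart resumes where the previous run left off. A minimal sketch (the path is a placeholder, not from this thread):

```
input {
  file {
    path => "/path/to/logs/*.log"
    # By default Logstash records the read offset for each file in a
    # sincedb file under its data directory. On restart it resumes from
    # the stored offset instead of re-reading the file from the top.
  }
}
```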
@magnusbaeck
@magnusbaeck I am importing my old data into ES, so I run Logstash during office hours, then stop it and do the same the next morning. From this topic and your answer, I understood that if I stick to the default configuration it won't create duplicates of a document under different document IDs, and that is what I was doing.
But when I cross-checked by running an aggregation on userid and timestamp, ideally no bucket should have a doc_count above 1, because the timestamp can't be the same for the same user. Yet some buckets do, which means duplication did happen.
I verified it through Kibana. Here is the screenshot: the first two documents are identical but have different document IDs.
Can you tell me how this happened, and how to prevent it from happening again?
I am using the default Logstash configuration.
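One common way to make re-reads harmless, not something confirmed in this thread but a standard pattern, is to derive the Elasticsearch document ID from the event's content, so a second read of the same line overwrites the existing document instead of creating a new one. A hedged sketch, where the field names, index name, and host are assumptions:

```
filter {
  fingerprint {
    # Hash the fields that uniquely identify an event. "userid" and
    # "@timestamp" are placeholders for your actual fields.
    source => ["userid", "@timestamp"]
    concatenate_sources => true
    method => "SHA256"
    target => "[@metadata][fingerprint]"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "mylogs"
    # Reusing the same _id turns a duplicate read into an update,
    # not a second document.
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```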
And if I start Logstash again after stopping it for some reason, would it start parsing the input file from the beginning each time I restart it?
I ask because I had an input file with 1.6 million documents, and after parsing it through Logstash I found some 4 million documents in that index via Kibana's monitoring section.
Do you think the "start_position" setting made Logstash start over on the same input file each time I restarted it?
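For context, "start_position => beginning" only controls where Logstash starts in files it has never seen before; once a sincedb entry exists for a file, restarts resume from the stored offset regardless of this setting. Re-reading from the top on every restart usually happens only when the sincedb is bypassed, as in this sketch (path is a placeholder):

```
input {
  file {
    path => "/path/to/logs/*.log"
    start_position => "beginning"
    # Pointing sincedb_path at /dev/null discards the stored offsets,
    # so every restart re-reads the whole file from the start. This is
    # a common cause of duplicated documents. The line below is shown
    # only as the misconfiguration to avoid:
    # sincedb_path => "/dev/null"
  }
}
```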