Replication in ES

Hey, can anyone make one thing clear? I was running the ELK stack. Logstash was reading log files from a folder on my local machine. I had to stop Logstash for some reason and then restarted it, and everything is going fine. But since I restarted it, will it start reading the files from the beginning? If it does, there will be duplicates of the same documents with different document IDs, because Logstash might have read them twice; or will it prevent that from happening? (I know ES replicates each shard, but I am not talking about that kind of replication.)

Any help would be appreciated.

Logstash was reading log files from a folder on my local machine. I had to stop Logstash for some reason and then restarted it, and everything is going fine. But since I restarted it, will it start reading the files from the beginning?

It won't if it's correctly configured. There are ways to screw this up, but the default configuration is safe in this regard.

Hi Magnus,

Where does Logstash store its metadata about what it has already processed?

https://www.elastic.co/guide/en/logstash/current/plugins-inputs-file.html#_tracking_of_current_position_in_watched_files
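For reference, a minimal sketch of pinning the sincedb to a known location (the paths below are just examples; by default the sincedb lives under Logstash's data directory, or under the user's home directory in older versions):

input {
  file {
    path => "/path/to/access.log"
    start_position => "beginning"
    # Each sincedb entry records the file's identity (inode) and the
    # byte offset read so far, so a restart resumes where it left off.
    sincedb_path => "/var/lib/logstash/access.sincedb"
  }
}

Keeping the sincedb at an explicit path also makes it easy to inspect (or preserve) when you are debugging this kind of question.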

@magnusbaeck
I am loading my old data into ES, so I run Logstash during office hours, then stop it and do the same the next morning. From this topic and your answer, I understood that if you stick to the default configuration, it won't create duplicates of a document with different document IDs, and that is what I was doing.

But when I cross-checked by running an aggregation on userid and timestamp, the duplicate count should ideally be zero, because the timestamp can't be the same for the same user. It shows a non-zero value, though, which means duplication did happen. I verified it through Kibana as well.
Here is a screenshot (omitted): the first two documents are identical but have different document IDs.
Can you tell me how this happened, and what the solution is to prevent it from happening again?
I am using the default Logstash configuration.

It's impossible for me to tell why this happened. The evidence (the sincedb file) has been overwritten and complete logs aren't available.
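That said, one way to make the pipeline idempotent regardless of re-reads is to derive the Elasticsearch document ID from the event content, so a re-ingested line overwrites itself instead of creating a duplicate. A sketch using the fingerprint filter (the field names userid and timestamp and the index name are assumptions based on your description; adjust them to your schema):

filter {
  fingerprint {
    # Assumed field names; use whatever uniquely identifies an event.
    source => ["userid", "timestamp"]
    concatenate_sources => true
    method => "SHA256"
    target => "[@metadata][fingerprint]"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "accesslog"                      # assumed index name
    document_id => "%{[@metadata][fingerprint]}"
  }
}

With this, indexing the same line twice results in a single document, because both writes use the same _id.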

@magnusbaeck
Hi, if I put this in Logstash's config file:

input {
  file {
    path => "/home/mywavia/new/accesslog49.txt"
    start_position => "beginning"
  }
}

and I start it after stopping it for some reason, would Logstash start parsing the input file from the beginning each time I restart?
I ask because I had an input file with 1.6 million documents, yet after parsing it with Logstash I found some 4 million documents in that index through Kibana's monitoring section.
Do you think the "start_position" option made Logstash start over on the same input file each time after I restarted it?

and I start it after stopping it for some reason, would Logstash start parsing the input file from the beginning each time I restart?

No, it'll still use your sincedb file. The start_position option only matters for previously unseen files, i.e. files with no sincedb entry.
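To make that concrete, here is a sketch of both behaviors (a sketch, assuming the same file path as above):

input {
  file {
    path => "/home/mywavia/new/accesslog49.txt"
    # Applies only the first time Logstash sees the file,
    # i.e. when there is no sincedb entry for it yet.
    start_position => "beginning"
    # Uncomment ONLY if you deliberately want Logstash to forget its
    # position and re-read the whole file on every restart:
    # sincedb_path => "/dev/null"
  }
}

And if a file does end up ingested twice, a content-derived document_id like the fingerprint sketch earlier in the thread keeps the index free of duplicates.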
