Hi,
I'm currently retrieving data from an "external" Elasticsearch index and writing the events into CSV files (for later post-processing). This works fine so far.
My problem is:
When I shut down LS and restart it at a later point, it continues reading from the beginning of the existing index, so I get duplicate events.
I'm using (on Ubuntu 16.04):
logstash 2.4.1
logstash-filter-csv (2.1.3)
logstash-input-file (2.2.5)
logstash-output-elasticsearch (2.7.1)
My config looks like the following:
input {
  elasticsearch {
    hosts => ["somehost:9200"]
    index => "someindex"
    type => "om_event"
  }
}

filter {
  if [type] == "om_event" {
    [...]
  }
}

output {
  if [type] == "om_event" {
    csv {
      csv_options => {"col_sep" => ";"}
      fields => ['some','fields']
      path => "/path/to/file/sometest_%{+yyyy.MM.dd}.log"
    }
    stdout {
      codec => rubydebug
    }
  }
}
I googled a bit and someone suggested using a timestamp in a range query within the input section, like below:
query => '{"query":{"range":{"parsed_date":{"gte": "${LAST_RUN}"}}}}'
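If I understand that suggestion correctly, the input section would look roughly like this (assuming the documents actually have a date field called parsed_date, and, if I read the 2.4 docs right, that Logstash is started with --allow-env so ${LAST_RUN} is substituted from the environment):

input {
  elasticsearch {
    hosts => ["somehost:9200"]
    index => "someindex"
    type => "om_event"
    query => '{"query":{"range":{"parsed_date":{"gte":"${LAST_RUN}"}}}}'
  }
}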
So I tried to create a bash script that records the timestamp of my last run, writes it to a logfile on each run, and also sets it as an environment variable. But it seems I've started diving into this parent/child world of Linux processes without fully understanding it:
#!/bin/bash
# read last run from file
source ./logstash_last_run.log
echo "previous last run: $LAST_RUN"
# call logstash
/opt/logstash/bin/logstash -f /etc/logstash/conf.d/om_elastic.conf
# store new date in logfile
export LAST_RUN=$(date +"%FT%T")
echo "export LAST_RUN=$LAST_RUN" > logstash_last_run.log
echo "new last run: $LAST_RUN"
Does anybody have an idea how to solve this problem?