Retrieve ES index with LS input plugin NOT(!) every time from the beginning?


#1

Hi,
I'm currently retrieving data from an "external" Elasticsearch index and writing the events into CSV files (for later post-processing). This works fine so far.

My problem is:
When I shut down LS and restart it at a later point, it continues reading from the beginning of the existing index, so I get duplicate events.

I'm using (on Ubuntu 16.04):
logstash 2.4.1
logstash-filter-csv (2.1.3)
logstash-input-file (2.2.5)
logstash-output-elasticsearch (2.7.1)

My config looks like the following:

input {
  elasticsearch {
    hosts => ["somehost:9200"]
    index => "someindex"
    type => "om_event"
  }
}

filter {
  if [type] == "om_event" {
    [...]
  }
}

output {
  if [type] == "om_event" {
    csv {
      csv_options => {"col_sep" => ";"}
      fields => ['some','fields']
      path => "/path/to/file/sometest_%{+yyyy.MM.dd}.log"
    }
    stdout {
      codec => rubydebug
    }
  }
}

I googled a bit and someone suggested using a timestamp in a query within the input section, like this:

query => '{"query":{"range":{"parsed_date":{"gte": "${LAST_RUN}"}}}}'
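Wired into the input section, that could look like the sketch below. Note this is untested: `parsed_date` has to be an actual date field in your index, and as far as I know, `${VAR}` substitution in Logstash 2.x configs was still experimental and required starting Logstash with the `--allow-env` flag (it only became a default feature in 5.0).

```
input {
  elasticsearch {
    hosts => ["somehost:9200"]
    index => "someindex"
    type => "om_event"
    query => '{"query":{"range":{"parsed_date":{"gte":"${LAST_RUN}"}}}}'
  }
}
```

With that, only documents whose `parsed_date` is at or after the `LAST_RUN` timestamp from the environment get pulled on each run.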

so I tried to create a bash script that records the timestamp of my last run, writes it to a logfile on each run, and also sets it as an environment variable. But it seems I've started diving into this parent/child world of Linux processes without fully understanding it :frowning:

#!/bin/bash

# read the last run timestamp from the file
# (the file contains a single "export LAST_RUN=..." line)
source ./logstash_last_run.log

echo "previous last run: $LAST_RUN"

# call Logstash; it inherits LAST_RUN because it was exported above
/opt/logstash/bin/logstash -f /etc/logstash/conf.d/om_elastic.conf

# store the new date in the logfile; the export below only affects this
# script's own shell -- persistence between runs comes from the file,
# which the next run sources at the top
export LAST_RUN=$(date +"%FT%T")
echo "export LAST_RUN=$LAST_RUN" > logstash_last_run.log
echo "new last run: $LAST_RUN"
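For what it's worth, the parent/child behaviour that confused me can be shown in isolation (a minimal sketch; the file path and timestamp are made up):

```shell
# A child process cannot change its parent's environment:
export LAST_RUN="old"
bash -c 'export LAST_RUN="new"'   # only affects the child shell, which then exits
echo "$LAST_RUN"                  # still "old"

# Persisting through a file (as the wrapper script does) works,
# because source runs the file's contents in the current shell:
echo 'export LAST_RUN="2017-01-01T00:00:00"' > /tmp/logstash_last_run.log
source /tmp/logstash_last_run.log
echo "$LAST_RUN"                  # 2017-01-01T00:00:00
```

So the `export` inside a script never reaches the shell that started it; writing to a file and sourcing it on the next run is the usual workaround.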

Does anybody have an idea how to solve this problem?


(system) #2

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.