Retrieve ES index with LS input plugin NOT(!) every time from the beginning?


I'm currently retrieving data from an "external" Elasticsearch index and writing the events into CSV files (for later post-processing). This works fine so far.

My problem is:
When I shut down LS and restart it at a later point, it starts reading from the beginning of the existing index again, so I get duplicate events.

I'm using (on Ubuntu 16.04):
logstash 2.4.1
logstash-filter-csv (2.1.3)
logstash-input-file (2.2.5)
logstash-output-elasticsearch (2.7.1)

My config looks like the following:

input {
  elasticsearch {
    hosts => ["somehost:9200"]
    index => "someindex"
    type => "om_event"
  }
}

filter {
  if [type] == "om_event" {
    # ...
  }
}

output {
  if [type] == "om_event" {
    csv {
      csv_options => {"col_sep" => ";"}
      fields => ['some','fields']
      path => "/path/to/file/sometest_%{+yyyy.MM.dd}.log"
    }
    stdout {
      codec => rubydebug
    }
  }
}

I googled a bit, and someone mentioned using a timestamp in a query within the input section, like this:

query => '{"query":{"range":{"parsed_date":{"gte": "${LAST_RUN}"}}}}'
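Put into the input section, that would look roughly like this (a sketch only: `parsed_date` has to be an actual date field in the index, and as far as I can tell, Logstash 2.4 only resolves `${LAST_RUN}` from the environment when it is started with the `--allow-env` flag):

```
input {
  elasticsearch {
    hosts => ["somehost:9200"]
    index => "someindex"
    type  => "om_event"
    query => '{"query":{"range":{"parsed_date":{"gte":"${LAST_RUN}"}}}}'
  }
}
```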

So I tried to write a bash script that records the timestamp of its last run, writes it to a logfile on each run, and also sets it as an environment variable. But it seems I've started diving into this parent/child world of Linux processes without fully understanding it :frowning:
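For what it's worth, here is a minimal illustration of that parent/child behaviour (with a made-up timestamp): an exported variable is inherited by child processes, but nothing a child exports ever flows back to the parent, which is why the wrapper script has to persist the value in a file between runs.

```shell
# a variable exported in the parent is visible in child processes ...
export LAST_RUN="2016-12-01T00:00:00"
bash -c 'echo "child sees: $LAST_RUN"'   # prints: child sees: 2016-12-01T00:00:00

# ... but an export inside a child never reaches the parent
bash -c 'export LAST_RUN="changed-in-child"'
echo "parent still has: $LAST_RUN"       # prints: parent still has: 2016-12-01T00:00:00
```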


#!/bin/bash

# read last run from file
source ./logstash_last_run.log
echo "previous last run: $LAST_RUN"

# take the new timestamp *before* calling Logstash, so events indexed
# while Logstash is running are not skipped on the next run
NEW_RUN=$(date +"%FT%T")

# call logstash (LAST_RUN was exported by the sourced file, so the child
# process inherits it; in 2.4, ${VAR} in the config also needs --allow-env)
/opt/logstash/bin/logstash --allow-env -f /etc/logstash/conf.d/om_elastic.conf

# store new date in logfile for the next run
echo "export LAST_RUN=$NEW_RUN" > logstash_last_run.log
echo "new last run: $NEW_RUN"

Does anybody have an idea how to solve this problem?
