Logstash reindexing task behaves erratically

We are using a GelfAppender and Logstash to send log entries to Elasticsearch. Log entries are sent to indices based on the current date (one index per day of logging, so we can easily clean up). We also index MDC (Mapped Diagnostic Context) fields. Initially we did not use a type mapping, so Elasticsearch auto-detected the type of these fields.
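The daily-index scheme is what makes cleanup cheap: expiring old logs means deleting whole indices rather than individual documents. A minimal sketch of choosing which indices to drop, assuming names of the form `logs-YYYY.MM.dd` (as in the pipeline below; in Logstash such names are usually produced with an index pattern like `logs-%{+YYYY.MM.dd}`). The prefix constant and the retention function are hypothetical helpers, not part of any Elastic tool:

```python
from datetime import date, datetime, timedelta

# Hypothetical prefix and date format, matching names like "logs-2018.03.08".
PREFIX = "logs-"
DATE_FORMAT = "%Y.%m.%d"


def daily_index_name(day: date) -> str:
    """Name of the index that holds the log entries for `day`."""
    return PREFIX + day.strftime(DATE_FORMAT)


def indices_to_delete(index_names, today: date, keep_days: int):
    """Return the daily indices older than `keep_days` days, oldest first.

    Names that do not parse as PREFIX + date (for example the reindexed
    "logs-2018.03.08.v2") are skipped rather than guessed at.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(PREFIX):
            continue
        try:
            day = datetime.strptime(name[len(PREFIX):], DATE_FORMAT).date()
        except ValueError:
            continue  # not a plain daily index name
        if day < cutoff:
            expired.append((day, name))
    return [name for _, name in sorted(expired)]


if __name__ == "__main__":
    names = ["logs-2018.03.08", "logs-2018.03.12", "logs-2018.03.13", "logs-2018.03.08.v2"]
    print(indices_to_delete(names, date(2018, 3, 14), keep_days=3))
```

Note that the `.v2` indices created by a reindex would need their own naming rule (or an alias) before a cleanup job like this could pick them up.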

Kibana started complaining about failing shards, and it turned out that the field "dbconnection" was of type "long" in one index and "text" in another. This was caused by our software sometimes putting just a number such as "2" in the field, so the type Elasticsearch detected depended on which kind of value reached each daily index first. To fix this, we set up a separate Logstash installation and created a pipeline based on the information in this article: https://sematext.com/blog/recipe-reindexing-elasticsearch-documents-with-logstash/
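Before reindexing, it can help to list every field whose type differs between indices, not only "dbconnection". A minimal sketch, assuming you have already fetched and parsed the JSON returned by `GET /logs-*/_mapping`. In 6.x the field properties sit under a mapping type name (the sample uses `doc`), in 7.x directly under `mappings`; the sketch accepts both shapes. The function names and the sample response are hypothetical:

```python
from collections import defaultdict


def field_types(properties, prefix=""):
    """Yield (dotted_field_name, type) pairs from a mapping 'properties' dict."""
    for name, spec in properties.items():
        path = prefix + name
        if "type" in spec:
            yield path, spec["type"]
        if "properties" in spec:  # object field: recurse into its sub-fields
            yield from field_types(spec["properties"], path + ".")


def find_conflicts(mapping_response):
    """Return {field: {type: [indices]}} for fields mapped to more than one type."""
    seen = defaultdict(lambda: defaultdict(list))
    for index, body in mapping_response.items():
        mappings = body.get("mappings", {})
        # 7.x: properties directly under "mappings"; 6.x: one level deeper, per type.
        if "properties" in mappings:
            type_bodies = [mappings]
        else:
            type_bodies = [v for v in mappings.values() if isinstance(v, dict)]
        for type_body in type_bodies:
            for field, ftype in field_types(type_body.get("properties", {})):
                if index not in seen[field][ftype]:
                    seen[field][ftype].append(index)
    return {
        field: {t: sorted(idx) for t, idx in types.items()}
        for field, types in seen.items()
        if len(types) > 1
    }


if __name__ == "__main__":
    # Hypothetical, trimmed-down 6.x-style response for two daily indices.
    sample = {
        "logs-2018.03.08": {"mappings": {"doc": {"properties": {
            "dbconnection": {"type": "long"}, "message": {"type": "text"}}}}},
        "logs-2018.03.09": {"mappings": {"doc": {"properties": {
            "dbconnection": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
            "message": {"type": "text"}}}}},
    }
    print(find_conflicts(sample))
```

Fields that show up here are the candidates for a `mutate`/`convert` step in the reindex pipeline, or better, for an explicit mapping in an index template so new daily indices stop guessing.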

The pipeline configuration we use:

input {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "logs-2018.03.08,logs-2018.03.09,logs-2018.03.10,logs-2018.03.11,logs-2018.03.12,logs-2018.03.13"
    query => '{ "query": { "query_string": { "query": "*" } } }'
    docinfo => true
  }
}

filter {
  mutate {
    convert => {
        "dbconnection" => "string"
    }
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "%{[@metadata][_index]}.v2"
  }
}
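Per event, the pipeline above is meant to do two things: turn `dbconnection` into a string and write the event to `<source index>.v2`. The string conversion is enough because Elasticsearch's dynamic mapping does not detect numbers inside strings by default, so the new indices map the field as text. A minimal Python model of that per-document step (a hypothetical helper for reasoning or testing, not Logstash code); like `mutate`'s `convert`, it converts each element of an array value:

```python
import copy


def reindex_event(source_doc, source_index):
    """Return (target_index, new_doc), modelling the filter and output above.

    Converts "dbconnection" to a string (element-wise for lists) and targets
    "<source_index>.v2". Documents without the field pass through unchanged,
    and the input document is not modified.
    """
    doc = copy.deepcopy(source_doc)
    if "dbconnection" in doc:
        value = doc["dbconnection"]
        if isinstance(value, list):
            doc["dbconnection"] = [str(v) for v in value]
        else:
            doc["dbconnection"] = str(value)
    return source_index + ".v2", doc


if __name__ == "__main__":
    print(reindex_event({"dbconnection": 2, "message": "connected"}, "logs-2018.03.08"))
```

Because `docinfo => true` also puts the original `_id` in `[@metadata][_id]`, adding `document_id => "%{[@metadata][_id]}"` to the `elasticsearch` output would make a restarted or repeated run overwrite documents instead of duplicating them; the pipeline above does not set it, so treat this as a suggestion rather than part of the tested setup.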

This pipeline was tested on a local machine and in a test environment, and it works. There is only one strange thing about its behavior:

Logstash starts reading logs-2018.03.08 and writing to logs-2018.03.08.v2 (which is fine). But after a seemingly arbitrary number of documents, it pauses and starts working on logs-2018.03.09 -> logs-2018.03.09.v2. It pauses that too, picks up the remaining work in logs-2018.03.08 again, then starts on logs-2018.03.12, and so on.

It is still reindexing (we have millions of log documents). It seems to be going well, but the order in which the indices are processed is a bit "look, squirrel!", and I wonder what is causing that.
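One plausible explanation, offered as a hypothesis rather than a confirmed diagnosis: the `elasticsearch` input runs a single scroll search over all listed indices, so hits come from every shard of every index at once. If every hit of the `"*"` query gets the same score, merging the per-shard results falls back to shard order as the tie-breaker, which would drain one shard at a time. Each daily index has several shards, so consecutive shards in that order can belong to different indices, producing runs of "arbitrary" length that jump between indices and back. The toy model below assumes exactly that (equal scores, ties broken by a fixed shard order, only shards whose hits were emitted advance); it is not Elasticsearch's code, and `page_size` only loosely corresponds to the input's `size` setting:

```python
import heapq
from itertools import groupby


def simulate_scroll(shards, page_size):
    """Yield the index name of each hit in the order a merged scroll would return it.

    Toy model. `shards` is an ordered list of (index_name, doc_count) pairs,
    one per shard; list position is the tie-breaker between equal scores.
    """
    cursors = [0] * len(shards)  # next unread document per shard
    while True:
        # Every shard offers its next page_size candidates: (score, shard_pos, doc).
        candidates = [
            (-1.0, pos, doc)  # equal score for every hit, as with a match-all query
            for pos, (_, count) in enumerate(shards)
            for doc in range(cursors[pos], min(cursors[pos] + page_size, count))
        ]
        if not candidates:
            return
        # Keep the best page_size hits; ties fall back to shard position, then doc.
        for _, pos, doc in heapq.nsmallest(page_size, candidates):
            cursors[pos] = doc + 1  # only shards that contributed hits advance
            yield shards[pos][0]


def runs(index_names):
    """Collapse consecutive hits from the same index into (index, count) runs."""
    return [(name, len(list(group))) for name, group in groupby(index_names)]


if __name__ == "__main__":
    # Hypothetical shard order: shard 0 of 03.08, shard 0 of 03.09, shard 1 of 03.08.
    shards = [("logs-2018.03.08", 3), ("logs-2018.03.09", 2), ("logs-2018.03.08", 2)]
    print(runs(simulate_scroll(shards, page_size=2)))
```

If this is what happens, the interleaving is harmless: each event carries its own `[@metadata][_index]`, so every document still lands in the matching `.v2` index. Listing a single index per pipeline run would be one way to get strictly sequential processing.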

We have one (very hectic) pipeline worker and the standard out-of-the-box Logstash configuration, running Logstash 6.1.3 on an AWS instance; the Elasticsearch instance is hosted at elastic.co.
