Data replication from one ES to another using Logstash

Hello,

I need to replicate data from an old ES (version 1.1) instance to the latest ELK stack.
The older ES instance is part of the IBM MobileFirst solution; I can access it through its REST API.

This is the Logstash config I use on the destination server (which runs the latest version of ELK):

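input {
  elasticsearch {
    hosts => ["1X.XX.1XX.1XX:9500"]
    index => "worklight"
    scroll => "10m"
    size => 4000
    # Fetch documents whose timestamp falls within roughly the last 10 minutes
    query => '{ "query": { "range": { "timestamp": {"gt":"now-10m/m", "lte" :"now" } } } }'
    # Expose the source _index, _type and _id under [@metadata]
    docinfo => true
    # Run the query once every minute (cron syntax)
    schedule => "* * * * *"
  }
}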

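filter {
  # Parse the epoch-milliseconds timestamp into @timestamp
  date {
    match => [ "timestamp", "UNIX_MS" ]
  }

  # Copy the source document type into a regular field
  mutate {
    add_field => { "log_type" => "%{[@metadata][_type]}" }
  }

  # Separate mutate block so lowercase runs after the field has been added
  mutate {
    lowercase => ["log_type"]
  }
}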

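output {
  # Only forward the two document types of interest
  if [log_type] == "customdata" or [log_type] == "mfpapplogs" {
    elasticsearch {
      hosts => ["1x.XX.XX.1XX:9200"]
      # One daily index per log type: mfp_<log_type>-YYYY.MM.dd
      index => "mfp_%{[log_type]}-%{+YYYY.MM.dd}"
      document_type => "%{[@metadata][_type]}"
      # Reuse the source _id so re-fetched documents overwrite rather than duplicate
      document_id => "%{[@metadata][_id]}"
    }
  }
}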

It worked OK, but after a few days, when the indexes approached 3 million records, it started to slow down Kibana on the destination server and even stopped the ES instance.

Your performance problems appear to be related to your Elasticsearch cluster, and you'll likely be better off asking about it in the Elasticsearch forum.

When you do, please include stats about your Elasticsearch cluster, including node-count, hardware information, stats on your number of indexes, number of shards, documents-per-index, and query patterns.

If Elasticsearch was stopped due to load, log messages from Elasticsearch in that timeframe could also be very helpful in figuring out why.

I think the problem is in the elasticsearch input plugin:

input {
  elasticsearch {
    hosts => ["1X.XX.1XX.1XX:9500"]
    index => "worklight"
    scroll => "10m"
    size => 4000
    query => '{ "query": { "range": { "timestamp": {"gt":"now-10m/m", "lte" :"now" } } } }'
    docinfo => true
    schedule => "* * * * *"
  }
}
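One property of this input is worth noting, although it does not by itself show that the input is the cause: the schedule fires every minute, while the range query looks back about 10 minutes, so each source document can be read up to roughly ten times. Because the output sets document_id from the source _id, those repeated reads overwrite existing documents instead of creating duplicates, but each overwrite is still a full indexing operation on the destination cluster. A minimal sketch of a non-overlapping window, assuming the same hosts and index, that each scheduled run starts within the minute it is scheduled for, and that the ES 1.1 source applies date-math rounding in range queries the same way current versions do, queries only the previous whole minute:

input {
  elasticsearch {
    hosts => ["1X.XX.1XX.1XX:9500"]
    index => "worklight"
    scroll => "10m"
    size => 4000
    # Assumed non-overlapping window: [start of previous minute, start of current minute)
    # so consecutive one-minute runs do not re-read the same documents
    query => '{ "query": { "range": { "timestamp": { "gte": "now-1m/m", "lt": "now/m" } } } }'
    docinfo => true
    schedule => "* * * * *"
  }
}

The trade-off is that documents reaching the source index late, with a timestamp older than the window being queried, would be missed; the original 10-minute overlap tolerates that kind of delay.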

I have another config that uses the jdbc input plugin to pull more than 10 GB of logs daily from SQL Server into the same Elasticsearch cluster, and it has no performance issues.

Your initial problem statement indicates performance degradation and a possible node crash on what you are calling your destination server (Kibana slowing down and the ES instance stopping).

I'm not sure how that is leading you to believe that the Logstash input for this pipeline is at fault. What am I missing?
