Data replication from one ES to another using Logstash

Dolph_2709 · September 24, 2018, 2:11pm

Hello,

I need to replicate data from an old ES (ver. 1.1) instance to the latest ELK.
The older ES is part of the IBM MobileFirst solution; I can access it through its REST API.

This is the Logstash config I use on the destination server (which runs the latest version of ELK):

input {
  elasticsearch {
    hosts => ["1X.XX.1XX.1XX:9500"]
    index => "worklight"
    scroll => "10m"
    size => 4000
    query => '{ "query": { "range": { "timestamp": {"gt":"now-10m/m", "lte" :"now" } } } }'
    docinfo => true
    schedule => "* * * * *"
  }
}

filter {
  date {
    match => [ "timestamp", "UNIX_MS" ]
  }

  mutate {
    add_field => { "log_type" => "%{[@metadata][_type]}" }
  }

  mutate {
    lowercase => ["log_type"]
  }
}

output {
  if ( [log_type] == "customdata" or [log_type] == "mfpapplogs" ) {
    elasticsearch {
      hosts => ["1x.XX.XX.1XX:9200"]
      index => "mfp_%{[log_type]}-%{+YYYY.MM.dd}"
      document_type => "%{[@metadata][_type]}"
      document_id => "%{[@metadata][_id]}"
    }
  }
}

It worked OK, but after a few days, when the indexes approached 3 million records, it started to slow down Kibana on the destination server and even stopped the ES instance.

yaauie · September 24, 2018, 10:05pm

Your performance problems appear to be related to your Elasticsearch cluster, and you'll likely be better off asking about it in the Elasticsearch forum.

When you do, please include stats about your Elasticsearch cluster, including node-count, hardware information, stats on your number of indexes, number of shards, documents-per-index, and query patterns.

If Elasticsearch was stopped due to load, log messages from Elasticsearch in that timeframe could also be very helpful in figuring out why.
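The stats requested above can usually be pulled straight from the cluster's REST API. The following is only a minimal Python sketch: the `BASE_URL` value (localhost, port 9200, no authentication) and the helper names `build_urls` and `fetch` are assumptions for illustration, so adjust them to your destination cluster.

```python
import urllib.request

# Assumption: destination cluster reachable here, without auth or TLS.
BASE_URL = "http://localhost:9200"

# Elasticsearch REST endpoints that cover the requested stats.
ENDPOINTS = {
    "cluster_health": "/_cluster/health",  # status, node count, shard totals
    "nodes": "/_cat/nodes?v",              # per-node heap, RAM, CPU, load
    "indices": "/_cat/indices?v",          # docs.count and store.size per index
    "shards": "/_cat/shards?v",            # shard placement and size
}


def build_urls(base_url=BASE_URL):
    """Return a name -> full URL mapping for each stats endpoint."""
    base = base_url.rstrip("/")
    return {name: base + path for name, path in ENDPOINTS.items()}


def fetch(url, timeout=10):
    """Fetch one endpoint and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Print each report so it can be pasted into a forum post.
    for name, url in build_urls().items():
        print(f"== {name}: {url}")
        print(fetch(url))
```

Running the script against the destination cluster prints one section per endpoint; the `indices` and `shards` sections show how many daily `mfp_*` indexes and shards have accumulated.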

Dolph_2709 · September 25, 2018, 6:38pm

I think the problem is in the elasticsearch input plugin:

input {
  elasticsearch {
    hosts => ["1X.XX.1XX.1XX:9500"]
    index => "worklight"
    scroll => "10m"
    size => 4000
    query => '{ "query": { "range": { "timestamp": {"gt":"now-10m/m", "lte" :"now" } } } }'
    docinfo => true
    schedule => "* * * * *"
  }
}

I have another config that uses the jdbc plugin; it pulls more than 10 GB of logs daily from SQL Server into the same Elastic cluster and has no performance issues.

yaauie · September 25, 2018, 9:54pm

Your initial problem statement indicates performance degradation and a possible node crash on what you are calling your destination server:

I'm not sure how that is leading you to believe that the Logstash input for this pipeline is at fault. What am I missing?

