Logstash in a distributed environment

Hi All,

Could anyone tell me how Logstash can be used in a distributed environment?

I want to configure Logstash in a distributed environment and run ETL batch jobs on it.

Thanks in advance!

Please be more specific. What, exactly, does "distributed environment" mean? What's the end goal?

Hi Magnusbaeuk,

I have to do an ETL load of more than a billion records from various databases, so running Logstash on a single node would take a very long time. I plan to run the same configuration file on several nodes. How can I achieve this, and what settings are needed in the logstash.yml file? I have to complete this task in less time, and the same script will later be used for incremental loads too.

There's no mechanism for Logstash instances to talk to each other, so you have to figure out a way for them to work independently. A few options come to mind:

  • Use different queries for each instance. Instance 1 only fetches rows whose id ends with 1 or 2, instance 2 only fetches rows whose id ends with 3 or 4, and so on.
  • Use a message broker like Kafka or RabbitMQ as a buffer between some process that pulls from the database (which may or may not be Logstash) and the Logstash instances that work on the broker's queue.
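The first option can be sketched as a small helper that builds one query per instance. The "last digit of the id" scheme above is the special case of splitting on id modulo 10; the sketch below generalizes it to id modulo N, so each row belongs to exactly one instance and no coordination between instances is needed. The table name `records`, the column `id`, and the helper names are hypothetical, and the sketch assumes non-negative integer ids and a database that supports `MOD()` (for example Oracle, MySQL or PostgreSQL; SQL Server uses the `%` operator instead).

```python
# Minimal sketch: split one extract across N independent Logstash
# instances by giving each instance its own WHERE clause.
# Hypothetical assumptions: table "records", non-negative integer column "id".


def shard_predicate(instance: int, num_instances: int, column: str = "id") -> str:
    """Return the SQL predicate selecting the rows owned by one instance."""
    if not 0 <= instance < num_instances:
        raise ValueError("instance must be in [0, num_instances)")
    # Each row id maps to exactly one remainder, hence exactly one instance.
    return f"MOD({column}, {num_instances}) = {instance}"


def shard_query(instance: int, num_instances: int, table: str = "records") -> str:
    """Build the SELECT statement for one instance's database input."""
    return f"SELECT * FROM {table} WHERE {shard_predicate(instance, num_instances)}"


def owner(row_id: int, num_instances: int) -> int:
    """Which instance picks up a given row (mirrors the SQL predicate for ids >= 0)."""
    return row_id % num_instances


if __name__ == "__main__":
    n = 5  # number of Logstash instances (illustrative)
    for i in range(n):
        print(f"instance {i}: {shard_query(i, n)}")
```

Each generated statement would go into the `statement` of a separate Logstash instance's jdbc input. For the later incremental loads, each instance could additionally filter on something like `id > :sql_last_value` with `use_column_value` and `tracking_column` set, keeping a separate `last_run_metadata_path` per instance so their progress markers don't collide.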
