Logstash in a distributed environment

(Arun Prakash) #1

Could anyone tell me how Logstash can be used in a distributed environment?

I want to configure Logstash in a distributed environment and run ETL batch jobs across it.

(Magnus Bäck) #2

Please be more specific. What, exactly, does "distributed environment" mean? What's the end goal?

(Arun Prakash) #3

I have to do an ETL load of more than a billion records from various databases, so running Logstash on a single node would take a very long time. I therefore plan to run the same configuration file on many nodes. How can I achieve this, and what configuration is needed in the logstash.yml file? I have to complete this task in less time, and later the same script will also be used for incremental loads.

(Magnus Bäck) #4

There's no mechanism for Logstash instances to talk to each other, so you have to find a way for them to work independently. A few options come to mind:

  • Use different queries for each instance. Instance 1 only fetches rows whose id ends with 1 or 2, instance 2 only fetches rows whose id ends with 3 or 4, and so on.
  • Use a message broker like Kafka or RabbitMQ as a buffer between a process that pulls from the database (which may or may not be Logstash) and the Logstash instances that consume the broker's queue.
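The first option can be sketched in Python as a small generator of per-instance queries. This is a minimal sketch, not a Logstash feature: it assumes a non-negative integer `id` column and a database that supports `MOD()` (MySQL, PostgreSQL and Oracle do; SQL Server uses the `%` operator instead). It generalizes the last-digit idea to modulo partitioning: with n instances, instance k fetches only rows where `MOD(id, n) = k`, so every row is picked up by exactly one instance (with n = 10 this is exactly "the last digit of id"). The table name `orders` and the helper names `partition_queries` and `owner` are hypothetical.

```python
from typing import List


def partition_queries(table: str, key: str, num_instances: int) -> List[str]:
    """Return one SELECT per Logstash instance.

    Instance k (0-based) gets only the rows where key % num_instances == k,
    so the result sets are disjoint and together cover the whole table.
    """
    if num_instances < 1:
        raise ValueError("num_instances must be at least 1")
    return [
        f"SELECT * FROM {table} WHERE MOD({key}, {num_instances}) = {k}"
        for k in range(num_instances)
    ]


def owner(row_id: int, num_instances: int) -> int:
    """Index of the instance whose query selects this row.

    Mirrors SQL MOD() only for non-negative ids; SQL and Python disagree
    on the sign of the remainder for negative values.
    """
    return row_id % num_instances


if __name__ == "__main__":
    # Print the query each of five instances would run.
    for i, query in enumerate(partition_queries("orders", "id", 5)):
        print(f"instance {i + 1}: {query}")
```

Each generated string would go into the `statement` setting of that instance's jdbc input, while the rest of the pipeline configuration stays identical on every node. For the later incremental loads, the same predicate can be combined with the jdbc input's `:sql_last_value` tracking (for example `AND id > :sql_last_value`).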


