Is Logstash a distributed real-time computing system?

sen · July 11, 2016, 10:51am

Hi folks,

The Elastic website says that Logstash can:

Centralize data processing of all types
Normalize varying schema and formats
Quickly extend to custom log formats
Easily add plugins for custom data sources

However, it does not say whether Logstash can be distributed.
Can you tell me more about it?

Thank you in advance.

S

magnusbaeck · July 11, 2016, 10:57am

It depends on exactly what you mean by "distributed". Logstash doesn't have clustering support in the same manner as e.g. Elasticsearch, but that doesn't mean you can't run multiple Logstash instances to process a single stream of input events.

sen · July 11, 2016, 11:07am

Thank you for your reply.

By "distributed" I mean a distributed real-time computing system (for instance Apache Storm). So, if I understand correctly, Logstash is not a distributed system. Does that mean I might get worse performance with Logstash than with a distributed system such as Apache Storm or Spark Streaming?

Furthermore, how do you set up multiple Logstash instances?

magnusbaeck · July 11, 2016, 11:40am

By "distributed" I mean a distributed real-time computing system (for instance Apache Storm). So, if I understand correctly, Logstash is not a distributed system.

You're explaining the word "distributed" only by giving Apache Storm as an example, so you're not really explaining it at all. I think this discussion would be more fruitful if you expressed what you're looking for and what you want to accomplish.

Does that mean I might get worse performance with Logstash than with a distributed system such as Apache Storm or Spark Streaming?

Maybe, maybe not.

Furthermore, how do you set up multiple Logstash instances?

I'm not sure what you're asking. Do you want to know how to start multiple independent instances of Logstash or how to get a number of such instances to process data from a single source? Again, give us more background.

sen · July 11, 2016, 12:52pm

Sorry, Magnus, for not being clearer!
In my case, "distributed" means dividing and parallelizing processing across several computers.
I would like to set up a real-time analytics system and deploy it on a cluster. It needs to collect, process, and visualize data stored in a Kafka server. To collect and process the data I first used Logstash, and then I stored the processed data in Elasticsearch.
I have read online about Apache Storm and Spark Streaming, which are very popular distributed real-time computing systems. Since Logstash is not distributed, my main concern is that it might not process incoming data fast enough, might not scale, and might not be suitable for a distributed environment.

Regarding the last question, I want to know how to get a number of such instances to process data from a single source.

magnusbaeck · July 11, 2016, 1:28pm

Since Logstash is not distributed, my main concern is that it might not process incoming data fast enough, might not scale, and might not be suitable for a distributed environment.

Again, you need to understand that "Logstash is not distributed" isn't a useful comment. It's ambiguous and might not even be relevant to the discussion.

Regarding the last question, I want to know how to get a number of such instances to process data from a single source.

That depends on the source, but assuming you're running some kind of broker (like Kafka), you can run multiple Logstash instances and point them at different queues/topics/partitions (terminology depending on the broker) and achieve parallel processing of data.
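To make the partition-based parallelism concrete, here is a minimal Python sketch (not Logstash or Kafka code) of how one topic's partitions might be split across several instances, assuming they all join the same Kafka consumer group and the broker uses a range-style assignment strategy. The function name `range_assign` and the instance names are hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def range_assign(num_partitions: int, instances: List[str]) -> Dict[str, List[int]]:
    """Split partitions 0..num_partitions-1 of one topic across instances.

    Simplified model of a range-style assignment (an assumption, not the
    broker's actual code): instances are sorted, each gets a contiguous
    block of partitions, and the first (num_partitions % n) get one extra.
    """
    if not instances:
        raise ValueError("need at least one instance")
    members = sorted(instances)
    per_member, extra = divmod(num_partitions, len(members))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for i, member in enumerate(members):
        count = per_member + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


if __name__ == "__main__":
    # Hypothetical setup: one topic with 6 partitions, 3 Logstash instances.
    print(range_assign(6, ["logstash-a", "logstash-b", "logstash-c"]))
    # More instances than partitions: the last instance receives nothing.
    print(range_assign(2, ["logstash-a", "logstash-b", "logstash-c"]))
```

One consequence of this model: each partition is read by exactly one instance in the group, so parallelism is capped by the partition count, and instances beyond that number sit idle until partitions are added.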

sen · July 11, 2016, 2:50pm

OK, thank you for your reply about the instances.

Sorry to bring this up again, but distribution worries me (maybe there is something I don't understand). Being distributed is good for scalability, i.e. for handling a growing amount of input data...

What I am trying to understand is why some big companies that use Elasticsearch, like Twitter and Predikto, use Storm or Spark Streaming instead of Logstash.
