Definitions of Logstash Collector, Processor and Shipper


I have seen references to collectors, processors and shippers in various places, but I can't find a definition of any of them. Have they always existed? Were they important artifacts for a while and are now less so? Where are they mentioned in the documentation? (I could find nothing...)

FWIW, they appear to be configuration variations of a Logstash instance. If the output is a queue, then the instance is a collector. If the input is a queue, then it's a processor?


(Brandon Hatch) #2

Try reading through this link, particularly the part about message queueing.

If you are using a pipeline that has a message queue, each tier has its own purpose. How Logstash is configured changes depending on what you want it to do.

Collector = Pretty much replaced by Beats these days, rather than running Logstash. This runs on the server that is generating the logs, and sends data to a shipper. There may be hundreds of these all sending data to the same shipper machine.

Shipper = Receives data from lots of different sources and machines. Ideally it does as little processing as possible (no filters) and then sends the data off to a message queue such as Kafka or Redis.

Processor (I believe this would be the same as an indexer) = Pulls data from the message queue, applies whatever filters are necessary, and then inserts it into Elasticsearch. These tend to be more CPU hungry than the shippers.
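
To make the shipper and processor roles concrete, here is a rough sketch of what the two Logstash configs might look like with Kafka as the queue. The host names, the topic name `logs`, the consumer group name, the index pattern, and the assumption that the events are Apache access logs are all placeholders I made up for illustration, so adjust them to your setup.

Shipper: accepts events from Beats and forwards them to the queue with no filter block.

```
# shipper.conf (hypothetical file name)
input {
  beats {
    port => 5044                     # conventional Beats port
  }
}

# No filter block: the shipper only relays events.

output {
  kafka {
    bootstrap_servers => "kafka1.example.internal:9092"   # placeholder host
    topic_id => "logs"                                    # placeholder topic
    codec => json                    # keep the whole event, not just the message text
  }
}
```

Processor: reads from the queue, does the expensive parsing, and writes to Elasticsearch.

```
# processor.conf (hypothetical file name)
input {
  kafka {
    bootstrap_servers => "kafka1.example.internal:9092"   # placeholder host
    topics => ["logs"]                                    # same placeholder topic
    group_id => "logstash-processors"                     # placeholder consumer group
    codec => json
  }
}

filter {
  # Assumes Apache combined access logs; swap in whatever parsing you need.
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}

output {
  elasticsearch {
    hosts => ["http://es1.example.internal:9200"]         # placeholder host
    index => "logs-%{+YYYY.MM.dd}"                        # placeholder index pattern
  }
}
```

Because the processors share a consumer group in this sketch, adding more processor instances is how you would scale the CPU-heavy filtering without touching the shippers.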

This creates a lot of extra servers to manage, but it offers a lot of benefits too. If all you need to do is log a few thousand messages per day, none of this would really apply. If you want to log 500 million messages per day, you won't be able to get away with a simple setup.

(Magnus Bäck) #3

With the exception of "shipper" I'd say that none of these terms are very established.
