Definitions of Logstash Collector, Processor and Shipper

I have seen references to collectors, processors and shippers in various places, but I can't find a definition of any of them. Have they always existed? Were they important artifacts for a while that are no longer so important? Where are they mentioned in the documentation? (I could find nothing...)

FWIW, they appear to be configuration variations of a Logstash instance. If the output is a queue, then the instance is a collector. If the input is a queue, then it's a processor?

TIA,
Greg

Try reading through this link, particularly the part about message queueing.
https://www.elastic.co/guide/en/logstash/5.0/deploying-and-scaling.html#deploying-message-queueing

If you are using a pipeline that has a message queue, each tier has its own purpose. How Logstash is configured changes depending on what you want it to do.

Collector = Pretty much replaced by Beats instead of using Logstash. This runs on the server that is generating the logs, and sends data to a shipper. There may be hundreds of these all sending data to the same shipper machine.

Shipper = Receives data from lots of different sources and machines. Ideally does the minimal amount of processing possible (no filters) and then sends it off to a message queue such as Kafka or Redis.

Processor (I believe this would be the same as an indexer) = Pulls data from the message queue, applies whatever filters are necessary, and then inserts it into Elasticsearch. These tend to be more CPU hungry than the shippers.
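
To make the shipper/processor split concrete, here is a rough sketch of the two configs, assuming Redis as the queue and Beats as the collector. The host names (redis.example.internal, es.example.internal), the list key "logstash", the file names, and the grok filter are placeholders I made up for illustration, so swap in whatever your environment actually uses.

```
# shipper.conf (hypothetical file name)
# Receives events from Beats, applies no filters, and pushes them onto a Redis list.
input {
  beats {
    port => 5044
  }
}

output {
  redis {
    host => ["redis.example.internal"]   # placeholder host
    data_type => "list"
    key => "logstash"                    # placeholder list key
  }
}
```

```
# processor.conf (hypothetical file name)
# Pulls events off the same Redis list, filters them, and indexes into Elasticsearch.
input {
  redis {
    host => "redis.example.internal"     # placeholder host
    data_type => "list"
    key => "logstash"                    # must match the shipper's key
  }
}

filter {
  # Example filter only; use whatever parsing your logs need.
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}

output {
  elasticsearch {
    hosts => ["es.example.internal:9200"]  # placeholder host
  }
}
```

The only thing tying the two tiers together is the queue: the shipper's output and the processor's input point at the same Redis list, so you can add more processors reading from that list when filtering becomes the bottleneck.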

This creates a lot of extra servers to manage, but it offers a lot of benefits too. If all you need to do is log a few thousand messages per day, none of this would really apply. If you want to log 500 million messages per day, you won't be able to get away with a simple setup.

With the exception of "shipper" I'd say that none of these terms are very established.
