I have seen references to collectors, processors, and shippers in various places, but I can't find a definition of any of them. Have they always existed? Were they important at one point and are now less so? Where are they mentioned in the documentation? (I could find nothing...)
FWIW, they appear to be configuration variations of a Logstash instance. If the output is a queue, then the instance is a collector. If the input is a queue, then it's a processor?
If you are using a pipeline that has a message queue, each tier has its own purpose. How Logstash is configured depends on what you want that tier to do.
Collector = Pretty much replaced by Beats, rather than Logstash, these days. This runs on the server that generates the logs and sends data to a shipper. There may be hundreds of these all sending data to the same shipper machine.
Shipper = Receives data from lots of different sources and machines. Ideally it does as little processing as possible (no filters) and then sends the data off to a message queue such as Kafka or Redis.
Processor (I believe this is the same as an indexer) = Pulls data from the message queue, applies whatever filters are necessary, and then inserts it into Elasticsearch. These tend to be more CPU hungry than the shippers.
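To make the difference concrete, here is a rough sketch of what the shipper and processor tiers might look like as two separate Logstash instances with Redis in between. The host names, the `logstash` list key, and the grok pattern are placeholders I made up for illustration, not anything from your setup; port 5044 is just the conventional Beats port. A shipper config, assuming the collectors are Beats:

```
# shipper.conf (hypothetical file name): receive from Beats, push to Redis, no filters
input {
  beats {
    port => 5044
  }
}

output {
  redis {
    host => ["redis.example.internal"]   # placeholder host
    data_type => "list"
    key => "logstash"                    # placeholder list key
  }
}
```

And a processor config on a different machine, reading from the same Redis list:

```
# processor.conf (hypothetical file name): pull from Redis, filter, index into Elasticsearch
input {
  redis {
    host => "redis.example.internal"     # placeholder host
    data_type => "list"
    key => "logstash"                    # must match the key the shipper writes to
  }
}

filter {
  # Example filter only; use whatever parsing your logs actually need
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}

output {
  elasticsearch {
    hosts => ["http://es.example.internal:9200"]   # placeholder host
  }
}
```

So it is the same Logstash binary in both cases. What makes an instance a shipper or a processor is only which side of the queue it sits on and whether it carries the filter block.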
This creates a lot of extra servers to manage, but it offers a lot of benefits too. If all you need to do is log a few thousand messages per day, none of this would really apply. If you want to log 500 million messages per day, you won't be able to get away with a simple setup.