Hi, I'm new to Elastic and very interested in this pipeline:
Data Sources --> Logstash --> Kafka --> Logstash --> Elasticsearch, where the first LS specifies gzip data compression with the Kafka output plugin and the second LS enriches data with filter plugins.
I assume a gzip codec plugin is required on the second LS in order to process the data. Does that mean decompression happens on the second LS, or on the final ES? Also, where does the compression actually happen: on the first LS or in Kafka?
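For reference, here is roughly the shape of the first Logstash's output I have in mind (the broker address and topic name are just placeholders):

```
output {
  kafka {
    bootstrap_servers => "kafka:9092"   # placeholder broker address
    topic_id => "raw-events"            # placeholder topic name
    codec => json
    compression_type => "gzip"          # gzip compression on the Kafka output plugin
  }
}
```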
No, it is not. I have Logstash reading from Kafka, discarding 99% of the data, and writing with gzip compression to another Kafka instance. Another Logstash instance reads that topic, and it does not specify compression on the input.
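As a rough sketch of my setup (the broker addresses, topic names, and the drop condition are made up):

```
input {
  kafka {
    bootstrap_servers => "source-kafka:9092"   # made-up broker address
    topics => ["raw-events"]                   # made-up topic name
    codec => json
  }
}
filter {
  # Discard the events I do not care about (~99% of the volume)
  if [level] != "ERROR" {                      # made-up selection condition
    drop { }
  }
}
output {
  kafka {
    bootstrap_servers => "dest-kafka:9092"     # made-up second cluster
    topic_id => "filtered-events"
    codec => json
    compression_type => "gzip"                 # the producer compresses the batches
  }
}
```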
Thanks, Badger. It looks like you also have Logstash -> Kafka -> Logstash.
I want to do some data processing on the second Logstash, and in that case I assume a gzip codec is required. Do you have the same kind of processing running on your second Logstash?
Do you happen to know where the compression happens: on the first Logstash or on Kafka?
You do not require a gzip codec on the second Logstash instance. The Kafka message header indicates whether the message is compressed, so the input plugin will know whether to decompress.
I believe the Kafka producer (i.e., Logstash) is expected to do the compression, but I am not certain.
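For illustration, the second Logstash could look something like this; note there is no gzip codec anywhere on the input (the consumer group, enrichment filter, and index name are just placeholders):

```
input {
  kafka {
    bootstrap_servers => "dest-kafka:9092"
    topics => ["filtered-events"]
    group_id => "logstash-enrich"   # placeholder consumer group
    codec => json                   # matches the codec used by the producer; no gzip needed
  }
}
filter {
  # Placeholder enrichment; your real filter plugins go here
  mutate { add_field => { "pipeline" => "enrich" } }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "events-%{+YYYY.MM.dd}"
  }
}
```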