I am trying to build an ELK cluster together with Kafka to get the best possible indexing rate in Elasticsearch. Given the description of my cluster hardware and configuration below:
- Is the architecture correct for my purpose?
- If not, how can I improve it?
- How can I improve the indexing rate while reducing the Logstash batch size, so as to reduce the memory required by Logstash?
Here is the story:
I have four nodes, each with 32 GB of memory, two 7200 rpm 2 TB HDDs and a Core i7-7820X at 3.6 GHz. I want to build an ELK cluster over three nodes and keep one node and its resources for producing the source messages. Kafka serves as a message bus across all four nodes and is consumed by the Logstash instances.
The data paths for Kafka and Elasticsearch each contain two directories, one per HDD, so as to take advantage of striping.
Detailed architecture:
Node 1: message producer
Node 1, 2, 3: ZooKeeper
Node 1, 2, 3, 4: Kafka
Node 2, 3, 4: Logstash, Elasticsearch
Node 4: Kibana
Each message to index, once serialized as JSON, is around 400 bytes. Each Kafka topic has 10 partitions.
The Logstash pipeline input forms one Kafka consumer group per topic. The Logstash Kafka bootstrap server setting points to localhost, so that each Logstash instance consumes from the Kafka instance running on the same host. The goal is to avoid useless network traffic between nodes. However, I am not sure whether Kafka forces a consumer to read partitions hosted on another node regardless of the given bootstrap server setting.
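To make the partition question more concrete, here is a minimal sketch of how the 10 partitions of one topic could be split among the three Logstash consumers of a group, assuming a range-style assignment. The consumer names are hypothetical, the real split depends on the partition assignment strategy configured for the Kafka input, and the sketch does not model which broker hosts each partition.

```python
def range_assign(num_partitions, consumers):
    """Split partitions 0..num_partitions-1 into contiguous ranges, one per consumer."""
    members = sorted(consumers)
    base, extra = divmod(num_partitions, len(members))
    assignment = {}
    start = 0
    for i, member in enumerate(members):
        # The first `extra` members each receive one additional partition.
        count = base + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


# Hypothetical consumer names, one per Logstash node (nodes 2, 3 and 4).
assignment = range_assign(10, ["logstash-node2", "logstash-node3", "logstash-node4"])
for member, partitions in assignment.items():
    print(member, partitions)
```

With 10 partitions and three consumers the split cannot be even: one consumer ends up with four partitions and the other two with three each.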
The Elasticsearch index has a template to convert each field of the message to the correct format. It also has a pipeline that automatically generates a monthly index based on the message timestamp and discards duplicate messages. Each index is configured to use 36 shards. Measured shard size does not exceed 2 GB for a year of messaging.
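As a rough check of the shard count this implies, the arithmetic can be sketched as follows. It assumes one index per month and an even spread over the three Elasticsearch nodes, and it counts primary shards only, since the number of replicas is not stated here.

```python
SHARDS_PER_INDEX = 36   # from the index settings described above
INDICES_PER_YEAR = 12   # assumption: one index per month
ES_NODES = 3            # nodes 2, 3 and 4

# Primary shards only; replicas would multiply these figures.
primary_shards_per_year = SHARDS_PER_INDEX * INDICES_PER_YEAR
primary_shards_per_node = primary_shards_per_year // ES_NODES

print("primary shards per year:", primary_shards_per_year)
print("primary shards per node:", primary_shards_per_node)
```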
- Kafka has Xmx/Xms (JVM heap) set to 1 GB.
- Logstash has Xmx/Xms (JVM heap) set to 12 GB.
- Elasticsearch has Xmx/Xms (JVM heap) set to 15 GB.
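The memory budget on a node that runs Kafka, Logstash and Elasticsearch together (nodes 2, 3 and 4) can be sketched like this. It only adds up the configured heaps: off-heap JVM overhead, ZooKeeper and Kibana are ignored, so the real amount left for the OS is smaller.

```python
NODE_RAM_GB = 32

# Configured JVM heap sizes per service, in GB.
heaps_gb = {"kafka": 1, "logstash": 12, "elasticsearch": 15}

total_heap_gb = sum(heaps_gb.values())
left_for_os_gb = NODE_RAM_GB - total_heap_gb

print("total heap (GB):", total_heap_gb)
print("left for OS and file cache (GB):", left_for_os_gb)
```

Under these assumptions at most 4 GB per node remain for the OS file cache, which both Kafka and Elasticsearch rely on.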
Measured indexing rate: 15000 messages/s with a Logstash batch size of 40000.
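To put these numbers in perspective, here is a back-of-the-envelope sketch. It assumes the 15000 messages/s figure is the total for the cluster and uses the 400-byte JSON size as the payload per message; in-memory event objects in Logstash are larger than the raw JSON, and the batch size applies per pipeline worker, so actual memory use per batch is higher.

```python
MESSAGES_PER_SECOND = 15000   # measured indexing rate (assumed cluster-wide)
MESSAGE_BYTES = 400           # approximate JSON size of one message
BATCH_SIZE = 40000            # Logstash batch size, in events

ingest_mb_per_s = MESSAGES_PER_SECOND * MESSAGE_BYTES / 1e6
raw_batch_mb = BATCH_SIZE * MESSAGE_BYTES / 1e6
seconds_to_fill_batch = BATCH_SIZE / MESSAGES_PER_SECOND

print("raw ingest rate (MB/s):", ingest_mb_per_s)
print("raw payload per batch (MB):", raw_batch_mb)
print("seconds of traffic per batch:", round(seconds_to_fill_batch, 2))
```

In raw payload terms this is about 6 MB/s, and a single batch of 40000 events carries about 16 MB of JSON, or a bit under three seconds of traffic at the measured rate.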
At some point Elasticsearch hangs with bulk exceptions, and hot threads show time spent on flush operations. This is because the batch size setting is too large. If I reduce the batch size to leave some memory for OS file caching, the indexing rate drops significantly, but Elasticsearch remains stable over time.