Our 5.2 ES cluster seems to be slower than the older 2.3 ES cluster. We are seeing 2000-3100 EPS for primary shards as depicted by the X-Pack Monitoring index graph within Kibana. Our former 8 node 2.3 ES cluster was at times reaching 25-30k and up to 90k+ EPS.
It seems the cluster is underutilized and performance is limited by an unknown setting.
Data flow: [ Kafka > Logstash > ES Cluster ]. We are using the common time-based Logstash index to store events by logstash-YYYY.MM.DD. Kafka-OffsetMonitor shows a bottleneck at the Logstash consumer via the lag metric. As a result, indexed documents can be 1+ hours behind the current time during peak times.
An overview of our 5.2 setup:
The ES cluster is 6 nodes with spinning-disk RAID 0 arrays averaging about 700 MB/s during write tests. Each node has 32-48 cores and 64 GB RAM, and runs Linux with Oracle Java. Heap size min/max is set to 30g. Average document size is roughly 1k (index primary store size / index document count). We use 18 shards and 1 replica with a refresh_interval of 30s. Our indexes (primary store size) can reach 275 GB+ per day with 300+ million events.
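As a quick sanity check on the figures above, here is a minimal sketch of the arithmetic. It assumes decimal units (1 GB = 10^9 bytes) and uses the 275 GB and 300 million lower bounds quoted above; the function and variable names are only illustrative.

```python
# Figures quoted in the post; decimal units are an assumption.
PRIMARY_STORE_BYTES_PER_DAY = 275 * 10**9   # 275 GB primary store per daily index
EVENTS_PER_DAY = 300 * 10**6                # 300 million events per day
SECONDS_PER_DAY = 24 * 60 * 60


def average_document_bytes(store_bytes, doc_count):
    """Average on-disk size of one document (primary store / doc count)."""
    return store_bytes / doc_count


def average_eps(doc_count, seconds=SECONDS_PER_DAY):
    """Events per second needed, averaged over a whole day, to keep up."""
    return doc_count / seconds


avg_doc = average_document_bytes(PRIMARY_STORE_BYTES_PER_DAY, EVENTS_PER_DAY)
needed_eps = average_eps(EVENTS_PER_DAY)

print(f"average document size: {avg_doc:.0f} bytes")
print(f"average rate needed to keep up: {needed_eps:.0f} EPS")
```

With these inputs the average document comes out a little under 1k, and the day-long average needed is roughly 3.5k EPS, which is above the 2000-3100 EPS we are seeing and would be consistent with the consumer lag.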
Kafka is set up with 64 partitions per topic. Logstash is configured to use 64 consumer_threads for the Kafka input, and the output is set to 25 workers with a flush_size of 7500. I've tried varying the workers and flush_size to increase performance, but 2100-3100 EPS seems to be the limit.
Using iostat on the ES nodes shows an average of 2-10 MB/s. Another interesting behavior is that the X-Pack graphs show triangle waves. For the index graph, the variance between low and high is roughly 1000 EPS (ex: 2100 EPS and 3100 EPS). Maybe this relates to the 30s refresh_interval and segment checkpoints. Does less variance represent better tuning?
I'll change indices.memory.index_buffer_size next. The default of 3G shared across 36 active shards is about 83 MB each. A Logstash config of 25 workers * 7500 flush size * 1000 byte documents * 6 data nodes / 36 shards is 31 MB. Does the index_buffer_size value relate to the refresh_interval? If the buffer limit is reached before the interval elapses, does it trigger a refresh? How does this relate to indexing latency?
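The arithmetic above can be reproduced with a short sketch. It follows the calculation exactly as written in this post (one 3G buffer divided by all 36 shards, decimal megabytes); the constant names are only illustrative. One caveat: as far as I understand, index_buffer_size applies per node rather than cluster-wide, so the last figure shows the per-shard share under that assumption.

```python
# Values taken from the post; decimal megabytes are an assumption.
INDEX_BUFFER_MB = 3000    # default 10% of a 30g heap, rounded to 3G as in the post
ACTIVE_SHARDS = 36        # 18 primaries + 18 replicas
DATA_NODES = 6
WORKERS = 25              # Logstash output workers
FLUSH_SIZE = 7500         # documents per bulk flush
DOC_BYTES = 1000          # approximate document size

# Calculation as stated in the post: one buffer split across every shard.
buffer_per_shard_mb = INDEX_BUFFER_MB / ACTIVE_SHARDS

# In-flight bulk data per shard, using the post's own formula.
in_flight_per_shard_mb = (
    WORKERS * FLUSH_SIZE * DOC_BYTES * DATA_NODES / ACTIVE_SHARDS / 1e6
)

# Assumption: if the buffer is per node, each node's 3G is shared only
# by the shards hosted on that node (36 / 6 = 6 with even allocation).
shards_per_node = ACTIVE_SHARDS / DATA_NODES
buffer_per_shard_if_per_node_mb = INDEX_BUFFER_MB / shards_per_node

print(f"buffer per shard (as in post): {buffer_per_shard_mb:.1f} MB")
print(f"in-flight bulk data per shard: {in_flight_per_shard_mb:.2f} MB")
print(f"buffer per shard if per node:  {buffer_per_shard_if_per_node_mb:.0f} MB")
```

Either way, the in-flight estimate of about 31 MB stays below the per-shard buffer share, so the buffer size alone may not be what is limiting throughput.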
That sounds like a huge difference. How have you determined that it is Elasticsearch that is the bottleneck? Did you upgrade Logstash at the same time as you upgraded to Elasticsearch 5.2?
After more thinking and reading I think Logstash might be the bottleneck. The tool I'm using to monitor Kafka consumers (Logstash in this instance) is reporting Logstash is falling behind. For some reason I just assumed ES output writing was the culprit. Logstash is still at version 2.3 so I guess I should upgrade it.
So for Logstash you are still using the same config and setup that was able to push considerably higher volumes to an Elasticsearch 2.3 cluster? The reason I asked is that the internals of Logstash have changed recently, as have the guidelines for tuning it, so if you had migrated without making any changes, that could have affected performance.
If you want to test the indexing rate of the Elasticsearch cluster you can try indexing test data from a file into a separate index using e.g. Rally. This should give you an indication of what the limit of your cluster is and show you where to focus your attention.
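For a quick check without a dedicated tool, a rough sketch of the idea is to turn lines from a test file into a body for the _bulk endpoint and send that to a separate index. The sketch below only builds the request body and does not send anything. The index name "benchmark-test", the type "logs" and the "message" field are hypothetical placeholders, and the "_type" entry assumes a 5.x cluster where a type is still required.

```python
import json


def build_bulk_body(lines, index, doc_type="logs"):
    """Build an NDJSON body for the _bulk endpoint from raw text lines.

    Each non-empty line becomes one document with a single "message"
    field (a hypothetical mapping chosen only for illustration).
    """
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the test file
        action = {"index": {"_index": index, "_type": doc_type}}
        parts.append(json.dumps(action))
        parts.append(json.dumps({"message": line}))
    if not parts:
        return ""
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(parts) + "\n"


body = build_bulk_body(["event one", "", "event two"], index="benchmark-test")
print(body, end="")
```

The resulting string would be POSTed to /_bulk with whatever HTTP client you prefer. Timing how long a fixed number of such requests takes gives only a crude indexing rate, so treat it as a sanity check rather than a benchmark.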
Thanks, I'll try that today.
Rally doesn't seem to be ideal for benchmarking an existing cluster. From what I can tell it downloads a copy of ES and uses predefined custom configurations as opposed to testing against a live cluster.
Upgrading Logstash to 5.2 seemed to help some, but I still see lag with the Kafka consumer queue. After tuning, the write speed is almost double, about 3k-6k EPS, but it still looks like Logstash, or Logstash and Elasticsearch together, are failing to keep up.
When Logstash used a high worker count for the output plugins, the ES cluster logged warnings about the queue_size. IIRC the docs suggest this value shouldn't be modified, so this might mean the ES cluster is near a performance limit due to the flush_size. I didn't take notes for this so I'll have to retest.
Disk write performance is about 5-10 MB/s on the data nodes. I still think the write speed should be an order of magnitude higher.
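To put the disk numbers in context, here is a rough sketch of the raw document bytes each node would receive at a given indexing rate. It assumes 1000-byte documents, 1 replica and an even spread over 6 data nodes, and it deliberately ignores translog writes and segment merges, so real disk traffic should be somewhat higher than this floor. The function name is only illustrative.

```python
def expected_write_mb_per_s(eps, doc_bytes, replicas, data_nodes):
    """Raw document bytes written per node per second, in decimal MB.

    Every document is written once for the primary and once per replica,
    and the total is assumed to spread evenly over the data nodes.
    """
    total_bytes_per_s = eps * doc_bytes * (1 + replicas)
    return total_bytes_per_s / data_nodes / 1e6


for eps in (3000, 6000, 30000):
    mb = expected_write_mb_per_s(eps, doc_bytes=1000, replicas=1, data_nodes=6)
    print(f"{eps:>6} EPS -> about {mb:.0f} MB/s of raw documents per node")
```

At 3k-6k EPS this gives only about 1-2 MB/s of raw documents per node, so the 5-10 MB/s seen on disk looks roughly in line with the current indexing rate once overhead is included. The disks would only see an order of magnitude more if the event rate itself went up by that much.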
I'll write back after more testing and reading.
That is wrong; we cover your exact use case in our docs: Tips and Tricks (Rally 2.3.0 documentation).