Improve indexing throughput in Elasticsearch

elk_user1 · June 8, 2017, 3:49pm

Hi Team,

I am using a Kafka connector to load data from a Kafka topic into an Elasticsearch index. Throughput into the Kafka topic is around 2000 requests per second, but throughput at Elasticsearch is much lower, around 500-800 requests per second (5 primary shards + 1 replica). Any suggestions to improve the indexing rate?

Thanks in advance.

ugosan · June 8, 2017, 6:56pm

How big is your cluster, and are you sending all your data to just one node or several?

Also, have you tried routing your Kafka output through Logstash?

Christian_Dahlqvist · June 8, 2017, 8:11pm

What is the specification of your cluster? How large are your events? What bulk size is being used? How many concurrent connections?

elk_user1 · June 9, 2017, 5:59am

Hi there..

Thanks a lot for replying. Each event is 16 KB in size.

We have an Elasticsearch cluster with 6 nodes: 3 masters (only one dedicated), 4 data nodes and 1 client node. The bulk size from the connector is currently 2000. I tried changing the batch size to 3000, but there was no improvement. With replicas set to zero we get a throughput of around 1200 requests per second, but we need at least one replica in production and are aiming for an indexing throughput of 2000 requests per second.

Thanks in advance

Christian_Dahlqvist · June 9, 2017, 6:18am

Do you have monitoring installed? What is the CPU usage and disk I/O looking like on the data nodes during indexing? How many CPU cores does each node have? What type of storage is being used?

How many concurrent connections/threads does the Kafka connector use to index into the cluster?

elk_user1 · June 9, 2017, 6:22am

We are monitoring via X-Pack in Kibana. CPU usage is below 50% most of the time on all nodes. Each node is configured with 14 GB RAM and an 8-core processor. There are 5 concurrent tasks loading data through the Kafka connector.

Christian_Dahlqvist · June 9, 2017, 6:29am

Assuming that you are sending bulk requests to all data nodes, I would recommend increasing the number of parallel indexing threads. Increase slowly and monitor indexing throughput until you see no further gain. 5 connections/threads sounds a bit low for a cluster that size in my opinion.
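The "increase slowly until there is no further gain" procedure can be sketched roughly as below. This is only an illustration: `ramp_concurrency`, `fake_measure`, the step size and the 5% gain threshold are all hypothetical names and values, and `fake_measure` is a synthetic stand-in for a real benchmark run that would reconfigure the connector and read the indexing rate from monitoring.

```python
from typing import Callable, List, Tuple


def ramp_concurrency(
    measure: Callable[[int], float],
    start: int = 5,
    step: int = 2,
    max_workers: int = 40,
    min_gain: float = 0.05,
) -> Tuple[int, float, List[Tuple[int, float]]]:
    """Raise the worker count step by step until throughput stops improving.

    measure(workers) must return the observed indexing rate (docs/s) for a
    run with that many concurrent connections. All defaults are assumptions.
    """
    best_workers = start
    best_rate = measure(start)
    history = [(start, best_rate)]

    workers = start + step
    while workers <= max_workers:
        rate = measure(workers)
        history.append((workers, rate))
        # Stop once the relative gain over the best run falls below min_gain.
        if rate < best_rate * (1 + min_gain):
            break
        best_workers, best_rate = workers, rate
        workers += step

    return best_workers, best_rate, history


def fake_measure(workers: int) -> float:
    # Synthetic model, NOT real data: linear growth that plateaus at 2000 docs/s.
    return min(150.0 * workers, 2000.0)


if __name__ == "__main__":
    workers, rate, history = ramp_concurrency(fake_measure)
    for w, r in history:
        print(f"{w:>3} workers -> {r:.0f} docs/s")
    print(f"settled on {workers} workers at {rate:.0f} docs/s")
```

In a real test each measurement should run long enough to smooth out merges and refreshes, and CPU and disk I/O on the data nodes should be watched alongside the rate.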

elk_user1 · June 9, 2017, 6:32am

And what about bulk size, considering that each event is 16 KB? What would you suggest? Currently it is 2000.

Christian_Dahlqvist · June 9, 2017, 7:02am

As your documents are quite large, that does sound a bit big. I would probably recommend going with a smaller bulk size rather than larger, but what you have may also be appropriate. You need to benchmark to know for sure.
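For a rough sense of scale, the payload of one bulk request can be estimated from the numbers in this thread (2000 or 3000 events of 16 KB each, 5 concurrent tasks). The sketch below is back-of-the-envelope arithmetic only; the helper names are hypothetical, it ignores bulk action lines and compression, and the 10 MiB target is an arbitrary illustrative figure, not a recommendation.

```python
def bulk_payload_mib(docs_per_bulk: int, doc_size_kib: float) -> float:
    """Approximate size of one bulk request body in MiB."""
    return docs_per_bulk * doc_size_kib / 1024


def docs_for_target(target_mib: float, doc_size_kib: float) -> int:
    """Number of documents that fit in a bulk request of roughly target_mib."""
    return int(target_mib * 1024 // doc_size_kib)


if __name__ == "__main__":
    doc_kib = 16          # event size reported in the thread
    concurrent_tasks = 5  # connector tasks reported in the thread

    for bulk_size in (2000, 3000):
        per_request = bulk_payload_mib(bulk_size, doc_kib)
        in_flight = per_request * concurrent_tasks
        print(f"bulk={bulk_size}: {per_request:.2f} MiB per request, "
              f"{in_flight:.2f} MiB in flight across {concurrent_tasks} tasks")

    # Hypothetical target size, chosen only to show the inverse calculation.
    print("docs per ~10 MiB bulk:", docs_for_target(10, doc_kib))
```

Numbers like these only suggest where to start; benchmarking several bulk sizes is still the way to find out which one works best.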

elk_user1 · June 9, 2017, 9:50am

I was just going through blogs on improving indexing performance and came across the indices.cluster.send_refresh_mapping property. I tried setting it to false in elasticsearch.yml, but it is reported as an unknown setting. Could you tell me how to set this? We are using Elasticsearch 5.4.0.

Christian_Dahlqvist · June 9, 2017, 9:58am

I would recommend optimising throughput by benchmarking different bulk sizes and numbers of concurrent connections before starting to experiment with expert-level settings, as Elasticsearch generally comes with good defaults. One thing you may, however, want to change at the index level is the refresh interval.
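The refresh interval is a dynamic index setting, so it can be changed through the update index settings API (`PUT /<index>/_settings`) rather than in elasticsearch.yml. A minimal sketch of building that request follows; the host `http://localhost:9200`, the index name `events`, the value `30s` and the helper name are all assumptions for illustration, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_refresh_interval_request(
    base_url: str, index: str, interval: str
) -> urllib.request.Request:
    """Build a PUT /<index>/_settings request that sets index.refresh_interval."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host, index name and interval; adjust to your own cluster.
req = build_refresh_interval_request("http://localhost:9200", "events", "30s")

if __name__ == "__main__":
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually apply the setting, send it with:
    # urllib.request.urlopen(req)
```

A longer interval makes newly indexed documents visible to searches later, so the value should be chosen against how fresh search results need to be.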

system · July 7, 2017, 9:59am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
