Ingestion rate in an Elasticsearch cluster

I have a small cluster with only two nodes. Both nodes have all roles, and I am indexing data with the bulk Java API. I first tuned a single-node cluster and reached 25M/s. I then hoped the ingestion rate would increase with a two-node cluster, but instead it dropped to 5M/s. I have searched and tested many things to find the problem, but couldn't. Do you have any ideas? And is it a realistic expectation that adding more nodes will increase the ingestion rate? Note that I do not use Logstash.

Without knowing the details of your setup it's hard to give definitive answers, but let me offer some suggestions.

Assuming you are running with the defaults, indexes in Elasticsearch are configured to be replicated once (to provide high availability). This means that all documents will be stored twice (once in a primary shard, and once in a replica shard). When you ran with one node, there was nowhere to put the replicas (replicas and primaries will never live on the same node - that would defeat the purpose). When you added the second node however, it got those replicas assigned to it. As a result, instead of having to write each document once, we now have to write it twice. So yes, you doubled your disk IO capacity by adding a 2nd server, but you also doubled the number of write operations. As a result, the theoretical best you could have achieved was to keep the same throughput as you had before.

So, one thing you could try is disabling replicas on your indexes. You can do so dynamically through the _settings endpoint for those indexes (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-update-settings.html):

PUT my_index/_settings
{
  "number_of_replicas": 0
}

Keep in mind though: by disabling replicas you lose high availability (which you did not have to start with, when you ran with a single node), and you trade read throughput for write throughput.

Another thing to try is to play around with the bulk size. As your cluster configuration changes, the optimal bulk size may change as well, so it's worth running some benchmarks to find your sweet spot.
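
Since you mentioned you're already on the bulk Java API, here is a minimal sketch of making those knobs easy to vary between benchmark runs using BulkProcessor. This assumes the Java high-level REST client (exact package names differ slightly between client versions), and my_index plus the 5,000-document / 5 MB / 2-concurrent-requests starting values are placeholders, not recommendations:

import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;
import org.elasticsearch.common.xcontent.XContentType;

public class BulkBenchmark {

    // client is assumed to already be connected to your two-node cluster
    static BulkProcessor buildProcessor(RestHighLevelClient client) {
        BulkProcessor.Listener listener = new BulkProcessor.Listener() {
            @Override
            public void beforeBulk(long executionId, BulkRequest request) { }

            @Override
            public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
                // record response.getTook() and response.hasFailures() here to compare runs
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
                failure.printStackTrace();
            }
        };

        return BulkProcessor.builder(
                (request, bulkListener) ->
                        client.bulkAsync(request, RequestOptions.DEFAULT, bulkListener),
                listener)
            .setBulkActions(5_000)                              // flush after this many documents...
            .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB)) // ...or after this much data
            .setConcurrentRequests(2)                           // bulks in flight while the next one is built
            .build();
    }

    static void index(BulkProcessor bulkProcessor, String json) {
        // my_index is a placeholder; use your real index name
        bulkProcessor.add(new IndexRequest("my_index").source(json, XContentType.JSON));
    }
}

Vary setBulkActions, setBulkSize and setConcurrentRequests between runs and measure the throughput of each combination; the best values depend on your document size and cluster hardware.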

Then there are also factors outside of Elasticsearch that can influence write throughput:

  • is the network connection between the two nodes fast enough?
  • is the disk on the second server fast enough? Elasticsearch prefers local storage, ideally SSDs.
  • does the second node have its own dedicated resources (especially disk)? Just adding a second node to the same server that's also hosting the first node is not magically going to give you twice the performance.
