Multiple child instances of a single client or multiple clients: which is better for bulk indexing at high rates?

shameel · June 18, 2023, 3:55am

Hi, I'm using Elasticsearch v7.5.0 and I have a huge number of documents being ingested per second. As per the documentation, it is recommended to use multiple clients for bulk indexing to reduce load. Can I get the same results if I use multiple child instances of a single client?

child clients
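For context, here is a minimal sketch of what "child instances" could look like with the Node.js client. It assumes `parent` is a `Client` from `@elastic/elasticsearch` 7.x created elsewhere; the helper name `createChildClients` and the `bulk-worker-N` names are hypothetical. As far as the client documentation describes it, a child shares the parent's connection pool, so children mainly differ in per-client options rather than adding separate connections.

```javascript
// Sketch only: `parent` is assumed to be an @elastic/elasticsearch 7.x Client,
// e.g. new Client({ node: 'http://localhost:9200' }) (hypothetical URL).
function createChildClients(parent, count) {
  const children = [];
  for (let i = 0; i < count; i++) {
    // child() returns a new client object that reuses the parent's
    // connection pool; the options only override per-client settings.
    children.push(parent.child({ name: `bulk-worker-${i}` }));
  }
  return children;
}
```

Separately constructed clients, by contrast, each keep their own connection pool.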

Christian_Dahlqvist · June 18, 2023, 4:30am

How are you currently indexing data into Elasticsearch? As you have linked to the JavaScript client I assume you are using a custom JavaScript application. Is this correct?

What type of data are you indexing? What is the average size of a document? How much data do you have to index?

What is the size and specification of your cluster?

shameel · June 18, 2023, 5:17am

Yes, I'm using the Node.js client. I'm working on a personal project with a maximum of 5000 JSON documents (mostly firewall logs) being ingested per second into an 8-node cluster running on a Docker overlay network (actually 4 virtual machines connected via Docker overlay), with 3 master nodes, 4 data nodes and one coordinating node. I'm currently using 8 ES clients to bulk ingest data. The 5k figure is the total per second across these 8 clients. A single bulk request may contain data destined for multiple indices, each having 4 shards and one replica.
Due to high CPU and network load, my cluster is very unstable (data nodes are frequently disconnected). I'm just trying to find the maximum load my cluster can handle.

Christian_Dahlqvist · June 18, 2023, 5:53am

It sounds like you may have hit the limits of what your Elasticsearch cluster can handle. If the cluster is struggling I see no benefit in increasing load or concurrency from the client side.

How many indices and shards are you concurrently indexing into? Are you using time-based indices?

What is the specification of the VMs and how much resources are assigned to the different nodes?

Which nodes are you sending bulk requests to?

What type of storage are you using for Elasticsearch?

shameel · June 18, 2023, 4:07pm

At any one time my JavaScript clients are ingesting data into almost 15 indices. I'm not using built-in ILM or time-based indices.

My VM configurations and the nodes running on them are:

PC1 ( 6 Core/20 GB RAM/1TB HDD)

master-a (HEAP 2GB)
data-1 (HEAP 16 GB)

PC2 ( 6 Core/20 GB RAM/1TB HDD)

master-b (HEAP 2GB)
data-2 (HEAP 16 GB)

PC3 ( 4 Core/15 GB RAM/1TB HDD)

master-c (HEAP 2GB)
data-3 (HEAP 10 GB)

PC4 ( 4 Core/15 GB RAM/1TB HDD)

coordinate (HEAP 4GB)
data-4 (HEAP 10 GB)

I'm ingesting data into data-1 and data-2 only.
data-3 and data-4 are not in use as of now (I'm planning to keep a different set of data on them in future, mostly non-log data). I restricted index allocation to the first two data nodes using "index.routing.allocation.require.box_type".
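As a rough illustration of the allocation filter mentioned above, the settings body could be built like this. The attribute name `box_type` comes from the post; the value `logs` and the helper name are hypothetical, and the target nodes would need a matching `node.attr.box_type` value in their configuration.

```javascript
// Build an index-settings body that requires shards to live on nodes
// carrying a given custom attribute. The value 'logs' is a placeholder.
function allocationFilterSettings(attribute, value) {
  return { [`index.routing.allocation.require.${attribute}`]: value };
}

const filterBody = allocationFilterSettings('box_type', 'logs');
console.log(JSON.stringify(filterBody));
// prints {"index.routing.allocation.require.box_type":"logs"}
```

With the 7.x Node.js client, a body like this would typically be passed to `client.indices.putSettings({ index, body })`.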

Christian_Dahlqvist · June 18, 2023, 4:50pm

If bulk requests can target all 15 indices with up to 60 primary shards, you are going to end up with a lot of small writes, which can be inefficient, especially if you are using slow HDD storage.
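To put rough numbers on that fan-out: 15 indices with 4 primary shards each gives the 60 primaries mentioned above. The sketch below assumes a bulk size of 1000 documents (a made-up figure, not from the thread) spread evenly across all shards.

```javascript
// Estimate how thinly one bulk request is spread across primary shards,
// assuming documents are distributed evenly (a simplification).
function bulkFanOut(docsPerBulk, indexCount, primariesPerIndex) {
  const shards = indexCount * primariesPerIndex;
  return { shards, avgDocsPerShard: docsPerBulk / shards };
}

const fanOut = bulkFanOut(1000, 15, 4); // 1000 docs per bulk is hypothetical
console.log(fanOut.shards);                      // 60
console.log(fanOut.avgDocsPerShard.toFixed(1));  // "16.7"
```

Under these assumptions each shard receives fewer than 20 documents per bulk request, which is the "lot of small writes" pattern described above.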

Given that you are using HDDs, which are not ideal for high indexing loads, I would recommend you check iowait and disk utilisation to see whether this is a bottleneck, e.g. using iostat -x.

I would recommend upgrading to SSDs and ensuring each bulk request targets as few different indices as possible.
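One possible way to keep each bulk request focused on a single index is to group buffered events by target index before sending. This is only a sketch: the event shape `{ index, doc }` and both helper names are hypothetical, not something from the thread.

```javascript
// Group buffered events by their target index.
// Each event is assumed to look like { index: 'fw-logs', doc: {...} }.
function groupByIndex(events) {
  const groups = new Map();
  for (const { index, doc } of events) {
    if (!groups.has(index)) groups.set(index, []);
    groups.get(index).push(doc);
  }
  return groups;
}

// Turn one group into the alternating action/source entries of a bulk body.
function toBulkBody(index, docs) {
  return docs.flatMap((doc) => [{ index: { _index: index } }, doc]);
}
```

Each resulting body could then be sent as its own request, for example `client.bulk({ body: toBulkBody(index, docs) })` with the 7.x Node.js client, so that a single request only writes to the shards of one index.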

When running Elasticsearch the recommendation is to not set the heap above 50% of the RAM available to the node. Your settings seem a lot higher than this, which is not good. If the master node on PC1 is allocated 4GB of RAM, it should have the heap set to 2GB, while the data node should have a heap of 8GB as it has 16GB RAM allocated to it.

The same applies to the other nodes.
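The 50% guideline can be checked with a little arithmetic. The sketch below applies it to PC1's total RAM and the combined heap of the two nodes on it, which is a simplification, since the guideline is stated per node; the function name is hypothetical.

```javascript
// Upper bound on heap under the "no more than 50% of RAM" guideline.
function maxRecommendedHeapGb(ramGb) {
  return ramGb / 2;
}

// PC1 from the thread: 20 GB RAM, master-a (2 GB heap) + data-1 (16 GB heap).
const pc1 = { ramGb: 20, heapsGb: [2, 16] };
const pc1TotalHeapGb = pc1.heapsGb.reduce((sum, h) => sum + h, 0);

console.log(pc1TotalHeapGb);                                   // 18
console.log(maxRecommendedHeapGb(pc1.ramGb));                  // 10
console.log(pc1TotalHeapGb > maxRecommendedHeapGb(pc1.ramGb)); // true
```

With 18 GB of heap configured on a 20 GB host, very little memory is left for the operating system's file cache.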

warkolm · June 20, 2023, 4:23am

Welcome to our community!

Please note that this version is EOL and no longer supported; you should be looking to upgrade as a matter of urgency.

system · July 18, 2023, 4:24am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
