Optimizing shard placement for indexing

Timur_Makarchuk · June 17, 2021, 7:49am

Good day, everyone!

I'm currently working on a tool that redistributes indices across our cluster so that indexing load is spread more evenly, inspired by this article (Optimal Shard Placement in a Petabyte Scale Elasticsearch Cluster - Meltwater Engineering Blog).

I have quite a few indices with very uneven write load between them, so this approach seems to make sense in my case.
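To make the idea concrete, here is a minimal sketch of one simple way such a tool could spread uneven write load: a greedy heuristic that places the heaviest shards first, each on the currently least-loaded node. This is a swapped-in illustration, not taken from the linked article, and the function name, shard names, and load numbers are all hypothetical. It also ignores real allocation constraints such as disk usage and keeping a primary and its replica on different nodes.

```python
import heapq
from typing import Dict, List


def greedy_placement(shard_load: Dict[str, float], nodes: List[str]) -> Dict[str, List[str]]:
    """Assign each shard to the least-loaded node, heaviest shards first.

    shard_load maps a (hypothetical) shard name to its measured write load.
    """
    # Min-heap of (accumulated load, node name); the lightest node is popped first.
    heap = [(0.0, node) for node in nodes]
    heapq.heapify(heap)
    placement: Dict[str, List[str]] = {node: [] for node in nodes}

    # Place the heaviest shards first so the small ones can even things out.
    for shard, load in sorted(shard_load.items(), key=lambda kv: kv[1], reverse=True):
        total, node = heapq.heappop(heap)
        placement[node].append(shard)
        heapq.heappush(heap, (total + load, node))
    return placement


if __name__ == "__main__":
    loads = {"logs-0": 10, "logs-1": 7, "metrics-0": 5, "audit-0": 3, "misc-0": 1}
    print(greedy_placement(loads, ["node-1", "node-2"]))
```

Whatever write-load marker is chosen becomes the `shard_load` input here, which is why the choice of marker matters.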

The problem is that the two write-load markers I'm observing don't necessarily track each other. Those two markers are the increase in thread_pool.write.completed_tasks and the increase in the sum of indexing.index_time_in_millis over all the shards allocated to the node in question.
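For reference, this is roughly how the two markers can be compared side by side. It is a sketch under assumptions: it takes two already-parsed responses of the node stats API (for example `GET _nodes/stats/thread_pool,indices`), where the completed-task counter appears as `thread_pool.write.completed` and the node-level `indices.indexing.index_time_in_millis` is the total over the shards on that node. A monitoring tool may expose the same counters under other names, such as `completed_tasks`. The helper names are hypothetical.

```python
from typing import Any, Dict, Optional

Markers = Dict[str, Dict[str, int]]


def write_load_markers(stats: Dict[str, Any]) -> Markers:
    """Pull the two cumulative counters per node out of a parsed node-stats response."""
    markers: Markers = {}
    for node_id, node in stats["nodes"].items():
        markers[node.get("name", node_id)] = {
            "write_completed": node["thread_pool"]["write"]["completed"],
            "index_time_ms": node["indices"]["indexing"]["index_time_in_millis"],
        }
    return markers


def marker_deltas(before: Markers, after: Markers) -> Markers:
    """Increase of each counter between two snapshots, per node."""
    deltas: Markers = {}
    for name, now in after.items():
        prev = before.get(name)
        if prev is None:
            continue  # node joined between the two snapshots
        delta = {key: now[key] - prev[key] for key in now}
        if any(value < 0 for value in delta.values()):
            continue  # counters went backwards, e.g. after a node restart
        deltas[name] = delta
    return deltas


def ms_per_task(delta: Dict[str, int]) -> Optional[float]:
    """Indexing milliseconds per completed write task; None if no tasks completed."""
    tasks = delta["write_completed"]
    return delta["index_time_ms"] / tasks if tasks else None
```

If `ms_per_task` differs a lot between nodes, the two markers will rank the nodes differently, which is the divergence described above.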

My question is which of the two makes the better objective function for my use case.

My broader question is: is it possible to understand what kinds of tasks the write threads are executing? I understand from the docs that those are indexing and bulk tasks, but I'd like to learn more details about bulk. Does a bulk request produce a task for a write thread on the node the request hits? Do the individual actions in a bulk request produce additional tasks, or does the whole bulk request produce just one task?

Thank you in advance.

system · July 15, 2021, 7:49am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
