How does shards-per-node contribute to indexing latency/throughput?

Hello,

In our environment, we have 12 nodes, 597 indices, and 1201 total shards, with 0 unassigned shards.
What are the correct steps to find out how shards-per-node contributes to indexing latency/throughput?

I am not sure what exactly you are looking for. Could you please elaborate?

What is the problem you have and are trying to solve?

How many of your indices and shards are you actively indexing/updating/deleting and how many are read-only? How are you indexing into these?

Which version of Elasticsearch are you using? What is your use case?

Hi Christian,

I want to achieve good indexing latency, but I am not aware of the steps to do so.
Can you please guide me through the steps and let me know how this can be helpful to me?

If you tell us about your use case and answer the questions I asked we may be able to provide some guidance. Without that I would recommend going through the official guide on tuning for indexing speed.

Hi Christian,

Use case: For our environment, I want to check how I can achieve the right shards-per-node.
The version we are currently using is 8.4.
We currently have a total of 599 indices,
of which 553 are open and healthy.
For shards, we have configured rollover at 30 days or 50 GB.

I still do not understand what you mean by this and what you are looking for. Please provide more details.

If you are looking for guidance on how to best configure sharding and distribute shards across the cluster it would help if you could answer the following questions:

  • How many of your indices/shards are you actively indexing into?
  • What is your average shard size?
  • Are all data nodes in your cluster equal or do you have specific nodes that perform indexing, e.g. a hot-warm architecture?
  • What is the specification of the nodes that perform indexing (CPU cores, RAM and type of storage used)?
  • Have you tried to identify what might be limiting indexing throughput/latency, e.g. CPU saturation, GC or disk I/O?

This still does not answer the questions I asked:

You have still not described the use case, but as you are using rollover it sounds like it might be a logging use case. Is that correct?

Yes, that is correct.

I am very new to Elastic. Can you please let me know how I can check the following?
How many of your indices and shards are you actively indexing/updating/deleting and how many are read-only? How are you indexing into these?

The configuration we normally use under ILM is:
Hot phase:
Rollover at 30 days or 50 GB
Priority is 100
Warm phase:
Force merge and read-only are enabled
Delete phase: 45 days

If you are using data streams and/or rollover, only the latest index in each data stream/index pattern will be written to. In order to get the number of indices/shards written to, you can add up the number of indices and shards for the latest index of each data stream/index pattern.
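As a rough sketch of that counting, assuming a response shaped like the one from GET _data_stream (where each data stream lists its backing indices in order and the last entry is the current write index). The shards_per_index mapping is a hypothetical input you would build yourself, for example from _cat/shards, and the sample data is made up:

```python
def count_write_targets(data_stream_response, shards_per_index):
    """Return (number of write indices, total shards held by those indices)."""
    write_indices = []
    for stream in data_stream_response.get("data_streams", []):
        backing = stream.get("indices", [])
        if backing:
            # The last backing index of a data stream is its write index.
            write_indices.append(backing[-1]["index_name"])
    shard_total = sum(shards_per_index.get(name, 0) for name in write_indices)
    return len(write_indices), shard_total


# Made-up sample shaped like a GET _data_stream response.
sample_streams = {
    "data_streams": [
        {"name": "logs-app1", "indices": [
            {"index_name": ".ds-logs-app1-000001"},
            {"index_name": ".ds-logs-app1-000002"},
        ]},
        {"name": "logs-app2", "indices": [
            {"index_name": ".ds-logs-app2-000001"},
        ]},
    ]
}

# Hypothetical shard counts (primaries plus replicas) per index.
sample_shards = {
    ".ds-logs-app1-000001": 2,
    ".ds-logs-app1-000002": 2,
    ".ds-logs-app2-000001": 2,
}

write_count, write_shards = count_write_targets(sample_streams, sample_shards)
print(write_count, write_shards)
```

With rollover aliases instead of data streams, the same idea applies, but the write index would have to be found from the alias definition rather than from list order.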

In order to calculate the average shard size you can use the data available in the cluster stats API.
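A minimal sketch of that calculation, assuming you have already fetched the GET _cluster/stats response as a dict. It reads indices.shards.total and indices.store.size_in_bytes, both of which include replicas. The shard count below is the one from this thread; the store size is a made-up number:

```python
def average_shard_size_gb(cluster_stats):
    """Average shard size in GiB across all shards (primaries and replicas)."""
    indices = cluster_stats["indices"]
    total_shards = indices["shards"]["total"]
    if total_shards == 0:
        return 0.0
    return indices["store"]["size_in_bytes"] / total_shards / 1024**3


# Made-up sample containing only the fields this sketch reads.
sample_cluster_stats = {
    "indices": {
        "shards": {"total": 1201},
        "store": {"size_in_bytes": 1201 * 10 * 1024**3},
    }
}

print(average_shard_size_gb(sample_cluster_stats))
```

An average hides outliers, so it may still be worth looking at individual shard sizes if a few indices are much larger than the rest.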

We have different applications, and we use Logstash, ingest pipelines and Beats to send the data to Elasticsearch. It depends on the application.

Yes, but how many indices and shards are actively written to in the whole cluster across all applications?

You have not answered many of my questions. If you are not able or willing to provide this I do not think I can help much.

Hi Neha, there are multiple questions here and they are not related to each other, but let me mention some points based on my understanding of your explanation:

  1. For indexing latency, you always get a took field in the response. You can use that metric to evaluate latency.
  2. There is no API that tells you which indices are active in terms of indexing/updating/deleting. It is better to figure that out at the application level: which application sends such requests to which indices.
  3. You can use the _stats API to check latency (search, indexing). It is also better to set up monitoring to get full visibility.
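To illustrate point 3, here is a small sketch that compares two snapshots of the GET _stats response taken some time apart. It assumes the usual response shape, where each index has total.indexing.index_total and total.indexing.index_time_in_millis as cumulative counters. The function name and the sample numbers are made up. Note that this approximates the time shards spent per indexing operation, not the latency a client observes:

```python
def avg_indexing_ms_per_doc(stats_before, stats_after):
    """Average indexing time per operation for indices written between snapshots."""
    zero = {"index_total": 0, "index_time_in_millis": 0}
    result = {}
    for name, after in stats_after["indices"].items():
        before = stats_before["indices"].get(name)
        # An index created between the snapshots starts from zero counters.
        b = before["total"]["indexing"] if before else zero
        a = after["total"]["indexing"]
        docs = a["index_total"] - b["index_total"]
        millis = a["index_time_in_millis"] - b["index_time_in_millis"]
        if docs > 0:
            result[name] = millis / docs
    return result


def _index_entry(total, millis):
    # Helper for building made-up sample data in the _stats shape.
    return {"total": {"indexing": {"index_total": total,
                                   "index_time_in_millis": millis}}}


snapshot_1 = {"indices": {"logs-a": _index_entry(1000, 2000),
                          "logs-b": _index_entry(500, 900)}}
snapshot_2 = {"indices": {"logs-a": _index_entry(1500, 3000),
                          "logs-b": _index_entry(500, 900)}}

print(avg_indexing_ms_per_doc(snapshot_1, snapshot_2))
```

Indices missing from the result received no indexing operations between the two snapshots, so the same comparison also gives a rough view of which indices are actively written to.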