We have a very large index of almost 600 million documents. A field called __assetType allows us to distinguish between document types. Currently, the document types are split roughly 10%/90%, and we primarily search the 10% subset.
To improve latency, we previously sharded the index without a _routing field and increased CPU resources. Now, we're considering sharding by __assetType to significantly boost search speed for the 10% subset of documents. However, this would severely slow down searches for the 90% due to extreme skew.
We're willing to accept a slight latency increase for searches on the 90% subset, but we want to avoid making it excessively slow. Is it possible to create 10 shards, with 5 containing the 90% __assetType (using a custom _routing field) and the other 5 containing the 10% __assetType?
Is there any way to achieve this?
We tried increasing the routing factor in the formula Elasticsearch uses to pick a shard:
routing_factor = num_routing_shards / num_primary_shards
shard_num = (hash(_routing) % num_routing_shards) / routing_factor
But we later realized this wouldn't change anything: the routing factor is constant for a given index, so the same _routing value always produces the same shard_num.
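
To illustrate what we observed, here is a minimal Python sketch of the formula above. It is only an approximation: Elasticsearch hashes the routing value with Murmur3, while this sketch substitutes CRC32 from the standard library, so the actual shard numbers will differ from a real cluster. The routing values "typeA" and "typeB" and the num_routing_shards value of 640 are hypothetical placeholders, not our real settings.

```python
import zlib


def shard_num(routing: str, num_primary_shards: int, num_routing_shards: int) -> int:
    """Apply the documented formula to a single routing value."""
    if num_routing_shards % num_primary_shards != 0:
        raise ValueError("num_routing_shards must be a multiple of num_primary_shards")
    routing_factor = num_routing_shards // num_primary_shards
    # Assumption: CRC32 stands in for the Murmur3 hash that Elasticsearch uses.
    hashed = zlib.crc32(routing.encode("utf-8"))
    return (hashed % num_routing_shards) // routing_factor


NUM_PRIMARY_SHARDS = 10
NUM_ROUTING_SHARDS = 640  # hypothetical value, a multiple of NUM_PRIMARY_SHARDS

for value in ("typeA", "typeB"):  # hypothetical __assetType routing values
    # Recompute many times: a deterministic hash always gives the same shard.
    shards = {shard_num(value, NUM_PRIMARY_SHARDS, NUM_ROUTING_SHARDS) for _ in range(1000)}
    print(value, "->", sorted(shards))
```

Under these assumptions, each routing value maps to exactly one shard no matter how often it is computed. Changing num_routing_shards may change which shard that is, but it never spreads a single routing value across several shards.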