How many shards for 5 millions of groups?

Now we have about 5 millions of groups and 100 millions of users. There are about 120 million messages every day users sent to the group, approximate 12GiB. We'd like to use Elasticsearch to search group messages with text.

We're testing with a time-based index (index by day, week, and month). We are also using 5 shards/index with routing key is groupID.

The text analyzer is edge-nGram with min-gram is 2 and max-gram is 20.

The peak indexing rate is 4000 req/s. However, the searching is too slow. We've expected that we can process 3000 req/s for searching. Because of the big number of groups, Elasticsearch spent a lot of time to query.

We've tried to compare 5 shards/index and 11 shards/index for both daily-index and monthly-index, but searching on 11 shards/index is slower than 5 shards/index. Anyway, both sharding strategies are unsatisfied with our case.

My questions are:

  • Which time-based index we should use in our case? By day, week, or month?
  • How many shards we should use for the selected index?

P/S

I've read a blog from Discord talking about how Discord indexes billions of messages for group search. I know Discord is using their shard allocator which do mapping from groupId to (cluster, index) of Elasticsearch. Their mapping is using a database with caching. They don't use sharding from Elasticsearch because of their application-level sharding.

I concern that we should follow as Discord architecture or believe Elasticsearch Sharding.

Many thanks!

If you are using routing you probably want to have a large number of primary shards as that improves efficiency. Given the you want to keep shards reasonably large this probably means you should use monthly indices.

1 Like

Hi @Christian_Dahlqvist,

If we use monthly indices, there are about 3,6 billion documents per index. How many shards we should use for an index and How many data nodes we should scale shards?

I'd like to try with 100 shards per index, it is estimated to around 36 million documents/shard. Is it OK?

Tks!

Aim for a shard size around 10GB and use this to determine the number of primary shards to use as a starting point. How many nodes you will need will depend on the hardware as well as the retention period.

2 Likes