How many shards for 5 million groups?

harisk · October 2, 2020, 7:08am

We currently have about 5 million groups and 100 million users. Users send about 120 million messages to groups every day, approximately 12 GiB. We'd like to use Elasticsearch for full-text search over group messages.

We're testing time-based indices (by day, week, and month). We're also using 5 shards per index, with groupID as the routing key.
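For readers following along, here is a minimal sketch of what such a setup could look like as request bodies. The index naming scheme and the field names `group_id` and `text` are assumptions for illustration, not taken from our actual mapping. Since several groups share a shard under routing, the query still filters on the group.

```python
from datetime import date


def index_name(day: date, granularity: str = "day") -> str:
    """Hypothetical naming scheme for time-based message indices."""
    if granularity == "day":
        return f"messages-{day:%Y.%m.%d}"
    if granularity == "week":
        year, week, _ = day.isocalendar()
        return f"messages-{year}.w{week:02d}"
    if granularity == "month":
        return f"messages-{day:%Y.%m}"
    raise ValueError(f"unknown granularity: {granularity}")


def create_index_body(primary_shards: int = 5) -> dict:
    """Index creation body: N primary shards, routing required on every document."""
    return {
        "settings": {"number_of_shards": primary_shards},
        "mappings": {"_routing": {"required": True}},
    }


def search_request(group_id: str, text: str, day: date) -> dict:
    """Search one daily index, routed to the single shard holding this group."""
    return {
        "index": index_name(day),
        "routing": group_id,  # same routing value as used at index time
        "body": {
            "query": {
                "bool": {
                    # Routing only picks the shard; other groups live there too.
                    "filter": [{"term": {"group_id": group_id}}],
                    "must": [{"match": {"text": text}}],
                }
            }
        },
    }
```

With routing, each search touches one shard per index, so the number of indices covered by a query matters more than the number of shards per index.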

The text analyzer is edge n-gram with min_gram 2 and max_gram 20.
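To make the cost of this analyzer concrete, here is a rough Python approximation of the terms it produces. It assumes whitespace tokenization and lowercasing, which may differ from the real analysis chain; the function names are made up for this sketch.

```python
from typing import List


def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 20) -> List[str]:
    """Return the prefixes of `token` with lengths min_gram..max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(text: str) -> List[str]:
    """Approximate the terms indexed for a message (assumed: lowercase + whitespace split)."""
    terms: List[str] = []
    for token in text.lower().split():
        terms.extend(edge_ngrams(token))
    return terms


print(edge_ngrams("search"))
# ['se', 'sea', 'sear', 'searc', 'search']
```

Every word of length n contributes up to n - 1 terms instead of one, so the index on disk is larger than it would be with a standard analyzer.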

The peak indexing rate is 4000 req/s. However, searching is too slow. We expected to be able to handle 3000 search req/s. Because of the large number of groups, Elasticsearch spends a lot of time on each query.

We compared 5 shards/index and 11 shards/index for both daily and monthly indices, but searching with 11 shards/index is slower than with 5 shards/index. In any case, neither sharding strategy is satisfactory for our case.

My questions are:

1. Which time-based index should we use in our case: by day, week, or month?
2. How many shards should we use for the selected index?

P.S.

I've read a blog post from Discord about how they index billions of messages for group search. Discord uses its own shard allocator, which maps a groupId to an Elasticsearch (cluster, index) pair. The mapping is stored in a database with caching. Because they shard at the application level, they don't rely on Elasticsearch's own sharding.
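As I understand it, the application-level mapping could be sketched roughly like this. This is not Discord's code: the class name, the in-memory dicts standing in for the database and the cache, and the round-robin assignment policy are all assumptions for illustration.

```python
from typing import Dict, List, Tuple

Location = Tuple[str, str]  # (cluster, index)


class ShardAllocator:
    """Hypothetical allocator mapping a group ID to a fixed (cluster, index)."""

    def __init__(self, locations: List[Location]) -> None:
        self._locations = locations
        self._db: Dict[str, Location] = {}     # stand-in for the persistent database
        self._cache: Dict[str, Location] = {}  # stand-in for the cache layer
        self._next = 0

    def locate(self, group_id: str) -> Location:
        # Fast path: the mapping is already cached.
        if group_id in self._cache:
            return self._cache[group_id]
        location = self._db.get(group_id)
        if location is None:
            # First time this group is seen: assign round-robin (assumed policy).
            location = self._locations[self._next % len(self._locations)]
            self._next += 1
            self._db[group_id] = location
        self._cache[group_id] = location
        return location
```

Once assigned, a group always resolves to the same location, so all of its messages are indexed and searched in one place.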

I'm wondering whether we should follow Discord's architecture or rely on Elasticsearch sharding.

Many thanks!

Christian_Dahlqvist · October 2, 2020, 8:13am

If you are using routing you probably want to have a large number of primary shards, as that improves efficiency. Given that you want to keep shards reasonably large, this probably means you should use monthly indices.

harisk · October 2, 2020, 8:23am

Hi @Christian_Dahlqvist,

If we use monthly indices, there will be about 3.6 billion documents per index. How many shards should we use per index, and how many data nodes do we need to spread the shards across?

I'd like to try 100 shards per index, which works out to around 36 million documents per shard. Is that OK?

Thanks!
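The document counts in the post above follow from the figures earlier in the thread. A quick check, assuming a 30-day month:

```python
MESSAGES_PER_DAY = 120_000_000
DAYS_PER_MONTH = 30  # assumption: a 30-day month


def docs_per_shard(total_docs: int, primary_shards: int) -> int:
    """Average documents per primary shard, assuming an even spread across shards."""
    return total_docs // primary_shards


docs_per_month = MESSAGES_PER_DAY * DAYS_PER_MONTH
print(docs_per_month)                       # 3600000000
print(docs_per_shard(docs_per_month, 100))  # 36000000
```

The even spread is an idealization: with routing by group, a few very active groups can make some shards noticeably larger than others.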

Christian_Dahlqvist · October 2, 2020, 8:26am

Aim for a shard size around 10GB and use this to determine the number of primary shards to use as a starting point. How many nodes you will need will depend on the hardware as well as the retention period.
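As a starting-point calculation only, the 10GB guideline can be applied to the figures from the original post. This sketch assumes the on-disk index size equals the raw 12 GiB/day (it will differ in practice, so measure a real index), a 30-day month, and treats GB and GiB as interchangeable.

```python
import math

RAW_GB_PER_DAY = 12    # from the original post; on-disk size is an assumption
DAYS_PER_MONTH = 30    # assumption: a 30-day month
TARGET_SHARD_GB = 10   # guideline from the reply above


def primary_shards_for(index_size_gb: float, target_shard_gb: float = TARGET_SHARD_GB) -> int:
    """Primary shards needed so that each shard is about target_shard_gb."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))


monthly_size_gb = RAW_GB_PER_DAY * DAYS_PER_MONTH
print(monthly_size_gb)                      # 360
print(primary_shards_for(monthly_size_gb))  # 36
print(primary_shards_for(RAW_GB_PER_DAY))   # 2 (daily index)
```

Under these assumptions a monthly index would start at roughly 36 primary shards, and the number should be recomputed once the real index size per day is known.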

system · October 30, 2020, 8:26am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
