Shards/Routing Design for my use case

Brian_Hudson · September 27, 2013, 6:05pm

I currently have a system which consists of many Lucene indexes and allows
users to search over a user-defined subset of these indexes. It works
(surprisingly?) well, but I am migrating over to ElasticSearch so that it can
scale across a cluster.

Some stats on the system:
~2500 Lucene Indexes
~1M (small) documents per index
~15 new indexes added each month

Let's assume that there is an index for each student in the current system.
Each of these students can be categorized into one of 3 majors: English,
History, Computer Science (not my real use case but this is easier to
discuss).

In migrating this system over to ElasticSearch I was considering keeping
the pattern of each student having their own shard (in the case of ES), but
after listening to Shay Banon's talk on the "kagillion" shards problem (tm)
I now think that is not the right approach.

It sounds like the better approach would be to create a single index
(students) and use routing to route all the documents for a given student
to the same shard, and then create aliases with filters.
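To make the single-index idea concrete, here is a minimal sketch of the request body that an aliases call (POST /_aliases) could take, with one filtered, routed alias per student. It only builds and prints the JSON and does not talk to a cluster. The `student_id` field name, the `student_<id>` alias naming scheme, and both helper functions are assumptions made up for illustration.

```python
import json


def build_student_alias_action(student_id, index="students"):
    """Build one `add` action for the aliases API (hypothetical helper).

    The alias filters on an assumed `student_id` field and uses the same
    value as the routing key, so searches through the alias only hit the
    shard that holds that student's documents.
    """
    sid = str(student_id)
    return {
        "add": {
            "index": index,
            "alias": f"student_{sid}",                # assumed naming scheme
            "filter": {"term": {"student_id": sid}},  # assumed field name
            "routing": sid,
        }
    }


def build_aliases_request(student_ids, index="students"):
    """Combine one `add` action per student into a single request body."""
    return {
        "actions": [build_student_alias_action(s, index) for s in student_ids]
    }


body = build_aliases_request([101, 102])
print(json.dumps(body, indent=2))
```

Documents would need to be indexed with the same routing value (for example by indexing through the alias) for the filter and routing to line up. Note that several students can still hash to the same shard, which is why the filter is needed in addition to the routing.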

My question is, would there be any advantage to creating 3 indexes
(english, history, computer_science) instead of just a single (students)
index?

If 50% of the students are English majors, 45% are History majors and 5%
are Computer Science majors, would it then make more sense to create the 3
indexes instead of the single index, because I could then allocate more
shards to english and history than to computer_science?
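As a rough back-of-the-envelope sketch of that idea, the snippet below sizes each per-major index in proportion to its share of the documents, using the stats from earlier in this post (~2500 students, ~1M documents each). The target of 100 million documents per shard is purely an assumed placeholder, not a recommendation, and `plan_shards` is a hypothetical helper.

```python
TOTAL_STUDENTS = 2500            # ~2500 indexes today, one per student
DOCS_PER_STUDENT = 1_000_000     # ~1M small documents per student
MAJOR_SHARE_PERCENT = {"english": 50, "history": 45, "computer_science": 5}

# Assumption for illustration only: how many documents one shard should hold.
TARGET_DOCS_PER_SHARD = 100_000_000


def plan_shards(total_students, docs_per_student, share_percent, target):
    """Return estimated document and primary shard counts per major."""
    plan = {}
    for major, percent in share_percent.items():
        students = total_students * percent // 100
        docs = students * docs_per_student
        # Integer ceiling division, with at least one shard per index.
        shards = max(1, -(-docs // target))
        plan[major] = {"students": students, "docs": docs, "shards": shards}
    return plan


plan = plan_shards(
    TOTAL_STUDENTS, DOCS_PER_STUDENT, MAJOR_SHARE_PERCENT, TARGET_DOCS_PER_SHARD
)
for major, info in plan.items():
    print(major, info)
```

Under these assumed numbers the split comes out to 13 shards for english, 12 for history and 2 for computer_science, versus 25 for a single students index holding all 2.5 billion documents.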

I guess I'm not clear on the circumstances under which it is better to create
multiple indexes rather than a single index.

Thanks,

Brian

