Shards/Routing Design for my use case


(Brian Hudson) #1

I currently have a system that consists of many Lucene indexes and allows
users to search over a user-defined subset of these indexes. It works
(surprisingly?) well, but I am migrating over to ElasticSearch so that it
can scale out on a cluster.

Some stats on the system:
~2500 Lucene Indexes
~1M (small) documents per index
~15 new indexes added each month

Let's assume that there is an index for each student in the current system.
Each of these students can be categorized into one of 3 majors: English,
History, Computer Science (not my real use case but this is easier to
discuss).

In migrating this system over to ElasticSearch, I was considering keeping
the pattern of each student having their own shard (in the case of ES), but
after listening to Shay Banon's talk on the "kagillion" shards problem (tm),
I now think that is not the right approach.

It sounds like the better approach would be to create a single index
(students) and use routing to route all the documents for a given student
to the same shard, and then create aliases with filters.
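
To make the single-index idea concrete, here is a minimal sketch of the request body I would expect to send to the `_aliases` endpoint: one filtered alias per student, with the alias routing set to the student id so that searches through the alias only hit that student's shard. The field name `student_id`, the alias naming scheme `student_<id>`, and the helper function names are assumptions made up for illustration; the code only builds the JSON and does not talk to a cluster.

```python
import json


def filtered_alias_action(index, student_id, field="student_id"):
    """Build one 'add' action: a filtered alias whose routing is the student id.

    `field` and the alias name pattern are hypothetical, not from a real mapping.
    """
    sid = str(student_id)
    return {
        "add": {
            "index": index,
            "alias": f"student_{sid}",
            # The filter restricts searches through the alias to this student.
            "filter": {"term": {field: sid}},
            # The routing value sends index and search requests to one shard.
            "routing": sid,
        }
    }


def aliases_request_body(index, student_ids):
    """Combine one action per student into a single _aliases request body."""
    return {"actions": [filtered_alias_action(index, s) for s in student_ids]}


if __name__ == "__main__":
    # Body for POST /_aliases, covering two example students.
    print(json.dumps(aliases_request_body("students", [1, 2]), indent=2))
```

The filter is still needed alongside the routing, because several students can be routed to the same shard.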

My question is, would there be any advantage to creating 3 indexes
(english, history, computer_science) instead of just a single (students)
index?

If 50% of the students are English majors, 45% are History majors, and 5%
are Computer Science majors, would it make more sense to create the 3
indexes instead of the single index, because I could then allocate more
shards to english and history than to computer_science?
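
As a rough back-of-the-envelope check of that split, the sketch below divides a shard budget across the three indexes in proportion to the share of students in each major and then estimates documents per shard. The budget of 20 shards is an arbitrary assumption, as is the idea that every student has roughly 1M documents; the function names are hypothetical.

```python
STUDENTS = 2500                 # ~2500 indexes today, one per student
DOCS_PER_STUDENT = 1_000_000    # ~1M small documents each
SHARE_PERCENT = {"english": 50, "history": 45, "computer_science": 5}


def shards_by_share(share_percent, total_shards):
    """Give each index a shard count proportional to its share (minimum 1).

    Rounding means the counts may not add up exactly to total_shards.
    """
    return {
        name: max(1, round(pct * total_shards / 100))
        for name, pct in share_percent.items()
    }


def docs_per_shard(share_percent, shard_counts):
    """Estimate documents per shard, assuming equal-sized students."""
    total_docs = STUDENTS * DOCS_PER_STUDENT
    return {
        name: (total_docs * pct // 100) // shard_counts[name]
        for name, pct in share_percent.items()
    }


if __name__ == "__main__":
    counts = shards_by_share(SHARE_PERCENT, total_shards=20)  # assumed budget
    for name, per_shard in docs_per_shard(SHARE_PERCENT, counts).items():
        print(name, counts[name], per_shard)
```

Under these assumptions, a proportional split leaves each shard about the same size as it would be in a single students index with the same total number of shards, so the arithmetic alone does not favor either layout.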

I guess I'm not clear on the circumstances under which it is better to
create multiple indexes rather than a single index.

Thanks,

Brian


