Splitting a small amount of data over multiple indexes vs a single index?

We are in the process of jumping from ES 2.3.2 to 6.0. As part of this work, we are breaking up our monolithic index into multiple indexes. The current index is split into 23 shards of about 50 GB each. We are doing two things:

  1. Split out our types (we have 26 types with widely varying fields) into individual indexes
  2. Create date-based indexes
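To make the plan concrete, the result is per-type, per-month indexes, which could be provisioned with an index template along these lines (ES 6.x `index_patterns` syntax; the `orders` name and settings are illustrative, not our actual config):

```
PUT _template/orders_monthly
{
  "index_patterns": ["orders-*"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}
```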

We are doing #1 because mapping types are being deprecated.

We are doing #2 because 80% of our queries are on data from just the last 30 days, though we do have queries that go back over all time. Also, our data is mutable (any document from any date can be updated), so we manage our index targeting in our API.
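The API-side index targeting can be sketched roughly like this (the `<type>-YYYY.MM` naming convention and function names are hypothetical, not our actual code):

```python
from datetime import date, timedelta

# Hypothetical sketch of the API-side index targeting.
# Assumes a "<type>-YYYY.MM" monthly naming convention.

def index_for(doc_type: str, doc_date: date) -> str:
    """Route a write/update to the monthly index holding the document's date.
    Because documents are mutable, an update to an old document must target
    its original month's index, not the current one."""
    return f"{doc_type}-{doc_date.year:04d}.{doc_date.month:02d}"

def recent_indexes(doc_type: str, today: date, days: int = 30) -> list:
    """Monthly indexes covering the last `days` days, for the 80% of queries
    that only look back 30 days; all-time queries can instead hit a
    wildcard like '<type>-*'."""
    start = today - timedelta(days=days)
    names, (y, m) = [], (today.year, today.month)
    while (y, m) >= (start.year, start.month):
        names.append(f"{doc_type}-{y:04d}.{m:02d}")
        y, m = (y, m - 1) if m > 1 else (y - 1, 12)
    return names

print(index_for("orders", date(2017, 3, 14)))       # orders-2017.03
print(recent_indexes("orders", date(2017, 10, 5)))  # ['orders-2017.10', 'orders-2017.09']
```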

What I am finding is that this split works really well for our larger types. For the smaller ones, though, once the data is split out by type and date, the resulting indexes are very small (~100 MB). I am concerned that for the small types we will end up with 15 tiny indexes (we have a 15-month retention), making searches inefficient. Is this really bad? I tested collapsing the smaller ones into a single non-date-based index, but performance actually went down. I am hypothesizing that this is because all the data ended up in one shard (I set the index to one shard), so the search could not be parallelized.
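For reference, the collapse test was set up along these lines (the index name is illustrative; the one-shard setting is the part I suspect hurt parallelism):

```
PUT /small_types_combined
{
  "settings": {
    "number_of_shards": 1
  }
}
```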

My root question: is there a penalty for spreading a small amount of data over multiple indexes vs co-locating it in one index? We really need the date-based indexes for our larger types, and managing a mix of date-based and non-date-based indexes is undesirable.
