Elastic search non-responsive under load after topology change


We are trying to move from a massive monolithic index to daily index. When we query the daily index then the disk IO increases on one of our nodes. There are a lot of reset network connections and the cluster seems to get blocked up. Any idea what could be going on here? Querying the massive single index works flawlessly. I understood that smaller indexes would be ideal but every time we try and move to the daily indexes our system freezes up.

Any advice about this would be appreciated.

What volumes are we talking about? How large is the single index? How many shards? How many daily indices and shards does this correspond to?

The single index has 991368720 documents and uses 744 GB on 5 active shards with 5 passive shards.
There are 447 daily indexes with anywhere from 300 000 records to 7 million records
Each index has 5 active shards with 5 passive shards.

I have 7 data nodes supporting this all running 5.6.13.
I have 3 master nodes.

So you have gone from 10 shards with an average size of ~ 75GB to 4470 shards with an average size of 160MB?

The first is possibly on the large size and the latter is likely far too many small shards. As outlined in this blog post this can be very inefficient. If I was resharding this into time-based indices I would probably recommend to use monthly indices with 1 or 2 primary shards and 1 replica. That should give you an average shard size in the tens of GB range.

Ok thanks, we want to keep the indexes smallish in order to recover from disasters as quickly as possible. One proposal is to keep 7 days of daily indexes that are then merged into weekly indexes. Do you think that is feasible?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.