Indexes Scale

(Deepikaa Subramaniam) #1


Is it good to have indexes in say thousands => which leads to thousands of shards even if I have a single shard per index or have smaller number of indices and have filtered aliases for them.

If filtered aliases is a better approach, would there be any performance impact if a lot of aliases are created. I am assuming the impact if any would be during search?

(Zachary Tong) #2

Generally speaking, you don't want thousands of mostly-empty indices. A single shard can usually handle millions of documents (e.g. my macbook air can comfortably handle 15-20m docs in a shard without breaking a sweat). Each shard has a certain amount of overhead, so if you aren't using a shard fully you'll be losing a lot in wasted overhead.

Fewer indices + filtered aliases is probably a better route to take. Aliases have essentially zero impact on search performance. Under the covers, you can think of aliases as a simple map from alias -> index. So looking up the underlying index is very quick.

The place where aliases can get you in trouble is on the master. Aliases live in the cluster state, and the cluster state is managed by the master. If you have huuuge numbers of aliases, you can start to see the master bottleneck on processing cluster updates, and sometimes OOM in the worst offending clusters. The situation has steadily improved over time, so the level of problem depends on your ES version number.

Another side effect of giant cluster state is that the state is published to all nodes. So if your cluster state is 1gb, the master needs to broadcast that 1gb to all the nodes. This too has improved over time: old versions republished each cluster state, newer versions send deltas and batch stuff, etc. But it's still something to be aware of. don't really want hundreds of thousands of aliases either =)

(Deepikaa Subramaniam) #3

Thanks for the reply Zach! This is really useful.

When you say newer versions for masters performance improvement - which version can we expect to have this?
We are using 1.5.2

(Zachary Tong) #4

To be perfectly honest, I don't really remember. It's been a continuous set of improvements, starting around 1.4 I believe. Each version gets a bit better as new issues are identified and fixed. And there are more improvements sitting in master right now that will be released when 2.0 comes out, which should help the situation more.

(Deepikaa Subramaniam) #5

Great. Is there a reason why the design is such that the master doesn't distribute its load amongst them?. We have like 3 masters. But only the active one is performing all work when master comes into play.

(Deepikaa Subramaniam) #6

And thank you really for actively responding. This helps us a lot.

(Zachary Tong) #7

Disclaimer: I'm not a distributed systems guru, this is just a summary. Future readers of this post please don't quote me out of context :wink:

The main reason is so to prevent inconsistent shared state. In distributed systems, one approach is to centralize control in one entity (a single elected leader) and ensure everyone agrees who that leader is. Then only one machine is allowed to mutate state and your shared state is consistent. Or you can build a system that accepts state mutations in parallel and resolves conflicts later (via vector clocks, crdt's, etc)

ES went down the first route. So one master is the "active leader", while the rest are just waiting around to act as backups to increase availability and provide quorum. There's a lot of theory and discussion and disagreement about which is better, but I'll leave that to the academics =)

In general, the master responsibilities are very light. Updating and merging cluster states isn't a problem for the vast majority of clusters, just those that start to get abusive in terms of enormous mappings or hundreds of thousands of aliases, etc. But a lot of these improvements have come from those clusters that are rather abusive, so it's certainly helped ES become more resilient over time =)

(Deepikaa Subramaniam) #8

Thank you!

Experimenting with this feature just to understand bringing down a data node and adding it with minimal overhead
PUT /_cluster/settings
"transient" : {
"cluster.routing.allocation.enable" : "none"

  1. Even though this process is adapted after I turn on the setting I do see a few relocating shards.
  2. I think this would be a great feature. I don't see the impact though. Am i missing something. Both approach i see similar cluster state

(system) #9