Ideas for data distribution in a cross cluster search design


(David Ponessa) #1


I am working on building a large cross cluster search setup, consisting of several underlying clusters all equally sized, and a single smaller dedicated cluster to be used to access all nodes below with cross cluster search.

We ran a PoC and load testing, and it works quite well, but we have doubts in regards to how to distribute the data.

One idea is to do deterministic round robin to load data across all clusters equally. Every cluster will end up with the same set of indexes, about same amount of data, queries would need to be executed on all clusters to have a complete view, kibana views would need to be based on *:indexname-* to be complete, etc. Main advantages of this setup are easier management, truly equal load distribution between clusters, better performance on queries (if I understand the internals correctly).

The other idea is to distribute complete sets of data across different clusters, so we don't run into a scenario where a single cluster having issues affects completeness for every dataset.

We do have an active/active DR setup planned, (data replicated either via a duplication of the ingeastion pipeline or from cross cluster replication), so I was leaning towards the first option as we could setup automatic failover of the seed pool to equivalent DR cluster if it comes down to that, while we solve issues and resume indexing on the current active cluster.

So, asking the community for educated opinions of pros and cons here.

(David Turner) #2

It is not clear from your post why you want to do this with multiple clusters at all. Can you explain why you cannot just use a single cluster, perhaps with shard allocation filtering rules to control how your data is allocated to the individual nodes if you need this level of control.

Your question about how to partition data is exactly the concern that sharding (within each index) is designed to address.

(David Ponessa) #3

We would need hundreds of large nodes to make it a single cluster, and we've seen a lot of instability when going above some numbers for our use case.

(David Turner) #4

Could you describe your use case, and the instability you're seeing, in more detail? It's definitely possible to build a single cluster containing hundreds of large nodes. If you're hitting a bug at your scale then we'd be keen to fix it.

(David Ponessa) #5

We use elasticsearch for security, tracking time based events, coming from custom apps and logs. We currently have to deal with about 300Mb/s of data, but we need to scale that up by a lot. We've been running a 200 node cluster on ES 5.6.10, but it has been unstable, and seems to be tied to the cluster state becoming quite large. Based on projection we will need to reach Petabyte scale storage by mid year, the equivalent of a 400 node cluster, and we don't want to risk instabilities such as we've seen, as we still need to consider the ability to keep scaling and maybe breaking down clusters, when cross cluster search already proved to work very well.

(David Turner) #6

I would like to know more details about this. What do you mean by "unstable"?

There have been substantial improvements in some scaling and resilience issues through the 6.x series and the way that cluster state is managed changes quite a bit in 7.x. It'd be good to know whether your issues are something that we're already aware of and perhaps have already addressed.

To answer your original question, IMO both of your suggestions seem reasonably likely to work. They differ from a whole-cluster resilience standpoint as you note: in the round-robin setup if you lose a cluster then you lose some arbitrary subset of your data whereas in the other setup you will still have access to some whole datasets while others will be completely offline. It totally depends on your business goals which of those is preferable. Neither sounds like it will obviously perform better or worse than the other, and both sound similar from a DR perspective. Managing a constellation of clusters can of course be a bit tricky.

(David Ponessa) #7

We did not fully test performance on a single cluster that large to be able to tell if that kind of issue has been addressed, we are however comfortable with our setup going into the future.

Thanks for your answer!