Possible problems when creating a cluster with many indices


(José de Zárate) #1

We tried setting up a cluster with three nodes a while ago. It didn't hold
up, and we figured it was because we had about 1k small indices (5,000 items
each).

I think it might be possible to run a cluster with that many indices.

My first question is: what could make a cluster with the default
configuration (5 shards, 1 replica, auto-everything) fail with 1k indices?
- Could it be that it spends far too many resources rebalancing the
10k shards that the standard 5 shards + 1 replica config produces?
- Could it be that keeping track of so many index namespaces is too
costly for a master node that already hosts around 3,300 data shards?
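A quick back-of-the-envelope check of the shard counts in question, for both the default config and the 1 shard + 1 replica alternative suggested in this post. It assumes shard copies are spread perfectly evenly over the three nodes, which is what the balancer aims for but is not guaranteed; the function names are hypothetical helpers, not Elasticsearch APIs.

```python
def total_shard_copies(indices: int, primaries: int, replicas: int) -> int:
    """Each primary shard has `replicas` additional copies."""
    return indices * primaries * (1 + replicas)


def copies_per_node(total: int, nodes: int) -> float:
    """Average shard copies per data node, assuming a perfectly even spread."""
    return total / nodes


default = total_shard_copies(indices=1000, primaries=5, replicas=1)
proposed = total_shard_copies(indices=1000, primaries=1, replicas=1)
print(default, round(copies_per_node(default, nodes=3)))    # 10000 3333
print(proposed, round(copies_per_node(proposed, nodes=3)))  # 2000 667
```

Dropping to one primary per index cuts the number of shard copies the cluster has to track and balance by a factor of five.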

I guess these two situations could be eased by:
- setting up one or more non-data nodes and marking all the data nodes as
"non master eligible" (I don't even know if this is possible)
- using a 1 shard + 1 replica config for the indices
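One way to make 1 shard + 1 replica the default for new indices is an index template. The sketch below only builds the request body; it assumes ES 1.x syntax, where the index pattern goes in a "template" field (6.x and later use "index_patterns" instead), and the template name "one_shard" is made up. A template only affects indices created after it exists, since number_of_shards cannot be changed on an existing index. On the node-role question: in 1.x, the node.master and node.data settings in elasticsearch.yml control whether a node is master-eligible and whether it holds data, so dedicated master nodes are possible.

```python
import json


def default_shards_template(pattern: str = "*", shards: int = 1, replicas: int = 1) -> dict:
    """Index template body (ES 1.x syntax) applied to new indices matching pattern."""
    return {
        "template": pattern,  # 1.x field; 6.x+ uses "index_patterns": [pattern]
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
    }


body = default_shards_template()
# Would be sent as: PUT /_template/one_shard   ("one_shard" is a made-up name)
print(json.dumps(body, indent=2))
```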

What I would really like to know is how the cluster works internally,
or at least get a few hints. I've read the reference upside down and also the
newly created guide, but unfortunately the cluster chapters are marked as
"TODO".

Well, that was it. I hope there is some sensible soul out there who can throw
me a bone :)

--
uh, oh http://www.youtube.com/watch?v=GMD_T7ICL0o.

http://www.defectivebydesign.org/no-drm-in-html5

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKNaH0WX46SDVY2tv5X7TPzYD3WFuGncTNgNxY3pCkrrFPp7fA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


(Nik Everett) #2

On Thu, May 1, 2014 at 11:31 AM, José de Zárate jzarate@gmail.com wrote:

> We tried setting up a cluster with three nodes a while ago. It didn't hold
> up, and we figured it was because we had about 1k small indices (5,000 items
> each).

We have ~1600 indexes. Most of them have a single shard, though some are
larger and have 20ish shards. Pretty much everything has two replicas
rather than the default of one. We have 16 nodes at the moment.

I've noticed that having more shards makes admin actions take longer
than I'd like, but nothing has gone unstable. When I look at the hot threads,
though, I've never seen admin actions consuming CPU. What kinds of failures
were you seeing?

The normal advice is to have fewer indexes and use routing. We can't do
that because our indexes have different configurations, and we rely heavily
on the suggesters, which draw their candidates from the terms on the shard.
We can't have suggestions from one source bleeding into another.
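Roughly, routing works like this: the routing value is hashed and taken modulo the number of primary shards, so all documents with the same routing value land on the same shard. The sketch uses MD5 only as a stand-in (Elasticsearch uses its own hash function), and shard_for and the source names are hypothetical. It also illustrates the objection above: with more sources than shards, some sources necessarily share a shard, and with it the terms the suggesters draw candidates from.

```python
import hashlib


def shard_for(routing: str, num_primaries: int) -> int:
    """hash(routing) % num_primaries, with MD5 standing in for the real hash."""
    digest = hashlib.md5(routing.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_primaries


sources = ["site-a", "site-b", "site-c", "site-d", "site-e", "site-f"]
placement = {src: shard_for(src, 5) for src in sources}
print(placement)

# Same routing value -> same shard, every time.
assert shard_for("site-a", 5) == placement["site-a"]
# Six sources, five shards: at least two sources must share a shard.
assert len(set(placement.values())) < len(sources)
```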

> I think it might be possible to run a cluster with that many indices.
>
> My first question is: what could make a cluster with the default
> configuration (5 shards, 1 replica, auto-everything) fail with 1k indices?
> - Could it be that it spends far too many resources rebalancing the
> 10k shards that the standard 5 shards + 1 replica config produces?
> - Could it be that keeping track of so many index namespaces is too
> costly for a master node that already hosts around 3,300 data shards?

5 shards is generally too many for a small index. The advice I got was to
target a shard size of about 2GB, and that has worked pretty well for me. Our
items vary wildly in size, so we can't plan per item. We pretty much
take a wild guess at the number of shards and then check it once the index
is built. If it is horribly off, we do a zero-downtime reindex
(http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/) to
fix it. It costs a ton of CPU, but it's better in the long run.
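The two halves of that workflow can be sketched as follows: pick a shard count by rounding the estimated index size up to multiples of the ~2GB target, and, once the replacement index is built, move the alias from the old index to the new one in a single _aliases request, whose actions are applied atomically. The index and alias names are hypothetical, and the size estimate is whatever guess you can make in advance.

```python
import math

TARGET_SHARD_BYTES = 2 * 1024**3  # ~2GB per shard, the rule of thumb above


def suggest_shard_count(estimated_index_bytes: int,
                        target: int = TARGET_SHARD_BYTES) -> int:
    """Round up so no shard is expected to exceed the target; never below 1."""
    return max(1, math.ceil(estimated_index_bytes / target))


def alias_swap_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases; both actions are applied atomically."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


print(suggest_shard_count(500 * 1024**2))  # 500MB -> 1
print(suggest_shard_count(9 * 1024**3))    # 9GB -> 5
print(alias_swap_actions("my_index", "my_index_v1", "my_index_v2"))
```

Searching and indexing through the alias rather than the concrete index name is what lets the swap happen without clients noticing.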

> I guess these two situations could be eased by:
> - setting up one or more non-data nodes and marking all the data nodes as
> "non master eligible" (I don't even know if this is possible)
> - using a 1 shard + 1 replica config for the indices
>
> What I would really like to know is how the cluster works internally,
> or at least get a few hints. I've read the reference upside down and also the
> newly created guide, but unfortunately the cluster chapters are marked as
> "TODO".

Out of all master-eligible nodes, one is selected to be the master. When a
state change has to happen, the request is forwarded to the master node. I
get a bit hazy from here on because I haven't read the code. The
upshot is that the master syncs the changes out to the other nodes and
they all adopt the change reasonably in sync. I doubt it is perfectly in sync,
but it is pretty quick.
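A toy model of the flow just described, meant only to make it concrete: requests go to the master, which builds the next state version and pushes it to every node, and in this toy nodes ignore anything older than what they already hold. It is a deliberate simplification with no failures, acknowledgements, or timeouts, and choosing the alphabetically first master-eligible node is a placeholder, not Elasticsearch's election rule. Node and ToyCluster are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    master_eligible: bool = True
    version: int = 0
    state: dict = field(default_factory=dict)

    def receive(self, version: int, state: dict) -> None:
        # Only accept a newer version than the one already held.
        if version > self.version:
            self.version, self.state = version, state


class ToyCluster:
    def __init__(self, nodes: list):
        self.nodes = nodes
        # Placeholder election: alphabetically first master-eligible node.
        self.master = min((n for n in nodes if n.master_eligible), key=lambda n: n.name)

    def submit(self, change: dict) -> int:
        # Whichever node received the request forwards it here, to the master.
        new_state = {**self.master.state, **change}
        new_version = self.master.version + 1
        for node in self.nodes:  # publish to every node, master included
            node.receive(new_version, new_state)
        return new_version


nodes = [Node("b"), Node("a"), Node("c", master_eligible=False)]
cluster = ToyCluster(nodes)
cluster.submit({"index-1": "created"})
print(cluster.master.name, [(n.name, n.version) for n in nodes])
```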

In the normal (non-EC2) setup, the cluster state is stored on master-eligible
nodes in the _state directory, which for me (deb package) is in
/var/lib/elasticsearch/nodes/0/. When a node starts
up, it reaches out via multicast, and via unicast to all configured unicast
nodes. If it can find at least minimum_master_nodes master-eligible
nodes and one of them claims to be the master, it joins the cluster and syncs
state from it. At least, I think those are the rules.
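The minimum_master_nodes rule is easier to see with numbers. In 1.x the setting is discovery.zen.minimum_master_nodes, and the documentation recommends a strict majority of master-eligible nodes, (N / 2) + 1, so the two sides of a network split can never both elect a master. The helpers below are hypothetical and only do the arithmetic.

```python
def recommended_minimum_master_nodes(master_eligible: int) -> int:
    """Strict majority of the master-eligible nodes: (N // 2) + 1."""
    return master_eligible // 2 + 1


def can_elect_master(visible_master_eligible: int, minimum: int) -> bool:
    """A node only joins/elects if it can see at least `minimum` eligible nodes."""
    return visible_master_eligible >= minimum


quorum = recommended_minimum_master_nodes(3)
print(quorum)  # 2

# A network split of three master-eligible nodes into groups of 2 and 1:
print(can_elect_master(2, quorum), can_elect_master(1, quorum))  # True False
```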

The cluster state itself is a (reasonably) big immutable data structure.
The code that modifies the state first decides how to modify it, then
copies it with the modifications. The state itself is linked to the rest
of the system by a volatile field in InternalClusterService.
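The copy-on-write pattern described above can be sketched in Python, which has no volatile keyword; rebinding an attribute stands in for the volatile field here. This is not the real InternalClusterService code: ClusterState, StateHolder and add_index are hypothetical names, and the lock that serializes writers is a choice made for this sketch, not a claim about how Elasticsearch orders updates.

```python
import threading
from dataclasses import dataclass, replace
from types import MappingProxyType
from typing import Callable, Mapping


@dataclass(frozen=True)
class ClusterState:
    version: int
    indices: Mapping[str, int]  # index name -> number of primary shards


class StateHolder:
    def __init__(self, initial: ClusterState):
        self._state = initial          # only ever rebound, never mutated
        self._lock = threading.Lock()  # serializes writers in this sketch

    @property
    def state(self) -> ClusterState:
        # Readers take the current reference without locking; the object it
        # points to can never change underneath them.
        return self._state

    def update(self, fn: Callable[[ClusterState], ClusterState]) -> ClusterState:
        with self._lock:
            new_state = fn(self._state)  # decide and build the modified copy
            self._state = new_state      # publish it with a single rebind
            return new_state


def add_index(name: str, shards: int) -> Callable[[ClusterState], ClusterState]:
    def fn(current: ClusterState) -> ClusterState:
        indices = dict(current.indices)  # copy, then modify the copy
        indices[name] = shards
        return replace(current, version=current.version + 1,
                       indices=MappingProxyType(indices))
    return fn


holder = StateHolder(ClusterState(version=0, indices=MappingProxyType({})))
before = holder.state
holder.update(add_index("logs", 1))
print(before.version, dict(before.indices))              # 0 {}
print(holder.state.version, dict(holder.state.indices))  # 1 {'logs': 1}
```

Anyone still holding the old reference keeps seeing a consistent snapshot, which is the main payoff of making the state immutable.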

Nik



