Configuration Brain Wobbles


(Christopher Ambler) #1

I have a cluster with six nodes. The nodes are in different data centers,
but I don't think that matters, as the connectivity is beefy and thick. I
have turned multicast off and unicast on. Each node knows about all the
others explicitly. When I bring up a visualization of the cluster using the
"head" plugin, I see them all. This appears to work as it should. My
cluster looks like this:

DEV-02 (development data center)
MESA-01 (mesa data center)
MESA-02 (mesa data center)
MESA-03 (mesa data center)
BUCK-01 (buck data center)
BUCK-02 (buck data center)

I have each node configured for 5 shards.
I have each node set to be master: true and data: true.

I do ALL of my document addition using MESA-01 and I can do queries on ANY
node and get a result, so that's working. But I notice two things and have
one requirement I can't figure out:

  1. Most queries come in sub-30ms. But every now and again I get a query
    that is longer. I set my slow query log to complain over 100ms and I see
    that maybe one query out of 15 or so takes 800ms to 1200ms. This is on any
    node.

  2. I have unassigned shards. I presume this is bad, yes? How do I get them
    to allocate? When I stop and start the service on any of the nodes, the
    shards are shuffled around, but rarely are the unassigned shards put on a
    node. Why? How do I resolve this?
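One common cause of unassigned shards: Elasticsearch will not place two copies of the same shard (a primary and its replica, or two replicas) on the same node. Any index whose `number_of_replicas + 1` exceeds the number of data nodes able to hold it therefore leaves the surplus copies unassigned. A minimal Python sketch of that arithmetic follows. It assumes only this one rule and ignores awareness, disk thresholds and the other allocation deciders.

```python
def unassigned_copies(number_of_shards: int, number_of_replicas: int, data_nodes: int) -> int:
    """Count shard copies that cannot be placed under the
    'no two copies of a shard on one node' rule alone."""
    copies_per_shard = number_of_replicas + 1  # primary plus its replicas
    # Each node can hold at most one copy of any given shard.
    surplus_per_shard = max(0, copies_per_shard - data_nodes)
    return number_of_shards * surplus_per_shard


# Six data nodes and 5 shards: up to 5 replicas fit, a 6th replica cannot.
print(unassigned_copies(5, 5, 6))  # 0
print(unassigned_copies(5, 6, 6))  # 5
```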

And my requirement - Is there a way to say, "Look, Elasticsearch, I don't
want you shuffling shards around here and there, I'd like EVERY node to
have a COMPLETE replica of the data, and you just keep it up to date. That
way, you see, a query on a buck data center node won't have to ask a mesa
data center for a document if it doesn't have it."
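A sketch of the usual way to get a full copy of the data on every node, assuming an Elasticsearch 1.x cluster. Set the index's `number_of_replicas` to one less than the number of data nodes, or use the `index.auto_expand_replicas` setting with `"0-all"` so the replica count tracks the node count. Then ask searches to prefer the receiving node's own shard copies with the `preference=_local` search parameter. The Python below only builds the settings bodies and a search URL; nothing is sent. The host name `buck-01.example` and index name `myindex` are hypothetical placeholders.

```python
import json
from urllib.parse import urlencode


def full_copy_settings(data_nodes: int) -> dict:
    """Index settings that put one copy of every shard on each data node."""
    # One primary plus (data_nodes - 1) replicas = one copy per node.
    return {"index": {"number_of_replicas": data_nodes - 1}}


# Alternative: let Elasticsearch grow or shrink the replica count with the cluster.
AUTO_EXPAND_SETTINGS = {"index": {"auto_expand_replicas": "0-all"}}


def local_search_url(host: str, index: str) -> str:
    """Search URL asking the receiving node to use its own shard copies when it can."""
    return f"http://{host}:9200/{index}/_search?" + urlencode({"preference": "_local"})


print(json.dumps(full_copy_settings(6)))
# {"index": {"number_of_replicas": 5}}
print(local_search_url("buck-01.example", "myindex"))
# http://buck-01.example:9200/myindex/_search?preference=_local
```

With six data nodes this means 5 replicas per shard. The trade-off is that every document indexed through MESA-01 then has to be written to all six nodes, across data centers.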

Solving #2 is important, but solving my requirement is somewhat critical. I
think fixing these two things will take care of issue #1.

At least it'll get me configured right so if #1 is still there, I can
diagnose from a position of not wondering if misconfiguration is my problem.

Help?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/33e7db26-ed5e-4c9e-abe5-fd656a73e978%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Mark Walkom) #2

The standard response to this is that ES is not built for multi-DC clustering,
but as long as you are aware of that, it's fine.

Have you looked at the shard allocation documentation?
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-allocation.html

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

(In reply to Christopher Ambler, chris@insiderhouse.com, 31 July 2014 08:31.)



(Christopher Ambler) #3

Yes, I'm aware of the multi-DC issue ;)

So yeah, this morning I dove into index shard allocation, and did just that.

I set up 3 zones (using node.zone as my tag), gave the index 1 primary and
2 replicas, and then configured allocation awareness on that tag so that
each zone plays along with this.

This worked as advertised. One zone gets the primary and the other two
zones get the two replicas. Each zone has 2 nodes and I have 5 shards,
so one node gets two and the other gets three.
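The layout described above can be reproduced with a toy model. This is not Elasticsearch's allocator. It is a sketch assuming the effect of zone awareness (`cluster.routing.allocation.awareness.attributes: zone`, with each node tagged via `node.zone`): every zone receives exactly one copy of each shard, and those copies are spread over the nodes inside that zone. The zone and node names are hypothetical. Rotating which zone holds each shard's primary is for illustration only, since Elasticsearch decides that itself.

```python
def place_copies(zones: dict[str, list[str]], number_of_shards: int) -> dict[str, list[tuple[int, str]]]:
    """Toy zone-aware layout: every zone gets exactly one copy of every shard,
    spread round-robin over the nodes in that zone."""
    placement: dict[str, list[tuple[int, str]]] = {
        node: [] for nodes in zones.values() for node in nodes
    }
    zone_count = len(zones)
    for zone_index, nodes in enumerate(zones.values()):
        for shard in range(number_of_shards):
            # Rotate which zone holds each shard's primary (illustration only).
            role = "primary" if shard % zone_count == zone_index else "replica"
            node = nodes[shard % len(nodes)]
            placement[node].append((shard, role))
    return placement


zones = {  # hypothetical grouping: 3 zones of 2 nodes each
    "zone-a": ["a1", "a2"],
    "zone-b": ["b1", "b2"],
    "zone-c": ["c1", "c2"],
}
layout = place_copies(zones, number_of_shards=5)
for node, copies in layout.items():
    print(node, len(copies))
```

With 5 shards and 2 nodes per zone, each zone's nodes end up holding 3 and 2 shard copies, which matches the split described above (15 copies in total: 5 primaries and 10 replicas).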

Head shows me this and it all makes sense.

We're now doing about 30 searches per second, and I'm still seeing, about
every 5 or 6 seconds, a single "slow query" in the 600ms to 900ms range.
All other queries are sub-50ms.
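For scale, the figures quoted in this thread can be turned into a slow-query fraction. This is pure arithmetic, nothing Elasticsearch-specific. One slow query every 5 to 6 seconds at 30 searches per second is roughly 1 in 150 to 180 queries, down from the earlier 1 in 15. The `slow_gaps` helper is a hypothetical aid for checking whether slow-log entries arrive at a regular interval, which can help narrow down a periodic cause. The timestamps in the example are made up.

```python
def slow_fraction(searches_per_second: float, seconds_between_slow: float) -> float:
    """Fraction of searches that are slow, if one slow search occurs every N seconds."""
    return 1.0 / (searches_per_second * seconds_between_slow)


def slow_gaps(timestamps: list[float]) -> list[float]:
    """Seconds between consecutive slow-log entries (timestamps in seconds, sorted)."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


print(round(1 / slow_fraction(30, 5)))  # 150
print(round(1 / slow_fraction(30, 6)))  # 180
# Hypothetical slow-log timestamps, only to show the helper:
print(slow_gaps([0.0, 5.5, 11.0, 16.5]))  # [5.5, 5.5, 5.5]
```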

I need to find out why I'm seeing these consistent 600ms+ queries and
eliminate them if I can.

Everything else looks good.

Oh, and on the unassigned shards issue, I tracked that down to having had
more replicas when I built the index and then reducing the replica count.
I had shards that had nowhere to go. So I just removed those indexes
(they were old) and everything is green.
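To find which indices still have unassigned shards before deleting anything, the cluster health API can report per index (`GET /_cluster/health?level=indices`), and each per-index entry includes an `unassigned_shards` count. Below is a sketch that filters such a response. The sample dictionary is hypothetical and trimmed to the fields used.

```python
def indices_with_unassigned(health: dict) -> list[str]:
    """Names of indices whose health entry reports unassigned shards."""
    return sorted(
        name
        for name, info in health.get("indices", {}).items()
        if info.get("unassigned_shards", 0) > 0
    )


sample_health = {  # hypothetical, trimmed response
    "status": "yellow",
    "indices": {
        "old-index": {"status": "yellow", "unassigned_shards": 5},
        "current-index": {"status": "green", "unassigned_shards": 0},
    },
}
print(indices_with_unassigned(sample_health))  # ['old-index']
```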

So the consistent 600ms slow queries is my only issue now.


