There seems to be some confusion here. I'm not sure whether I'm misunderstanding you, or you are misunderstanding the API output, or a bit of both, but we're definitely not on the same page.
I'm going to wind it back to the beginning and see if we can get a shared understanding.
In your original post you showed the cluster health with (trimmed for clarity):
"status" : "yellow",
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_shards" : 10,
"unassigned_shards" : 5,
In this case your cluster was yellow because it had unassigned shards.
The most likely explanation is that your indices are configured to use replica shards (a good thing) but you have only 1 data node, so those replicas have nowhere to go.
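As an aside: on a cluster that is genuinely meant to stay at a single node, the usual fix is to drop the replica count so the cluster can go green. A sketch, which applies the setting to all existing indices, so only use it if that's really what you want:

# Set replicas to 0 for every existing index
curl -XPUT 'localhost:9200/_settings' -H 'Content-Type: application/json' -d '
{ "index" : { "number_of_replicas" : 0 } }'

That's not the right fix for the 3-node case we get to below, where the real problem is node roles.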
Looking at your shards (trimmed for brevity)
.monitoring-alerts-6 0 r UNASSIGNED CLUSTER_RECOVERED
That's:
- index .monitoring-alerts-6
- shard 0
- r = replica
- state UNASSIGNED
- reason CLUSTER_RECOVERED, i.e. it has been unassigned since the cluster recovered.
All your unassigned shards appear to be replicas (you didn't paste the details for .watches, but we can guess), which matches the theory above.
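Incidentally, you can make those columns label themselves by asking the _cat API for a header row and naming the columns explicitly. A sketch using standard _cat/shards column names:

# v adds a header row; prirep prints p (primary) or r (replica)
curl 'localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason'

That makes output like the line above much easier to read back.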
I'm confused by what you mean in your second post.
I ran the following query and even though the shards show as unassigned it looks like new ones were created
Looking at your shards output we have:
- .security index, shard 0, primary, started on node "test7643"
- .monitoring-alerts-6 index, shard 0, primary, started on node "test7643"
- .monitoring-alerts-6 index, shard 0, replica, unassigned
- (skipping a couple of indices with the same pattern of a single shard, with primary & replica)
- graylog_0 index, shard 1, primary, started on node "test7643"
- graylog_0 index, shard 2, primary, started on node "test7643"
- graylog_0 index, shard 3, primary, started on node "test7643"
- graylog_0 index, shard 0, primary, started on node "test7643"
- (skipping a couple of indices with the same pattern of a single shard, with primary & replica)
The important thing here is:
- The .security index has no replicas. That's because it is (by default) configured to auto-expand so that there are as many copies as there are nodes in your cluster, and you have 1 node, so it has 1 primary and no replicas (you can verify this with the settings call just after this list).
- The other x-pack related indices have 1 primary, 1 replica. But the replica is unassigned because you only have 1 node.
- graylog_0 has 4 shards, but no replicas. I'm not familiar with the underlying shard strategy that graylog uses, so I don't know why that is the case. It seems strange, but there may be a good reason.
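You can check the first point yourself by dumping the index settings. A sketch:

# Show the settings for the security index
curl 'localhost:9200/.security/_settings?pretty'

Look for index.auto_expand_replicas in the output; on the security index it should be 0-all, meaning the replica count automatically tracks the number of nodes.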
Your 3rd and 4th posts say:
I just installed xpack on my three node cluster and I'm seeing the same results
I'm seeing this behavior on my three node cluster as well.
I'll explain why when we get to looking at the output of _cat/nodes.
You also say:
From what I have read today it sounds like it's creating the replica on the same server as the primary shard.
and
I just want to know why the default is to create a replica index on the same node?
I really don't understand why you say this.
Every time you post shard output, it shows that your replicas are UNASSIGNED, but you claim that ES is creating replicas on the same server as the primary.
UNASSIGNED means that the replica doesn't exist on any server. The _cat/shards API is showing the shards for the whole cluster, not just for a single node. And ES is explicitly not assigning those replicas to any node because it won't assign them to the same node as the primary, and it has nowhere else to assign them.
ES is doing the exact opposite of what you say it is doing.
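You don't have to take my word for it, either: ES can tell you exactly why a shard is unassigned. Called with no body, this explains the first unassigned shard it finds (a sketch):

# Ask ES to explain an unassigned shard's allocation
curl -XGET 'localhost:9200/_cluster/allocation/explain?pretty'

For your replicas I'd expect a per-node decision saying, in effect, that a copy of the shard already exists on the only data node.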
Now, as to why your 3 node cluster has unassigned replicas.
Your _cat/nodes output is (columns trimmed for clarity):
ip node.role master name
10.66.8.189 mi * grayloges1
10.66.8.202 di - grayloges3
10.66.8.191 mi - grayloges2
This is a very peculiar node setup.
You have 3 nodes, but only 2 of them are master eligible, and only 1 is a data node.
- The first line, node "grayloges1", has role mi. That means "master eligible" and "ingest". It is the master node.
- The second line, node "grayloges3", has role di. That means "data" and "ingest".
- The third line, node "grayloges2", has role mi. That means "master eligible" and "ingest".
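For reference, that trimmed output is what you get by asking _cat/nodes for just those columns:

# v adds a header row; node.role is the column to watch
curl 'localhost:9200/_cat/nodes?v&h=ip,node.role,master,name'

In node.role, m means master-eligible, d means data and i means ingest.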
That's a problem for 2 reasons.
(1) Only "grayloges3" is actually storing data.
That is why your replicas are unassigned - from a data point of view this is acting like a 1 node cluster.
(2) You have exactly 2 master-eligible nodes.
That's not helpful. ES uses a quorum model for electing the active master node. Ideally, you should aim to have an odd number of master-eligible nodes, so the majority is ceiling(master-nodes / 2). For example, with 3 master-eligible nodes the quorum is ceiling(3 / 2) = 2, so the cluster can keep working when any one node is down.
You should definitely not have 2 master nodes, because that just makes things worse.
- If you have 1 master node, then your cluster will work fine whenever that node is available, and be down whenever that node is unavailable.
- If you have 2 master-eligible nodes, then either:
  - You run the risk of having "split brain", where both master-eligible nodes think they are the master and try to take control of the cluster (which can be prevented by setting discovery.zen.minimum_master_nodes to 2, as sketched below)
  - But if you set min-master to 2, you require both master-eligible nodes to be online in order for your cluster to function, which makes your cluster more fragile than just having 1 master-eligible node.
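For completeness, discovery.zen.minimum_master_nodes can also be set on a live cluster through the cluster settings API, without editing any files. A sketch:

# Persistent cluster setting; kept once the cluster has formed
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '
{ "persistent" : { "discovery.zen.minimum_master_nodes" : 2 } }'

Putting it in elasticsearch.yml as well (below) means each node knows the value at startup, before it has joined the cluster.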
In your case, I think you want (all three settings are combined in the sketch after this list):
- Every node to be a data node. In the elasticsearch.yml file, set node.data to true on every server.
- Every node to be a master-eligible node. In the elasticsearch.yml file, set node.master to true on every server.
- A quorum of 2 (out of 3 nodes). In the elasticsearch.yml file, set discovery.zen.minimum_master_nodes to 2 on every server.
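Putting those together, elasticsearch.yml on all 3 servers would contain something like this (a sketch; leave your existing network and discovery settings alone):

# Same three lines on every node
node.master: true
node.data: true
discovery.zen.minimum_master_nodes: 2

After restarting the nodes one at a time, _cat/nodes should report role mdi for all three, and the unassigned replicas should then find a home.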
There's more explanation here: Node | Elasticsearch Reference [5.6] | Elastic
Graylog has been writing data to ES for weeks now so it has been successfully creating indexes on all three nodes.
I don't believe that to be the case.
It hasn't sent your cluster to yellow because it's not creating any replicas, but all the evidence suggests that it is only creating indices (technically "shards") on your grayloges3 data node.
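You can confirm that with _cat/allocation, which shows how many shards each data node is holding (a sketch):

# One row per data node, with its shard count and disk usage
curl 'localhost:9200/_cat/allocation?v'

I'd expect grayloges3 to be holding every shard, and grayloges1 / grayloges2 to be holding none.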