Installing xpack = yellow cluster health

After installing xpack and changing the default passwords of the three default users, my cluster now shows as yellow when I query its health:

{
  "cluster_name" : "graylog",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 10,
  "active_shards" : 10,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 5,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 66.66666666666666
}

I've done some searching (I'm an ES novice) and the issue is that I have unassigned shards.

I found the following query, which shows which shards are unassigned:

curl -XGET localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason| grep UNASSIGNED
.monitoring-alerts-6          0 r UNASSIGNED CLUSTER_RECOVERED
.watcher-history-6-2017.10.19 0 r UNASSIGNED CLUSTER_RECOVERED
.monitoring-es-6-2017.10.19   0 r UNASSIGNED CLUSTER_RECOVERED
.triggered_watches            0 r UNASSIGNED CLUSTER_RECOVERED
.watches  

How exactly do I repair my cluster and why would installing xpack result in the above?

That is expected, as you have only one node, so it can't replicate shards.

It's perfectly fine to stay in that situation.
Or you can change the number of replicas to 0 on the live cluster. Then no redundant copy is expected and Elasticsearch will be green again.

PUT /.monitoring-alerts-6/_settings
{
    "index" : {
        "number_of_replicas" : 0
    }
}

You will probably also have to change some index templates.
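
For example, something along these lines would default any new matching indices to 0 replicas. The template name, pattern, and order below are only placeholders; adjust them to match the templates actually installed on your cluster (you may need one per index pattern):

PUT _template/zero_replicas_for_monitoring
{
    "template" : ".monitoring-*",
    "order" : 100,
    "settings" : {
        "index.number_of_replicas" : 0
    }
}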

Thanks for the quick reply.

Can I expect this behavior upon restart of any single node cluster?

I ran the following query and even though the shards show as unassigned it looks like new ones were created:

curl -XGET "http://127.0.0.1:9200/_cat/shards"
.security                     0 p STARTED      127.0.0.1 test7643
.monitoring-alerts-6          0 p STARTED      127.0.0.1 test7643
.monitoring-alerts-6          0 r UNASSIGNED             
.watcher-history-6-2017.10.19 0 p STARTED      127.0.0.1 test7643
.watcher-history-6-2017.10.19 0 r UNASSIGNED             
.monitoring-es-6-2017.10.19   0 p STARTED      127.0.0.1 test7643
.monitoring-es-6-2017.10.19   0 r UNASSIGNED             
graylog_0                     1 p STARTED      127.0.0.1 test7643
graylog_0                     2 p STARTED      127.0.0.1 test7643
graylog_0                     3 p STARTED      127.0.0.1 test7643
graylog_0                     0 p STARTED      127.0.0.1 test7643
.triggered_watches            0 p STARTED      127.0.0.1 test7643
.triggered_watches            0 r UNASSIGNED             
.watches                      0 p STARTED      127.0.0.1 test7643
.watches                      0 r UNASSIGNED

Add a data node, or test this in a cluster environment.

I just installed xpack on my three node cluster and I'm seeing the same results:

.watches 0 r UNASSIGNED CLUSTER_RECOVERED
.watcher-history-3-2017.10.24 0 r UNASSIGNED CLUSTER_RECOVERED
.monitoring-alerts-6 0 r UNASSIGNED CLUSTER_RECOVERED
.triggered_watches 0 r UNASSIGNED CLUSTER_RECOVERED
.monitoring-es-6-2017.10.24 0 r UNASSIGNED CLUSTER_RECOVERED

I'm seeing this behavior on my three node cluster as well.

From what I have read today it sounds like it's creating the replica on the same server as the primary shard.

Is this a bug with xpack or did I forget to configure something correctly?

I also noticed that it's creating a watcher, monitoring, and watcher history index for each day. I confirmed that these are created after installing xpack. Will it eventually delete these daily indexes or do I have to do it manually?
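
If I do have to clean them up manually, I assume it would just be a normal index delete, something along these lines for an old day's index:

curl -XDELETE 'http://localhost:9200/.watcher-history-6-2017.10.19'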

Paste the result of

GET _cat/nodes?v
GET _cat/shards?v
curl -XGET 'http://localhost:9200/_cat/nodes?v'
ip          heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.66.8.189            9          73   0    0.05    0.05     0.10 mi        *      grayloges1
10.66.8.202           15          87   2    0.24    0.28     0.39 di        -      grayloges3
10.66.8.191            6          70   0    0.00    0.01     0.05 mi        -      grayloges2

curl -XGET 'http://localhost:9200/_cat/shards?v' | grep '^\.'
.watches                      0     p      STARTED       4  35.9kb 10.66.8.202 grayloges3
.watcher-history-3-2017.10.24 0     p      STARTED    1501   1.2mb 10.66.8.202 grayloges3
.security                     0     p      STARTED       3   8.2kb 10.66.8.202 grayloges3
.triggered_watches            0     p      STARTED       0    191b 10.66.8.202 grayloges3
.monitoring-alerts-6          0     p      STARTED       1   6.6kb 10.66.8.202 grayloges3
.monitoring-es-6-2017.10.24   0     p      STARTED   64769  50.5mb 10.66.8.202 grayloges3

NOTE: There was an unassigned replica shard for each of .monitoring-es-6 and .watcher-history, but I changed those indices to have zero replicas.

After reading more about index templates, I now understand what you meant by modifying the template. I'm attempting to modify the template for each of the above-mentioned indexes, but I'm having some issues with my POST. I believe I have an error in my JSON data.

However, before I go and modify the templates, I just want to know why the default is to create a replica index on the same node?

That should not happen.
Why do you believe this is happening in your cluster? (That is, what evidence do you have that this occurred?)

This happens because setting up X-Pack causes new indices to be created that are configured to use replicas (as they should), and a single node cluster has nowhere to assign (place) the replicas.
Often when performing a new installation there was no data in the cluster before X-Pack was installed, so there were no unassigned shards, so the cluster was green.
As soon as there is data, the cluster goes yellow because the replicas cannot be assigned; it just happens that X-Pack is the first thing to write data into the cluster, so it is the trigger for the "yellow" state.
But it would have been yellow as soon as you stored a document in any index even without X-Pack.
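
If you want the cluster to spell this out for you, the allocation explain API reports why a particular shard copy is unassigned. For example, using one of the index names from your output:

GET _cluster/allocation/explain
{
    "index" : ".monitoring-alerts-6",
    "shard" : 0,
    "primary" : false
}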

Here is a shard listing of node3 of my cluster after installing xpack and before I set replicas to 0 for the created shards:

.watcher-history-3-2017.10.24 0     p      STARTED        700 634.6kb 10.66.8.202 grayloges3
.watcher-history-3-2017.10.24 0     r      UNASSIGNED                             
.monitoring-alerts-6          0     p      STARTED          1  12.4kb 10.66.8.202 grayloges3
.monitoring-alerts-6          0     r      UNASSIGNED                             
.triggered_watches            0     p      STARTED          0  28.3kb 10.66.8.202 grayloges3
.triggered_watches            0     r      UNASSIGNED                             
.monitoring-es-6-2017.10.24   0     p      STARTED      28396  22.5mb 10.66.8.202 grayloges3
.monitoring-es-6-2017.10.24   0     r      UNASSIGNED                             
.watches                      0     p      STARTED          4  55.8kb 10.66.8.202 grayloges3
.watches                      0     r      UNASSIGNED                             

As you can see the replica shards are marked unassigned.

In my case that is not true... Graylog has been writing data to ES for weeks now so it has been successfully creating indexes on all three nodes. I found out about xpack last week, so now I'm attempting to secure all my ES instances. I agree that on single-node instances of ES the replica will always show as unassigned, but I'm assuming that on my three-node cluster the replica indexes should be assigned to another node in the cluster.

There seems to be some confusion here. I'm not sure whether I'm misunderstanding you, or you are misunderstanding the API output, or a bit of both, but we're definitely not on the same page.

I'm going to wind it back to the beginning and see if we can get a shared understanding.

In your original post you showed the cluster health with (trimmed for clarity):

  "status" : "yellow",
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_shards" : 10,
  "unassigned_shards" : 5,

In this case your cluster was yellow because it had unassigned shards.
The most likely explanation for that is that your indices are configured to use replica shards (a good thing), but you have only 1 data node, so those replicas have nowhere to go.
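
A quick way to confirm that is to look at the configured replica count and health per index, with something like:

GET _cat/indices?v&h=index,pri,rep,health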

Looking at your shards (trimmed for brevity)

.monitoring-alerts-6          0 r UNASSIGNED CLUSTER_RECOVERED

That's

  • .monitoring-alerts-6 index,
  • shard 0
  • replica
  • unassigned
  • it has been unassigned since the cluster recovered.

All your unassigned shards appear to be replicas (you didn't paste the details for .watches, but we can guess), which matches the theory above.

I'm confused by what you mean in your second post.

I ran the following query and even though the shards show as unassigned it looks like new ones were created

Looking at your shards output we have

  • .security index, shard 0, primary, started on node "test7643"
  • .monitoring-alerts-6 index, shard 0, primary, started on node "test7643"
  • .monitoring-alerts-6 index, shard 0, replica, unassigned
  • skip a couple of indices with the same pattern of single shard, with primary & replica
  • graylog_0 index, shard 1, primary, started on node "test7643"
  • graylog_0 index, shard 2, primary, started on node "test7643"
  • graylog_0 index, shard 3, primary, started on node "test7643"
  • graylog_0 index, shard 0, primary, started on node "test7643"
  • skip a couple of indices with the pattern of single shard, with primary & replica

The important thing here is:

  • The .security index has no replicas. That's because it is (by default) configured to have as many copies as there are nodes in your cluster, and you have 1 node so it has 1 primary, no replicas (you can check this in the index settings, as shown after this list).
  • The other x-pack related indices have 1 primary, 1 replica. But the replica is unassigned because you only have 1 node.
  • graylog_0 has 4 shards, but no replicas. I'm not familiar with the underlying shard strategy that graylog uses, so I don't know why that is the case. It seems strange, but there may be a good reason.
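
As an aside on the first point above: you can see how any of these indices is configured by looking at its settings, e.g.:

GET .security/_settings

and checking number_of_replicas (and auto_expand_replicas, if it is set) in the response.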

Your 3rd and 4th post say:

I just installed xpack on my three node cluster and I'm seeing the same results

I'm seeing this behavior on my three node cluster as well.

I'll explain why when we get to looking at the output of _cat/nodes.

You also say:

From what I have read today it sounds like it's creating the replica on the same server as the primary shard.

and

I just want to know why the default is to create a replica index on the same node?

I really don't understand why you say this.
Every time you post shard output, it shows that your replicas are UNASSIGNED, but you claim that ES is creating replicas on the same server as the primary.
UNASSIGNED means that the replica doesn't exist on any server. The _cat/shards API is showing the shards for the whole cluster, not just for a single node. And ES is explicitly not assigning those replicas to any node because it won't assign them to the same node as the primary, and it has nowhere else to assign them.
ES is doing the exact opposite of what you say it is doing.

Now, as to why your 3 node cluster has unassigned replicas.
Your _cat/nodes output is (columns trimmed for clarity)

ip              node.role master name
10.66.8.189     mi        *      grayloges1
10.66.8.202     di        -      grayloges3
10.66.8.191     mi        -      grayloges2

This is a very peculiar node setup.
You have 3 nodes, but only 2 of them are master eligible, and only 1 is a data node.

  • The first line, node "grayloges1", has role mi. That means "master eligible" and "ingest". It is the currently elected master node.
  • The second line, node "grayloges3", has role di. That means "data" and "ingest"
  • The third line, node "grayloges2", has role mi. That means "master eligible" and "ingest".

That's a problem for 2 reasons.

(1) Only "grayloges3" is actually storing data.
That is why your replicas are unassigned - from a data point of view this is acting like a 1 node cluster.

(2) You have exactly 2 master-eligible nodes.
That's not helpful. ES uses a quorum model for electing the active master node. Ideally, you should aim to have an odd number of master-eligible nodes so that there is a clear majority: the quorum is floor(master-nodes / 2) + 1, which is 2 for a set of 3.
You should definitely not have 2 master nodes, because that just makes things worse.

  • If you have 1 master-eligible node, then your cluster will work fine whenever that node is available, and be down whenever that node is unavailable.
  • If you have 2 master-eligible nodes, then either:
    • You run the risk of having "split brain" where both master-eligible nodes think they are the master and try to take control of the cluster (which can be prevented by setting discovery.zen.minimum_master_nodes to 2)
    • But if you set min-master to 2, you require both master-eligible nodes to be online in order for your cluster to function, which makes your cluster more fragile than just having 1 master eligible node.

In your case, I think you want:

  • every node to be a data node. In the elasticsearch.yml file, set node.data to true on every server.
  • every node to be a master-eligible node. In the elasticsearch.yml file, set node.master to true on every server.
  • A quorum of 2 (out of 3 nodes). In the elasticsearch.yml file, set discovery.zen.minimum_master_nodes to 2 on every server (see the sketch below).
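
Putting those together, a minimal sketch of the relevant elasticsearch.yml settings for each of your three nodes would look something like this (other settings such as cluster.name and network.host are omitted):

# On every node (grayloges1, grayloges2, grayloges3):
node.master: true
node.data: true
discovery.zen.minimum_master_nodes: 2

These are static settings, so each node needs a restart to pick them up.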

There's more explanation here: Node | Elasticsearch Reference [5.6] | Elastic

Graylog has been writing data to ES for weeks now so it has been successfully creating indexes on all three nodes.

I don't believe that to be the case.
It hasn't sent your cluster to yellow because it's not creating any replicas, but all the evidence suggests that it is only creating indices (technically "shards") on your grayloges3 data node.

Ok, I misunderstood the output. I thought the output was just for the individual node I queried, but the output being for the entire cluster makes more sense to me now.

I didn't perform the initial cluster setup myself, but I should have checked to make sure we set it up correctly before having graylog forward data to it.

I will reconfigure it based on your advice. It sounds like my problems will be resolved once the cluster is configured correctly.
