I have issues with my cluster and I want to rebuild it. Need advice!

Hello everyone,

I'm currently using a cluster to keep metadata about files that are stored in various locations.

There are 3 nodes, and Kibana is installed on a separate machine. Someone else set this up before me, and I can't see any node settings in the elasticsearch.yml files, so it looks like they are running on default settings.

Yesterday I found out that 2 shards (actually 6: 2 primaries and 4 replicas) are unassigned (reason: CLUSTER_RECOVERED). The error message says the data might be stale or corrupt, so I tried to re-allocate them with allocate_stale_primary. No luck. Then I tried to restart the cluster, which caused even more issues: if I start the 3rd node, Elasticsearch stops responding, and Kibana reports a 3000 timeout after that.
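For reference, the reroute call I used looked roughly like this (the index and node names are placeholders here, since I can't share the real ones):

POST /_cluster/reroute
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "my-index",
        "shard": 0,
        "node": "node-1",
        "accept_data_loss": true
      }
    }
  ]
}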

Because of all that, the index status is red. About 70% of the data is still there; the rest is stuck in the unassigned shards. I keep getting shard allocation exceptions and transport exceptions in the Elasticsearch terminal output.

I can't take a snapshot of the index because of these unassigned shards, so whatever I do now is risky and I could lose that data too. I can get the data back from the source, but that would take a few days.

I want to redo the entire cluster architecture and assign these 4 machines specific roles. What would you advise me to do now?

Many thanks!

Hi Andrew,
Are all three nodes in the cluster? I'm guessing that only two are present, based on what you said: "If I start the 3rd node then ES doesn't respond anymore." Please post your elasticsearch.yml files from all three nodes, along with your Elasticsearch log files.
Something doesn't seem to be configured correctly.
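To check which nodes have actually joined the cluster and what the overall cluster state is, you can run for example:

GET _cat/nodes?v
GET _cat/health?v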

Unfortunately the data is protected and I cannot share it, but the elasticsearch.yml files only have the cluster name set, plus minimum master nodes set to 2. No node roles are set, and the .yml files are identical on all nodes. It's almost the default config.
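So each node's elasticsearch.yml is essentially just something like this (cluster name changed here):

cluster.name: my-cluster
discovery.zen.minimum_master_nodes: 2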

Anyway, I've been wondering whether it is possible to have two clusters and sync the data between them (I can get more machines for the second one). I would have one for production and one for development. I want to use the development one for learning as well, but in time I'd also use it as a fallback in case the production one fails.

Is it possible to sync the data between two clusters, and for that to be one-directional? That is, automatic from production to development only, with development to production done manually.

Hi,

You cannot sync automatically from prod -> dev while syncing manually from dev -> prod.

Building a new cluster and syncing the data over is not going to work while your current cluster is RED; you first need to figure out why the cluster is RED and fix that issue.
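A good starting point is the allocation explain API, which tells you why a particular shard is unassigned. Something like this (the index name is just a placeholder; if you send an empty body it explains the first unassigned shard it finds):

GET _cluster/allocation/explain
{
  "index": "my-index",
  "shard": 0,
  "primary": true
}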

I would start by making the cluster config more explicit, i.e. set the cluster name and IP configuration so you are in control of your setup. My elasticsearch.yml looks like this:

cluster.name: clog
node.name: tb-clog-esd1.tb.iss.local
path.data: /opt/elasticdb/data
path.logs: /var/log/elasticsearch
bootstrap.memory_lock: true
#
network.host: ['10.80.3.11', '127.0.0.1']
#
discovery.zen.master_election.ignore_non_master_pings: true
discovery.zen.ping.unicast.hosts: ["10.80.3.10","10.80.3.11","10.80.3.12","10.80.3.13","10.80.3.14","10.80.3.15","10.80.3.16","10.80.3.17","10.80.3.18","10.80.3.19","10.80.3.20","10.80.3.21","10.80.3.22","10.80.3.23","10.80.3.24","10.80.3.25","10.80.3.26","10.80.3.27","10.80.3.28","10.80.3.29","10.80.3.30","10.80.3.4","10.80.3.5","10.80.3.6","10.80.3.7","10.80.3.8","10.80.3.9"]
discovery.zen.minimum_master_nodes: 2
gateway.recover_after_time: 10m
gateway.recover_after_nodes: 12
gateway.expected_data_nodes: 20
node.master: false
node.data: true
node.ingest: true
node.ml: true
#
xpack.http.ssl.verification_mode: certificate
xpack.watcher.index.rest.direct_access: 'true'
xpack.monitoring.enabled: 'true'
xpack.monitoring.exporters:
  clog:
    type: http
    host: ["http://10.80.3.80:9200"]
    auth.username: "elastic"
    auth.password: "xxxxx"
xpack.monitoring.collection.indices: '*'
xpack.monitoring.collection.interval: 30s
# Reporting settings
xpack.notification.email.account:
  standard_account:
    profile: standard
    email_defaults:
      from: email@example.com
    smtp:
      auth: false
      starttls.enable: false
      host: smtp.host
      port: 25
# Transport encryption
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.key: /etc/elasticsearch/certs/tb-clog-esd1.key
xpack.security.transport.ssl.certificate: /etc/elasticsearch/certs/tb-clog-esd1.crt
xpack.security.transport.ssl.certificate_authorities: [ "/etc/elasticsearch/certs/ca.crt" ]
#
# Http client encryption
xpack.security.http.ssl.enabled: false
xpack.security.http.ssl.key:  /etc/elasticsearch/certs/tb-clog-esd1.key
xpack.security.http.ssl.certificate: /etc/elasticsearch/certs/tb-clog-esd1.crt
xpack.security.http.ssl.certificate_authorities: [ "/etc/elasticsearch/certs/ca.crt" ]
#
# Security settings
xpack.security.enabled: true
xpack:
  security:
    authc:
      realms:
        native1:
          type: native
          order: 0
        ldap1:
          type: ldap
          order: 1
          url: "ldap://ldapc.host:489"
          user_search:
            base_dn: "ou=people,dc=boss,dc=host"
            attribute: uid
          group_search:
            base_dn: "ou=groups,dc=boss,dc=host"
          files:
            role_mapping: "/etc/elasticsearch/role_mapping.yml"
          unmapped_groups_as_roles: false
#
http.cors.enabled: true
http.cors.allow-origin: "/.*/"
transport.tcp.compress: true
node.attr.box_type: ssd

That said, without knowing the actual problem and without any log snippets, this is just guessing.

Good luck
Paul.

We really need the log files to help you. Can you sanitize the logs if they contain sensitive information?

I also agree with @pjanzen: be explicit with your configs and don't run on assumed defaults.
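As a minimal sketch (the node name and IPs here are just examples, adjust them to your own network), each of your three nodes could start with something like:

cluster.name: my-cluster
node.name: node-1
network.host: 192.168.1.11
discovery.zen.ping.unicast.hosts: ["192.168.1.11", "192.168.1.12", "192.168.1.13"]
discovery.zen.minimum_master_nodes: 2
node.master: true
node.data: true
node.ingest: true

That way you always know which nodes are supposed to be in the cluster and what role each one has.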

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.