I have issues with my cluster and I want to rebuild it. Need advice!

Hello everyone,

I'm currently using a cluster to keep metadata about files that are stored in various locations.

There are 3 nodes, and Kibana is installed on a separate machine. Someone else set this up before me, and I can't see any node settings in the elasticsearch.yml files, so it looks like they are running on default settings.

Yesterday I found out that 2 shards (actually 6: 2 primaries and 4 replicas) are unassigned (reason: CLUSTER_RECOVERED). The error message says the data might be stale or corrupt, so I tried to re-allocate them with allocate_stale_primary. No luck. Then I tried to restart the cluster, which caused even more issues: if I start the 3rd node, Elasticsearch stops responding, and Kibana reports a 3000 timeout after that.
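For reference, the reroute call I used looked roughly like this (the index and node names are placeholders here, since I can't share the real ones):

POST /_cluster/reroute
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "my-index",
        "shard": 0,
        "node": "node-1",
        "accept_data_loss": true
      }
    }
  ]
}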

Because of all that, the index status is red. About 70% of the data is still there; the rest is stuck in the unassigned shards. I keep getting shard allocation exceptions and transport exceptions in the Elasticsearch terminal output.

I can't take a snapshot of the index because of these unassigned shards, so whatever I do now is risky and I could lose that data too. I can get the data back from the source, but that would take a few days.

I want to redo the entire cluster architecture and assign these 4 machines specific roles. What would you advise me to do now?

Many thanks!

Hi Andrew,
Are all three nodes in the cluster? I'm guessing that only two are present, based on what you said: "If I start the 3rd node then ES doesn't respond anymore." Please post your elasticsearch.yml files from all three nodes, along with your Elasticsearch log files.
Something doesn't seem to be configured correctly.
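To check which nodes have actually joined the cluster and what the overall cluster state is, you can run for example:

GET _cat/nodes?v
GET _cat/health?v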

Unfortunately the data is protected and I cannot share it, but the elasticsearch.yml files only have the cluster name set, plus minimum master nodes set to 2. No node roles are set, and the .yml files are identical on all nodes. It's almost the default config.
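So each node's elasticsearch.yml is essentially just something like this (cluster name changed here):

cluster.name: my-cluster
discovery.zen.minimum_master_nodes: 2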

Anyway, I've been wondering whether it is possible to have two clusters and sync the data between them (I can get more machines for the second one). I would have one for production and one for development. I want to use the development one for learning as well, but in time I'd also use it as a fallback in case the production one fails.

Is it possible to sync the data between two clusters, and for that to be one-directional? That is, automatic from production to development only, with development to production done manually.

Hi,

You cannot sync automatically from prod -> dev while syncing manually from dev -> prod.

Building a new cluster and syncing the data over is not going to work while your current cluster is RED; you first need to figure out why the cluster is RED and fix that issue.
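A good starting point is the allocation explain API, which tells you why a particular shard is unassigned. Something like this (the index name is just a placeholder; if you send an empty body it explains the first unassigned shard it finds):

GET _cluster/allocation/explain
{
  "index": "my-index",
  "shard": 0,
  "primary": true
}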

I would start by making the cluster config more explicit, i.e. set the cluster name and IP configuration so you are in control of your setup. My elasticsearch.yml looks like this:

cluster.name: clog
node.name: tb-clog-esd1.tb.iss.local
path.data: /opt/elasticdb/data
path.logs: /var/log/elasticsearch
bootstrap.memory_lock: true
#
network.host: ['10.80.3.11', '127.0.0.1']
#
discovery.zen.master_election.ignore_non_master_pings: true
discovery.zen.ping.unicast.hosts: ["10.80.3.10","10.80.3.11","10.80.3.12","10.80.3.13","10.80.3.14","10.80.3.15","10.80.3.16","10.80.3.17","10.80.3.18","10.80.3.19","10.80.3.20","10.80.3.21","10.80.3.22","10.80.3.23","10.80.3.24","10.80.3.25","10.80.3.26","10.80.3.27","10.80.3.28","10.80.3.29","10.80.3.30","10.80.3.4","10.80.3.5","10.80.3.6","10.80.3.7","10.80.3.8","10.80.3.9"]
discovery.zen.minimum_master_nodes: 2
gateway.recover_after_time: 10m
gateway.recover_after_nodes: 12
gateway.expected_data_nodes: 20
node.master: false
node.data: true
node.ingest: true
node.ml: true
#
xpack.http.ssl.verification_mode: certificate
xpack.watcher.index.rest.direct_access: 'true'
xpack.monitoring.enabled: 'true'
xpack.monitoring.exporters:
  clog:
    type: http
    host: ["http://10.80.3.80:9200"]
    auth.username: "elastic"
    auth.password: "xxxxx"
xpack.monitoring.collection.indices: '*'
xpack.monitoring.collection.interval: 30s
# Reporting settings
xpack.notification.email.account:
  standard_account:
    profile: standard
    email_defaults:
      from: email@example.com
    smtp:
      auth: false
      starttls.enable: false
      host: smtp.host
      port: 25
# Transport encryption
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.key: /etc/elasticsearch/certs/tb-clog-esd1.key
xpack.security.transport.ssl.certificate: /etc/elasticsearch/certs/tb-clog-esd1.crt
xpack.security.transport.ssl.certificate_authorities: [ "/etc/elasticsearch/certs/ca.crt" ]
#
# Http client encryption
xpack.security.http.ssl.enabled: false
xpack.security.http.ssl.key:  /etc/elasticsearch/certs/tb-clog-esd1.key
xpack.security.http.ssl.certificate: /etc/elasticsearch/certs/tb-clog-esd1.crt
xpack.security.http.ssl.certificate_authorities: [ "/etc/elasticsearch/certs/ca.crt" ]
#
# Security settings
xpack.security.enabled: true
xpack:
  security:
    authc:
      realms:
        native1:
          type: native
          order: 0
        ldap1:
          type: ldap
          order: 1
          url: "ldap://ldapc.host:489"
          user_search:
            base_dn: "ou=people,dc=boss,dc=host"
            attribute: uid
          group_search:
            base_dn: "ou=groups,dc=boss,dc=host"
          files:
            role_mapping: "/etc/elasticsearch/role_mapping.yml"
          unmapped_groups_as_roles: false
#
http.cors.enabled: true
http.cors.allow-origin: "/.*/"
transport.tcp.compress: true
node.attr.box_type: ssd

That said, without knowing the actual problem and without any log snippets, this is just guessing.

Good luck
Paul.

We really need the log files to help you. Can you sanitize the logs if they contain sensitive information?

I also agree with @pjanzen: be explicit with your configs and don't run on assumed defaults.
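As a minimal sketch (the node name and IPs here are just examples, adjust them to your own network), each of your three nodes could start with something like:

cluster.name: my-cluster
node.name: node-1
network.host: 192.168.1.11
discovery.zen.ping.unicast.hosts: ["192.168.1.11", "192.168.1.12", "192.168.1.13"]
discovery.zen.minimum_master_nodes: 2
node.master: true
node.data: true
node.ingest: true

That way you always know which nodes are supposed to be in the cluster and what role each one has.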

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.