Elasticsearch upgrade from 7 to 8

Hi All,

Our current cluster is running Elasticsearch 7.17.3 and we are planning to upgrade to version 8.15.2 (hopefully an allowed target version).

We want to do a "rolling upgrade", one instance at a time. We have run the Upgrade Assistant and fixed all reported issues. A full snapshot has also been taken to a local NAS.

Based on the "Upgrade Elasticsearch" page of the Elastic Installation and Upgrade Guide [8.15], we presume that the following steps are now needed to do the upgrade:

  1. Disable shard allocation.

  2. Stop non-essential indexing and perform a flush. (Optional)

  3. Temporarily stop the tasks associated with active machine learning jobs and datafeeds. (Optional, and we don't have any)

  4. Shut down a single node.

  5. Upgrade the node you shut down.
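For reference, steps 1 and 2 (and the matching re-enable once the upgraded node is back) map onto a handful of REST calls. The following is only a minimal sketch: `ES_URL` is a placeholder, the helper names are made up for illustration, and since security is enabled on this cluster the real calls would also need credentials. It builds the requests without sending them.

```python
import json
import urllib.request

# Assumption: placeholder URL. A secured cluster also needs credentials.
ES_URL = "http://localhost:9200"


def allocation_body(value):
    """Body for PUT _cluster/settings; value=None resets the setting to its default."""
    return {"persistent": {"cluster.routing.allocation.enable": value}}


def build_request(method, path, body=None):
    """Build (but do not send) a JSON request against the cluster."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        ES_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def before_node_shutdown():
    """Steps 1 and 2: restrict allocation to primaries, then flush."""
    return [
        build_request("PUT", "/_cluster/settings", allocation_body("primaries")),
        build_request("POST", "/_flush"),
    ]


def after_node_rejoins():
    """After the upgraded node has rejoined: reset allocation, then check health."""
    return [
        build_request("PUT", "/_cluster/settings", allocation_body(None)),
        build_request("GET", "/_cat/health?v=true"),
    ]


if __name__ == "__main__":
    # Print the plan only; sending would be urllib.request.urlopen(req).
    for req in before_node_shutdown() + after_node_rejoins():
        print(req.get_method(), req.full_url, req.data)
```

The re-enable call sets the value to `null` rather than `"all"`, so the cluster falls back to its default instead of pinning a persistent setting.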

For step 5, we have the elasticsearch-8.15.2 tar.gz archive, which we have extracted into its own directory, alongside the 7.17.3 one.

My question relates to the next three steps in the guide, which refer to the "config", "data" and "logs" directories. In our case all three point to external directories, and the ones embedded in the install directory are not used.

It seems that all we need to do is point the "current" symlink below at the new install and bring the node up. Our start script launches whichever Elasticsearch version "current" points to, and it also references the external "config" and "data" directories:

[tvportal@ad-ccf-ddfg ]$ cd /opt/tvportal/elasticsearch/
[tvportal@sd-afb7-1f0c elasticsearch]$ ls -trl
total 16
lrwxrwxrwx. 1 tvportal tvportal 58 Oct 31 2022 current -> /opt/tvportal/elasticsearch/elasticsearch-8.15.2
drwxr-xr-x. 9 tvportal tvportal 4096 Sep 19 2024 elasticsearch-8.15.2
drwxr-xr-x. 9 tvportal tvportal 4096 Dec 12 11:21 elasticsearch-7.17.3
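Repointing "current" while the node is stopped can be done as a create-then-rename, so the link never dangles and is easy to flip back. This is only a sketch of that one step: `repoint_symlink` is a hypothetical helper, and the demo runs against a throwaway temp directory rather than /opt/tvportal/elasticsearch.

```python
import os
import tempfile


def repoint_symlink(link_path, new_target):
    """Point link_path at new_target via a temporary link renamed over the old one."""
    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clear a leftover from an earlier attempt
    os.symlink(new_target, tmp_link)
    os.replace(tmp_link, link_path)  # the rename swaps the link in one step on POSIX
    return os.readlink(link_path)


if __name__ == "__main__":
    # Demo in a scratch directory that mimics the layout shown above.
    base = tempfile.mkdtemp()
    old_home = os.path.join(base, "elasticsearch-7.17.3")
    new_home = os.path.join(base, "elasticsearch-8.15.2")
    os.mkdir(old_home)
    os.mkdir(new_home)
    current = os.path.join(base, "current")
    os.symlink(old_home, current)
    print(repoint_symlink(current, new_home))
```

Whether this is sufficient depends on the start script: it only works as intended if that script resolves the binaries through "current" and passes the external config location explicitly (for tar installs that is usually done with the `ES_PATH_CONF` environment variable).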

Please advise whether the understanding above is correct, as we are doing this for the first time.

8.15.2, released Sep 26, 2024, is a curious choice on Feb 3, 2026! Anyways ...

Your start script does what it does, and if you were to share it, we can maybe look at it too.

You don't say anything about your cluster: how many nodes, what sort of nodes, how many indices, how much data, ...? Is there any chance there are indices created by 6.x in the cluster?

You mention that you fixed all the issues the Upgrade Assistant reported. I am curious as to what these issues were.

If it were me, I'd try to set up a small test cluster with 7.17.3 on it, likely on VMs, with as much in common with the prod cluster as I could configure, take a snapshot to create a baseline, and run the same upgrade process there a few times to practice.

Thanks for your response.

We do have a test environment where we will upgrade first before going to Production. There is no version 6 index existing in the cluster. Please see cluster stats below:

TEST

{
  "_nodes" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "cluster_name" : "tvdig-test",
  "cluster_uuid" : "hHRyGybRQoyzcoKbxjez0Q",
  "timestamp" : 1770390516889,
  "status" : "green",
  "indices" : {
    "count" : 989,
    "shards" : {
      "total" : 1978,
      "primaries" : 989,
      "replication" : 1.0,
      "index" : {
        "shards" : {
          "min" : 2,
          "max" : 2,
          "avg" : 2.0
        },
        "primaries" : {
          "min" : 1,
          "max" : 1,
          "avg" : 1.0
        },
        "replication" : {
          "min" : 1.0,
          "max" : 1.0,
          "avg" : 1.0
        }
      }
    },
    "docs" : {
      "count" : 604839168,
      "deleted" : 15194159
    },
    "store" : {
      "size_in_bytes" : 572656597625,
      "total_data_set_size_in_bytes" : 572656597625,
      "reserved_in_bytes" : 0
    },
    "fielddata" : {
      "memory_size_in_bytes" : 153952,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size_in_bytes" : 11734423,
      "total_count" : 24912160,
      "hit_count" : 976393,
      "miss_count" : 23935767,
      "cache_size" : 957,
      "cache_count" : 34613,
      "evictions" : 33656
    },
    "completion" : {
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 11692,
      "memory_in_bytes" : 153182998,
      "terms_memory_in_bytes" : 98888320,
      "stored_fields_memory_in_bytes" : 8146256,
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory_in_bytes" : 8213824,
      "points_memory_in_bytes" : 0,
      "doc_values_memory_in_bytes" : 37934598,
      "index_writer_memory_in_bytes" : 1298793784,
      "version_map_memory_in_bytes" : 20956166,
      "fixed_bit_set_memory_in_bytes" : 21226240,
      "max_unsafe_auto_id_timestamp" : 1770336009237,
      "file_sizes" : { }
    },
    "mappings" : {
      "field_types" : [
        {
          "name" : "boolean",
          "count" : 454,
          "index_count" : 377,
          "script_count" : 0
        },
        {
          "name" : "constant_keyword",
          "count" : 6,
          "index_count" : 2,
          "script_count" : 0
        },
        {
          "name" : "date",
          "count" : 654,
          "index_count" : 316,
          "script_count" : 0
        },
        {
          "name" : "double",
          "count" : 1372,
          "index_count" : 72,
          "script_count" : 0
        },
        {
          "name" : "float",
          "count" : 5038,
          "index_count" : 561,
          "script_count" : 0
        },
        {
          "name" : "half_float",
          "count" : 41,
          "index_count" : 11,
          "script_count" : 0
        },
        {
          "name" : "integer",
          "count" : 662,
          "index_count" : 72,
          "script_count" : 0
        },
        {
          "name" : "ip",
          "count" : 2,
          "index_count" : 2,
          "script_count" : 0
        },
        {
          "name" : "keyword",
          "count" : 37000,
          "index_count" : 974,
          "script_count" : 0
        },
        {
          "name" : "long",
          "count" : 3925,
          "index_count" : 846,
          "script_count" : 0
        },
        {
          "name" : "nested",
          "count" : 84,
          "index_count" : 26,
          "script_count" : 0
        },
        {
          "name" : "object",
          "count" : 1578,
          "index_count" : 383,
          "script_count" : 0
        },
        {
          "name" : "short",
          "count" : 8,
          "index_count" : 8,
          "script_count" : 0
        },
        {
          "name" : "text",
          "count" : 13807,
          "index_count" : 631,
          "script_count" : 0
        },
        {
          "name" : "version",
          "count" : 4,
          "index_count" : 4,
          "script_count" : 0
        }
      ],
      "runtime_field_types" : [ ]
    },
    "analysis" : {
      "char_filter_types" : [ ],
      "tokenizer_types" : [
        {
          "name" : "edge_ngram",
          "count" : 3,
          "index_count" : 3
        }
      ],
      "filter_types" : [
        {
          "name" : "edge_ngram",
          "count" : 2,
          "index_count" : 2
        }
      ],
      "analyzer_types" : [
        {
          "name" : "custom",
          "count" : 12,
          "index_count" : 7
        },
        {
          "name" : "standard",
          "count" : 3,
          "index_count" : 3
        }
      ],
      "built_in_char_filters" : [ ],
      "built_in_tokenizers" : [
        {
          "name" : "keyword",
          "count" : 2,
          "index_count" : 2
        },
        {
          "name" : "standard",
          "count" : 7,
          "index_count" : 5
        }
      ],
      "built_in_filters" : [
        {
          "name" : "lowercase",
          "count" : 6,
          "index_count" : 4
        }
      ],
      "built_in_analyzers" : [ ]
    },
    "versions" : [
      {
        "version" : "7.3.2",
        "index_count" : 53,
        "primary_shard_count" : 53,
        "total_primary_bytes" : 2551944058
      },
      {
        "version" : "7.17.3",
        "index_count" : 936,
        "primary_shard_count" : 936,
        "total_primary_bytes" : 286096342856
      }
    ]
  },
  "nodes" : {
    "count" : {
      "total" : 5,
      "coordinating_only" : 0,
      "data" : 5,
      "data_cold" : 3,
      "data_content" : 3,
      "data_frozen" : 3,
      "data_hot" : 3,
      "data_warm" : 3,
      "ingest" : 5,
      "master" : 3,
      "ml" : 3,
      "remote_cluster_client" : 3,
      "transform" : 5,
      "voting_only" : 0
    },
    "versions" : [
      "7.17.3"
    ],
    "os" : {
      "available_processors" : 40,
      "allocated_processors" : 40,
      "names" : [
        {
          "name" : "Linux",
          "count" : 5
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "Red Hat Enterprise Linux 8.10 (Ootpa)",
          "count" : 5
        }
      ],
      "architectures" : [
        {
          "arch" : "amd64",
          "count" : 5
        }
      ],
      "mem" : {
        "total_in_bytes" : 673186205696,
        "free_in_bytes" : 232055848960,
        "used_in_bytes" : 441130356736,
        "free_percent" : 34,
        "used_percent" : 66
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 12
      },
      "open_file_descriptors" : {
        "min" : 2843,
        "max" : 3170,
        "avg" : 2926
      }
    },
    "jvm" : {
      "max_uptime_in_millis" : 467608306,
      "versions" : [
        {
          "version" : "1.8.0_371",
          "vm_name" : "Java HotSpot(TM) 64-Bit Server VM",
          "vm_version" : "25.371-b11",
          "vm_vendor" : "Oracle Corporation",
          "bundled_jdk" : true,
          "using_bundled_jdk" : false,
          "count" : 3
        },
        {
          "version" : "1.8.0_331",
          "vm_name" : "Java HotSpot(TM) 64-Bit Server VM",
          "vm_version" : "25.331-b09",
          "vm_vendor" : "Oracle Corporation",
          "bundled_jdk" : true,
          "using_bundled_jdk" : false,
          "count" : 2
        }
      ],
      "mem" : {
        "heap_used_in_bytes" : 19836033920,
        "heap_max_in_bytes" : 42601021440
      },
      "threads" : 967
    },
    "fs" : {
      "total_in_bytes" : 2955403788288,
      "free_in_bytes" : 2211405819904,
      "available_in_bytes" : 2071738298368
    },
    "plugins" : [ ],
    "network_types" : {
      "transport_types" : {
        "security4" : 5
      },
      "http_types" : {
        "security4" : 5
      }
    },
    "discovery_types" : {
      "zen" : 5
    },
    "packaging_types" : [
      {
        "flavor" : "default",
        "type" : "tar",
        "count" : 5
      }
    ],
    "ingest" : {
      "number_of_pipelines" : 8,
      "processor_stats" : {
        "conditional" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        },
        "gsub" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        },
        "remove" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        },
        "script" : {
          "count" : 1798,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 202
        },
        "set" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        },
        "set_security_user" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        }
      }
    }
  }
}

PROD

{
  "_nodes" : {
    "total" : 10,
    "successful" : 10,
    "failed" : 0
  },
  "cluster_name" : "tvdig-prod",
  "cluster_uuid" : "FD1TuVbeRPqnXKM3TUgJug",
  "timestamp" : 1770390606455,
  "status" : "green",
  "indices" : {
    "count" : 4300,
    "shards" : {
      "total" : 8600,
      "primaries" : 4300,
      "replication" : 1.0,
      "index" : {
        "shards" : {
          "min" : 2,
          "max" : 2,
          "avg" : 2.0
        },
        "primaries" : {
          "min" : 1,
          "max" : 1,
          "avg" : 1.0
        },
        "replication" : {
          "min" : 1.0,
          "max" : 1.0,
          "avg" : 1.0
        }
      }
    },
    "docs" : {
      "count" : 5057155485,
      "deleted" : 65082833
    },
    "store" : {
      "size_in_bytes" : 3366989494006,
      "total_data_set_size_in_bytes" : 3366989494006,
      "reserved_in_bytes" : 0
    },
    "fielddata" : {
      "memory_size_in_bytes" : 4647176,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size_in_bytes" : 116326220,
      "total_count" : 57389271,
      "hit_count" : 2233704,
      "miss_count" : 55155567,
      "cache_size" : 11393,
      "cache_count" : 59629,
      "evictions" : 48236
    },
    "completion" : {
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 54200,
      "memory_in_bytes" : 418051004,
      "terms_memory_in_bytes" : 252362904,
      "stored_fields_memory_in_bytes" : 49639904,
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory_in_bytes" : 14525120,
      "points_memory_in_bytes" : 0,
      "doc_values_memory_in_bytes" : 101523076,
      "index_writer_memory_in_bytes" : 964266368,
      "version_map_memory_in_bytes" : 81068238,
      "fixed_bit_set_memory_in_bytes" : 666212864,
      "max_unsafe_auto_id_timestamp" : 1770357922434,
      "file_sizes" : { }
    },
    "mappings" : {
      "field_types" : [
        {
          "name" : "alias",
          "count" : 1581,
          "index_count" : 17,
          "script_count" : 0
        },
        {
          "name" : "boolean",
          "count" : 3380,
          "index_count" : 2212,
          "script_count" : 0
        },
        {
          "name" : "constant_keyword",
          "count" : 6,
          "index_count" : 2,
          "script_count" : 0
        },
        {
          "name" : "date",
          "count" : 2963,
          "index_count" : 1880,
          "script_count" : 0
        },
        {
          "name" : "double",
          "count" : 2176,
          "index_count" : 127,
          "script_count" : 0
        },
        {
          "name" : "flattened",
          "count" : 136,
          "index_count" : 17,
          "script_count" : 0
        },
        {
          "name" : "float",
          "count" : 10744,
          "index_count" : 1300,
          "script_count" : 0
        },
        {
          "name" : "geo_point",
          "count" : 136,
          "index_count" : 17,
          "script_count" : 0
        },
        {
          "name" : "half_float",
          "count" : 40,
          "index_count" : 10,
          "script_count" : 0
        },
        {
          "name" : "integer",
          "count" : 796,
          "index_count" : 98,
          "script_count" : 0
        },
        {
          "name" : "ip",
          "count" : 240,
          "index_count" : 19,
          "script_count" : 0
        },
        {
          "name" : "keyword",
          "count" : 130450,
          "index_count" : 4281,
          "script_count" : 0
        },
        {
          "name" : "long",
          "count" : 18613,
          "index_count" : 2853,
          "script_count" : 0
        },
        {
          "name" : "nested",
          "count" : 272,
          "index_count" : 55,
          "script_count" : 0
        },
        {
          "name" : "object",
          "count" : 15359,
          "index_count" : 2104,
          "script_count" : 0
        },
        {
          "name" : "scaled_float",
          "count" : 17,
          "index_count" : 17,
          "script_count" : 0
        },
        {
          "name" : "short",
          "count" : 6,
          "index_count" : 6,
          "script_count" : 0
        },
        {
          "name" : "text",
          "count" : 30827,
          "index_count" : 2391,
          "script_count" : 0
        },
        {
          "name" : "version",
          "count" : 4,
          "index_count" : 4,
          "script_count" : 0
        }
      ],
      "runtime_field_types" : [ ]
    },
    "analysis" : {
      "char_filter_types" : [ ],
      "tokenizer_types" : [ ],
      "filter_types" : [
        {
          "name" : "edge_ngram",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "analyzer_types" : [
        {
          "name" : "custom",
          "count" : 2,
          "index_count" : 1
        },
        {
          "name" : "standard",
          "count" : 274,
          "index_count" : 274
        }
      ],
      "built_in_char_filters" : [ ],
      "built_in_tokenizers" : [
        {
          "name" : "standard",
          "count" : 2,
          "index_count" : 1
        }
      ],
      "built_in_filters" : [
        {
          "name" : "lowercase",
          "count" : 2,
          "index_count" : 1
        }
      ],
      "built_in_analyzers" : [ ]
    },
    "versions" : [
      {
        "version" : "7.3.2",
        "index_count" : 736,
        "primary_shard_count" : 736,
        "total_primary_bytes" : 17487737592
      },
      {
        "version" : "7.17.3",
        "index_count" : 3564,
        "primary_shard_count" : 3564,
        "total_primary_bytes" : 1665595063854
      }
    ]
  },
  "nodes" : {
    "count" : {
      "total" : 10,
      "coordinating_only" : 0,
      "data" : 10,
      "data_cold" : 5,
      "data_content" : 5,
      "data_frozen" : 5,
      "data_hot" : 5,
      "data_warm" : 5,
      "ingest" : 10,
      "master" : 5,
      "ml" : 5,
      "remote_cluster_client" : 5,
      "transform" : 10,
      "voting_only" : 0
    },
    "versions" : [
      "7.17.3"
    ],
    "os" : {
      "available_processors" : 80,
      "allocated_processors" : 80,
      "names" : [
        {
          "name" : "Linux",
          "count" : 10
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "Red Hat Enterprise Linux 8.10 (Ootpa)",
          "count" : 10
        }
      ],
      "architectures" : [
        {
          "arch" : "amd64",
          "count" : 10
        }
      ],
      "mem" : {
        "total_in_bytes" : 1346308325376,
        "free_in_bytes" : 34279321600,
        "used_in_bytes" : 1312029003776,
        "free_percent" : 3,
        "used_percent" : 97
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 29
      },
      "open_file_descriptors" : {
        "min" : 4978,
        "max" : 6619,
        "avg" : 5918
      }
    },
    "jvm" : {
      "max_uptime_in_millis" : 485686534,
      "versions" : [
        {
          "version" : "1.8.0_331",
          "vm_name" : "Java HotSpot(TM) 64-Bit Server VM",
          "vm_version" : "25.331-b09",
          "vm_vendor" : "Oracle Corporation",
          "bundled_jdk" : true,
          "using_bundled_jdk" : false,
          "count" : 4
        },
        {
          "version" : "1.8.0_371",
          "vm_name" : "Java HotSpot(TM) 64-Bit Server VM",
          "vm_version" : "25.371-b11",
          "vm_vendor" : "Oracle Corporation",
          "bundled_jdk" : true,
          "using_bundled_jdk" : false,
          "count" : 6
        }
      ],
      "mem" : {
        "heap_used_in_bytes" : 48444530480,
        "heap_max_in_bytes" : 128151715840
      },
      "threads" : 1848
    },
    "fs" : {
      "total_in_bytes" : 5608349040640,
      "free_in_bytes" : 2033243021312,
      "available_in_bytes" : 1812995170304
    },
    "plugins" : [ ],
    "network_types" : {
      "transport_types" : {
        "security4" : 10
      },
      "http_types" : {
        "security4" : 10
      }
    },
    "discovery_types" : {
      "zen" : 10
    },
    "packaging_types" : [
      {
        "flavor" : "default",
        "type" : "tar",
        "count" : 10
      }
    ],
    "ingest" : {
      "number_of_pipelines" : 4,
      "processor_stats" : {
        "gsub" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        },
        "remove" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        },
        "script" : {
          "count" : 2752,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 317
        },
        "set" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        },
        "set_security_user" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        }
      }
    }
  }
}

Considering the preparation mentioned in my post above, the question is: while upgrading one node at a time, what should we expect when the new (version 8) node comes up, especially in terms of security (certificates etc.)?

For some reason I could find hardly any posts or videos about this process on the internet :slight_smile:

Thanks again

Thanks for answers.

Yes, and that helps. But what I meant was more of a throwaway environment you can use, to iterate a few times on just the 7.x to 8.x upgrade process.

You are asking what to expect when the version 8 node comes up.

IMO the easiest way to know what to expect is to simply do a 7.x to 8.x upgrade on an as-similar-as-possible setup. E.g., on the security/certs/SSL side, what you will see will depend a lot on what you have in place right now.

It might be helpful to share the elasticsearch.yml file you are using.

Also, I'd be curious as to output of a GET on

_cat/nodes?v&h=name,role,version,master,u,disk.used,disk.avail,disk.total

Other than that, after a quick look through I didn't see anything significantly concerning in the stats output. All indices have 2 shards: 1 primary and 1 replica. PROD averages just over a million docs per index. Average shard size seems small, and you have a large shard count per node. A large number of small indices and small shards per node is not usually the optimal pattern from a performance perspective, but since this wasn't part of your question, I guess you are OK with it. The mix of index versions looks fine.
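The averages behind those remarks can be reproduced from the cluster stats posted above. A small sketch, with the input numbers copied from the `indices` sections and the node counts of the two outputs (the dictionary layout and function name are just for illustration; `store_bytes` is `store.size_in_bytes`, which counts primaries and replicas):

```python
# Figures copied from the TEST and PROD cluster stats in this thread.
TEST = {"indices": 989, "shards": 1978, "docs": 604839168,
        "store_bytes": 572656597625, "nodes": 5}
PROD = {"indices": 4300, "shards": 8600, "docs": 5057155485,
        "store_bytes": 3366989494006, "nodes": 10}


def averages(stats):
    """Per-index, per-node and per-shard averages derived from cluster stats."""
    return {
        "docs_per_index": stats["docs"] / stats["indices"],
        "shards_per_node": stats["shards"] / stats["nodes"],
        "mb_per_shard": stats["store_bytes"] / stats["shards"] / 1e6,
    }


if __name__ == "__main__":
    for name, stats in (("TEST", TEST), ("PROD", PROD)):
        summary = averages(stats)
        print(name, {key: round(value, 1) for key, value in summary.items()})
```

For PROD this works out to roughly 1.18 million docs per index, 860 shards per node, and a little under 400 MB per shard.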

Thanks. It is hard to get another environment for trial/testing purposes. We may be willing to take a risk in the test environment, especially as we are doing one node at a time with the option of rolling back through a snapshot.

Please see elasticsearch.yml below. We have parametrized it. Please note hostnames have been masked and the files are identical in both environments (except for hostnames):


# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
cluster.name: ${CLUSTER_NAME}
# disable geoip
ingest.geoip.downloader.enabled: false
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
node.name: ${HOSTNAME}
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
path.data: ${ARCHIVES_DIR}/elasticsearch/data
path.logs: ${LOG_DIR}
path.repo: /nas/data/tvmrepo/elasticsearch/backups/${CLUSTER_NAME}
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# By default Elasticsearch is only accessible on localhost. Set a different
# address here to expose this node on the network:
#
network.host: 0.0.0.0
#
# By default Elasticsearch listens for HTTP traffic on the first free port it
# finds starting at 9200. Set a specific HTTP port here:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
http.cors.enabled: true
http.cors.allow-origin: /.*/
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.seed_hosts: ["host1", "host2"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
#cluster.initial_master_nodes: ["node-1", "node-2"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
discovery.seed_hosts: ["td-ssp-xxxx","td-sf7-xxxx","td-4xf-xxxx","td-cccb-xxxx","td-ssd-xxxx"]
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
#
# ---------------------------------- Security ----------------------------------
#
#                                 *** WARNING ***
#
# Elasticsearch security features are not enabled by default.
# These features are free, but require configuration changes to enable them.
# This means that users don’t have to provide credentials and can get full access
# to the cluster. Network connections are also not encrypted.
#
# To protect your data, we strongly encourage you to enable the Elasticsearch security features. 
# Refer to the following documentation for instructions.
#
# https://www.elastic.co/guide/en/elasticsearch/reference/7.16/configuring-stack-security.html
#Adding xpack for authentication
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: elastic-certificates.p12

Output of ``_cat/nodes?v&h=name,role,version,master,u,disk.used,disk.avail,disk.total`` is as follows:

TEST

name         role        version master    u disk.used disk.avail disk.total
td-xxxx-xxxx cdfhilmrstw 7.17.3  -      5.5d   150.5gb      489gb    639.5gb
td-xxxx-xxxx cdfhilmrstw 7.17.3  *      5.5d   190.4gb    449.1gb    639.5gb
td-xxxx-xxxx dit         7.17.3  -      5.5d   228.3gb    262.7gb      491gb
td-xxxx-xxxx dit         7.17.3  -      5.5d   110.3gb    380.7gb      491gb
td-xxxx-xxxx cdfhilmrstw 7.17.3  -      5.5d   140.4gb    350.6gb      491gb

PROD

name         role        version master    u disk.used disk.avail disk.total
td-xxxx-xxxx dit         7.17.3  -      5.5d   280.5gb    210.5gb      491gb
td-xxxx-xxxx cdfhilmrstw 7.17.3  -      5.5d   228.7gb    410.8gb    639.5gb
td-xxxx-xxxx cdfhilmrstw 7.17.3  -        5d   571.7gb     67.8gb    639.5gb
td-xxxx-xxxx dit         7.17.3  -      5.7d   427.6gb     63.4gb      491gb
td-xxxx-xxxx dit         7.17.3  -      5.5d   250.5gb    240.5gb      491gb
td-xxxx-xxxx cdfhilmrstw 7.17.3  *      5.5d   286.8gb    211.9gb    498.7gb
td-xxxx-xxxx dit         7.17.3  -      5.7d     437gb       54gb      491gb
td-xxxx-xxxx cdfhilmrstw 7.17.3  -      5.5d   212.4gb    278.6gb      491gb
td-xxxx-xxxx cdfhilmrstw 7.17.3  -        5d   431.2gb     67.4gb    498.7gb
td-xxxx-xxxx dit         7.17.3  -      5.7d   420.1gb     70.9gb      491gb

Nobody in your team has access to a computer? :slight_smile:

[ I set up a test/throwaway environment to test something at least once a week, either on the Mac I'm writing this on, or the Raspberry Pi underneath it, or on rare occasions on a $2.99 VPS, or an even cheaper AWS EC2 instance. ]

My suggestion, and I'm not at all saying it's the only way, just one suggestion, is to download Elasticsearch 7.17.3, install it on your laptop/whatever, or use a cloud service, set it up as similarly as you can to the test/prod clusters, upgrade that to 8.x, and see what happens. Even my 6-year-old Mac Mini can run a bunch of VMs or Docker containers if I wanted a small cluster!

Again, noting you didn't ask, but your node roles are a bit weird. In PROD you have 10 nodes: 5 with roles "dit" (data + ingest + transform) and 5 with "cdfhilmrstw", aka "do everything". Data is not particularly well distributed, from 212gb on one node to 572gb on another. The config doesn't really make sense to me, but it's not "broken" either. Also note that on other threads here there have been a few comments/observations that versions in the 8.x series try harder, or at least differently, to rebalance data "evenly" across nodes, though that's a simplification. That might cost some time rebalancing under 8.x, though the data volume is not huge at all.
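To make the role strings and the disk spread easier to read, here is a small sketch that expands the `_cat/nodes` role letters and computes the used-disk fraction for a row. The letter mapping follows the cat nodes API documentation; the function names are hypothetical, and the parser assumes values in `gb` as in the output above.

```python
# Role letters as listed for the node.role column of the cat nodes API.
ROLE_LETTERS = {
    "c": "data_cold", "d": "data", "f": "data_frozen", "h": "data_hot",
    "i": "ingest", "l": "ml", "m": "master", "r": "remote_cluster_client",
    "s": "data_content", "t": "transform", "v": "voting_only", "w": "data_warm",
}


def decode_roles(abbrev):
    """Expand a role string such as 'dit' into a list of role names."""
    if abbrev == "-":
        return ["coordinating_only"]
    return [ROLE_LETTERS[letter] for letter in abbrev]


def _gb(value):
    """Parse a size such as '571.7gb'; other units are not handled in this sketch."""
    if not value.endswith("gb"):
        raise ValueError("expected a value in gb, got " + value)
    return float(value[:-2])


def used_fraction(disk_used, disk_total):
    """Fraction of the disk in use for one _cat/nodes row."""
    return _gb(disk_used) / _gb(disk_total)


if __name__ == "__main__":
    print(decode_roles("dit"))
    print(decode_roles("cdfhilmrstw"))
    # Fullest and emptiest PROD rows from the output above.
    print(round(used_fraction("571.7gb", "639.5gb"), 3))
    print(round(used_fraction("212.4gb", "491gb"), 3))
```

The per-node fractions are worth comparing against the cluster's disk watermark settings before taking nodes out one at a time, since shards from a stopped node have to fit somewhere if allocation is re-enabled.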

Your elasticsearch.yml file looks close. I think you also want HTTP TLS (client <-> Elasticsearch) settings; you only have transport (Elasticsearch <-> Elasticsearch) TLS settings. That also leads to the "what's talking to the cluster, and how" questions to consider, e.g. is everything connected to ES compatible with 8.x, or does it also need changing? Last, I'm no expert, but the CORS settings look a bit open to me.
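Those two observations can be turned into a quick check over the posted file. This is a rough sketch, not a real YAML parser: it only handles flat `key: value` lines like the ones in the elasticsearch.yml above, and `parse_flat_settings` and `review` are made-up names. It flags things to look at and does not claim that either setting is mandatory for the upgrade.

```python
SAMPLE = """\
# excerpt of the settings posted in this thread
http.cors.enabled: true
http.cors.allow-origin: /.*/
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
"""


def parse_flat_settings(text):
    """Collect 'key: value' lines, skipping blanks and comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition(": ")
        if separator:
            settings[key] = value.strip()
    return settings


def review(settings):
    """Return human-readable notes on HTTP TLS and CORS openness."""
    notes = []
    if settings.get("xpack.security.http.ssl.enabled") != "true":
        notes.append("HTTP TLS (xpack.security.http.ssl.enabled) is not enabled")
    wide_open = settings.get("http.cors.allow-origin") in ("*", "/.*/")
    if settings.get("http.cors.enabled") == "true" and wide_open:
        notes.append("CORS is enabled and allows any origin")
    return notes


if __name__ == "__main__":
    for note in review(parse_flat_settings(SAMPLE)):
        print("-", note)
```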