Elasticsearch 8.9.1 indexing bottleneck on i3.2xlarge and d3.2xlarge nodes in EKS using ECK

A few questions that don't appear to have been covered in the last thread:

  1. How many shards does each of your content/hot indices have?
    • While I've seen the 8.x shard-balancing improvements help in places, there are limits to what they can do if a single index with one shard is consuming the majority of the resources.
    • For reference, when I set up a high-throughput index, I'll give the content/hot phase multiple shards (generally a nice multiple of the number of hot/content nodes). Then, when the index moves to warm, I'll shrink the shard count back down (see the ILM sketch at the end of this reply).
      • In my experience, this helps ensure that the write load is distributed more evenly across the nodes.
  2. Your hot node memory is a bit odd; these nodes have less memory than your other data nodes. I believe (and this is somewhat of an educated assumption) that hot nodes generally benefit from both heap memory and filesystem cache memory.
    • The i3 series has somewhat unusual memory sizes; AWS now offers the i4i instance type, which has much more standard memory sizes.
    • If you can't go with the i4i, and you can run ARM instances, maybe look at the i4g instance type; I've found that Elasticsearch performs perfectly fine on ARM.
  3. I personally think the memory/heap on your master nodes is overkill, but I don't think this should have a negative impact on performance (though I haven't verified that).
  4. How are you routing requests to your cluster?
    • It might be a bit of a controversial take, but at this scale I might consider adding some coordinating-only nodes in front of the data nodes.
  5. In your previous post you mentioned this cluster is managed by ECK. Could you provide the CR that defines this cluster?
    • I've written about some general improvements that can be made when running Elasticsearch via ECK, here, which I found while doing load testing for a relatively medium-sized project (3 nodes, sustaining ~30k events/s, 1 index, 3 primary shards + 1 replica).
  6. I personally avoid touching most of the Elasticsearch cluster settings. The defaults are tuned to work well in most scenarios, and changing them can quickly become a footgun, as they can have some odd side effects.
  7. Do you do any sort of document pre-processing, either via Logstash or via ingest pipelines?
    • In your last post you mention:

    No ingest pipeline within elasticsearch - I have a separate pipeline that predates our use of elasticsearch.

    • Did you ever confirm that your "separate pipeline" isn't the actual bottleneck? (The write thread pool check sketched at the end of this reply is one quick way to tell.)
  8. In the last post I didn't see any mention of what your index mappings actually look like. Are we talking just keyword and numeric fields, or are we also dealing with things like text fields and analyzers?
    • Could you provide a copy of your indices' mappings? (The snippet at the end of this reply shows how to pull them.)
  9. Entering the world of "really new", and probably not the first thing you should be looking at, but have you looked at synthetic _source or TSDS (time series data streams)?
    • Both offer some performance advantages, but they do come with trade-offs/limitations (there's a rough sketch of both at the end of this reply).
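
To illustrate point 1, here's a minimal sketch of the "many primaries while hot, shrink in warm" pattern via ILM. The policy/template names, the 6-shard count (assuming, say, 6 hot/content nodes), and the rollover thresholds are all made-up placeholders; adjust them to your node count and sizing:

```
# Hypothetical ILM policy: keep several primaries while hot, shrink to 1 in warm
PUT _ilm/policy/logs-high-throughput
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_primary_shard_size": "50gb", "max_age": "1d" }
        }
      },
      "warm": {
        "min_age": "0ms",
        "actions": {
          "shrink": { "number_of_shards": 1 },
          "forcemerge": { "max_num_segments": 1 }
        }
      }
    }
  }
}

# Matching (hypothetical) template: primaries set to a multiple of the hot node count
PUT _index_template/logs-high-throughput
{
  "index_patterns": ["logs-high-throughput*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.number_of_shards": 6,
      "index.number_of_replicas": 1,
      "index.lifecycle.name": "logs-high-throughput"
    }
  }
}
```

One nice side effect: shrink requires the target shard count to be a factor of the original, so picking a tidy multiple of your hot node count up front keeps that flexible.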
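
On point 7, one quick (though not conclusive) way to tell whether the bottleneck sits inside Elasticsearch or upstream in your separate pipeline is to watch the write thread pool on the hot nodes while you're indexing; if you never see full queues or rejections there, the limit is more likely upstream:

```
# Per-node write thread pool: active threads, queue depth, and bulk rejections
GET _cat/thread_pool/write?v&h=node_name,active,queue,rejected&s=rejected:desc
```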
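
For point 8, the mappings and per-index settings can be pulled straight from the cluster (the index name below is obviously a placeholder); keyword/numeric-heavy mappings index much more cheaply than analyzed text, which is why I'm asking:

```
# Replace my-hot-index with one of your content/hot indices
GET my-hot-index/_mapping
GET my-hot-index/_settings
```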
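
And for point 9, a rough, hedged sketch of what those two options can look like on 8.9 (the template name, dimension, and metric fields are all invented; both features carry mapping/query limitations worth reading up on before committing):

```
# Hypothetical TSDS template with synthetic _source declared in the mappings
PUT _index_template/metrics-sketch
{
  "index_patterns": ["metrics-sketch*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.mode": "time_series",
      "index.routing_path": ["host.name"]
    },
    "mappings": {
      "_source": { "mode": "synthetic" },
      "properties": {
        "@timestamp": { "type": "date" },
        "host.name": { "type": "keyword", "time_series_dimension": true },
        "cpu.usage": { "type": "double", "time_series_metric": "gauge" }
      }
    }
  }
}
```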

Note: I'm probably missing some other questions, but I wanted to avoid rehashing questions that were already answered in the previous post, as (I'm assuming) those were re-evaluated after the upgrade.