A few questions that don't appear to have been covered in the last thread:
- How many shards does each of the content/hot indices have?
- While I've seen the 8.x balancing improvements help in places, there are limits to what they can do if a single index with a single shard is consuming the majority of resources.
- For reference, when I set up a high-throughput index, I'll generally give it a number of primary shards in the content/hot phase that is some nice multiple of the number of hot/content nodes. Then, when the index moves to warm, I'll shrink the shard count to a more meaningful size.
- In my experience, this generally helps ensure that write load is distributed more evenly across nodes.
- Your hot node memory is a bit weird: these nodes have less memory than your other data nodes. I believe (and this is somewhat of an educated assumption) that hot nodes generally benefit from both heap memory and file system cache memory.
- I know the i3 series has these odd memory sizes; AWS now offers the i4i instance type, which has much more standard memory sizes.
- If you can't go with the i4i, maybe look at the i4g instance type. If you can support ARM instances, I've found that Elasticsearch performs perfectly fine on ARM.
- I personally think the memory/heap on your master nodes is overkill, but I don't think this should have a negative impact on performance (though I can't say that for certain).
- How are you routing requests to your cluster?
- This might be a bit of a controversial take, but at this scale I'd consider adding some coordinating-only nodes in front of all these nodes.
- In your previous post you mentioned this cluster is managed by ECK. Could you provide the CR that defines this cluster?
- I've written about some general improvements that can be made when running Elasticsearch via ECK, here. I found them while load testing a medium-sized project (3 nodes, sustaining ~30k e/s, 1 index, 3 shards + 1 (3) replicas).
- I personally avoid touching most of the Elasticsearch cluster settings. I think the defaults are tuned to work in the majority of scenarios, and changing them can quickly become a footgun because they can have some weird side effects.
- Do you do any sort of document pre-processing either via Logstash or Ingest pipelines?
- In your last post you mentioned: "No ingest pipeline within elasticsearch - I have a separate pipeline that predates our use of elasticsearch."
- Did you ever confirm that your "separate pipeline" was not causing these limitations?
- In the last post, I didn't really see any mention of what your index mappings look like. Are we talking just `keywords` and `numbers`, or are we also dealing with things like `text` and `analyzers`?
- Could you provide a copy of your indices' mappings?
- Entering the world of "really new", and probably not the first thing you should be looking at, but have you looked at synthetic `_source` or TSDS (time series data streams)?
- Both offer some performance advantages, but they do come with trade-offs/limitations.
Note: I'm probably missing some other questions, but wanted to somewhat avoid rehashing questions that were answered in the previous post, as (I'm assuming) these were already re-evaluated after the upgrade.
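
P.S. To make the shard suggestion above a bit more concrete, here is a rough Python sketch of how I'd work out the numbers and what the matching ILM policy body could look like. Everything specific in it is an assumption for illustration only: the node count (6), the multiplier, the `50gb` rollover threshold, the `2d` warm age, and the helper names are all made up and not taken from your cluster. The one hard rule it encodes is that a shrink target has to be a factor of the source shard count.

```python
import json


def hot_shard_count(hot_nodes: int, multiplier: int = 1) -> int:
    """Primary shards for the hot phase: a whole multiple of the hot node count."""
    if hot_nodes < 1 or multiplier < 1:
        raise ValueError("hot_nodes and multiplier must be >= 1")
    return hot_nodes * multiplier


def warm_shard_count(hot_shards: int, desired: int) -> int:
    """Largest factor of hot_shards that does not exceed desired.

    The shrink action only accepts a target that evenly divides the
    source index's primary shard count, so we walk down until one fits.
    """
    if hot_shards < 1 or desired < 1:
        raise ValueError("hot_shards and desired must be >= 1")
    for candidate in range(min(desired, hot_shards), 0, -1):
        if hot_shards % candidate == 0:
            return candidate
    return 1  # unreachable in practice, since 1 divides everything


def build_ilm_policy(warm_shards: int, warm_after: str = "2d") -> dict:
    """Request body for an ILM policy: roll over in hot, shrink in warm.

    The thresholds here are placeholders, not recommendations.
    """
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {"max_primary_shard_size": "50gb"}
                    }
                },
                "warm": {
                    "min_age": warm_after,
                    "actions": {
                        "shrink": {"number_of_shards": warm_shards}
                    },
                },
            }
        }
    }


def build_template_settings(hot_shards: int, policy_name: str) -> dict:
    """Index settings for the template backing the hot index."""
    return {
        "index.number_of_shards": hot_shards,
        "index.lifecycle.name": policy_name,
    }


if __name__ == "__main__":
    # Hypothetical cluster: 6 hot nodes, 2 primaries per node.
    hot = hot_shard_count(hot_nodes=6, multiplier=2)
    warm = warm_shard_count(hot, desired=5)
    print(json.dumps(build_template_settings(hot, "my-hot-warm-policy"), indent=2))
    print(json.dumps(build_ilm_policy(warm), indent=2))
```

The two dictionaries are just the JSON bodies you would send to the index template and ILM policy APIs; how you send them (Kibana Dev Tools, curl, a client library) is up to you. Treat the numbers as a starting point to test against your own write load rather than as tuned values.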