Interesting... Your diskio document counts are very high; there must be many, many disks on each host, as they are about 3-5x what I would expect.
But the other numbers seem OK, I guess; it still seems high for 1000 Metricbeat hosts.
So here are some of the things I would do; others may have other opinions.
You can try this on your 1 node.
You already have a lot of segments (the underlying Lucene data structures on disk). You are not force merging when you roll over, so you are generating segments that are beginning to add up. Lots of segments = slow queries; you already have 322 segments when you only have 22 shards.
In the ILM policy, on rollover, set force merge to 1 segment.
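As a minimal sketch of what that could look like (the policy name and rollover thresholds below are placeholders, not your actual values, and this assumes a version where the forcemerge action is allowed in the hot phase, where it runs right after rollover):

PUT _ilm/policy/metricbeat-rollover-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "1d"
          },
          // merge each index down to 1 segment once it has rolled over
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      }
    }
  }
}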
You can see your segments with
GET _cat/segments/metricbeat-*?v
You can clean this up by running the following command; this may take 1 or more hours to run, as there is only 1 merge thread per node.
POST metricbeat-*/_forcemerge?max_num_segments=1
It is a synchronous call, but you can still run other commands and check on its progress with:
GET _cat/segments/metricbeat-*?v
Once the segments are merged there will be only 1 per shard.
But overall... if you are really going to ingest and query 350GB/day or more, I would probably run more than a single node. Here are some suggestions; others may have other suggestions.
350GB/day is non-trivial, but we certainly have many use cases with multiple TBs per day; it's about proper scaling.
I would run perhaps 3 nodes, each with a 28GB heap and 1-2 TB of SSD.
Index Template: 3 primary shards, 1 replica (technically this would be better with 6 nodes so each shard can be completely parallel, there is some math). (If you do not want replicas you can do that, but if you lose a node you will lose part of your index.) See the sketch after these suggestions.
ILM Rollover at 150GB or 1 day: this will make 3 x 50GB shards; the shards should balance out and you will get some parallelism.
Force Merge on Hot Rollover to 1 segment.
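A rough sketch of the index template side of that (the template name, index pattern, alias and policy name here are placeholders; with Metricbeat you would more likely set these under setup.template.settings in metricbeat.yml so its own template picks them up, and on older versions the legacy _template API takes a similar settings block):

PUT _index_template/metricbeat-custom
{
  "index_patterns": ["metricbeat-*"],
  "template": {
    "settings": {
      "index": {
        "number_of_shards": 3,
        "number_of_replicas": 1,
        // point the indices at the ILM policy and its write alias
        "lifecycle": {
          "name": "metricbeat-rollover-policy",
          "rollover_alias": "metricbeat"
        }
      }
    }
  }
}

The 150GB / 1 day rollover conditions themselves would go into the ILM policy sketched earlier ("max_size": "150gb", "max_age": "1d").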
Your indexing rate seems OK-ish; there are some settings that could make it better, like:
"index": {
"refresh_interval": "30s",
"translog": {
"flush_threshold_size": "2gb"
}
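For your existing indices you could apply those dynamically with something like the following (both settings are dynamic); for new indices they belong in the template settings shown above:

PUT metricbeat-*/_settings
{
  "index": {
    "refresh_interval": "30s",
    "translog": {
      "flush_threshold_size": "2gb"
    }
  }
}

A 30s refresh just means new documents can take up to 30 seconds to become searchable, which is usually fine for metrics.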
Another consideration would be retention, which you have not mentioned. So say you wanted to keep this for 7 days: 350GB/day + 1 replica = 700GB/day * 7 days = ~5TB of data.
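If 7 days really were the target, retention is normally handled with a delete phase in the same ILM policy; a sketch (this just repeats the earlier policy with a delete phase added, and the "7d" is only this assumed retention, counted from rollover):

PUT _ilm/policy/metricbeat-rollover-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "150gb", "max_age": "1d" },
          "forcemerge": { "max_num_segments": 1 }
        }
      },
      // drop each index 7 days after it rolls over
      "delete": {
        "min_age": "7d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}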
Another consideration is whether you have some bottleneck with IOPS, but it sounds like that is direct-attached SSD, and I am not really familiar with AHCI.