Optimizing Elasticsearch for many small bulk requests

Hey there,

I'm basing my Elasticsearch setup on this article: https://www.elastic.co/blog/elasticsearch-as-a-time-series-data-store.
The goal is to store telemetry and logs from our microservices, as well as telemetry and logs we collect from client apps. I'm currently in the phase of collecting telemetry from the microservices, so I have about 80 instances that each publish 250 datapoints in a single bulk request every 10 seconds, and each bulk request is about 75 KiB. I wouldn't expect that load to worry anyone.
I am using the AWS ES service, so it is Elasticsearch 1.5 under the covers. The cluster is made up of 3 instances with 9 shards total and one replica. I have tried turning off replication and I get the same result.
I am using daily indices with the refresh interval set to 10s.
The services open a fresh connection for every publish and publishing is single threaded, so there can never be more than 80 requests in flight in the worst case. A rough sketch of the publish loop is below.
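In case the exact shape of the traffic matters, this is roughly what each instance does every 10 seconds. The endpoint, index names, and field names here are placeholders, not our real code:

```python
# Rough sketch of one service instance's publish loop (placeholder names/endpoint).
import datetime
import json
import random
import time

import requests

ES_ENDPOINT = "https://my-domain.us-east-1.es.amazonaws.com"  # placeholder AWS ES endpoint


def collect_datapoints():
    # Stand-in for our real metrics gatherer: ~250 small documents per cycle.
    now = datetime.datetime.utcnow().isoformat()
    return [{"@timestamp": now, "metric": "latency_ms", "value": random.random() * 100}
            for _ in range(250)]


def publish(datapoints):
    # One bulk request per cycle, ~75 KiB total. Daily index; the index template
    # sets refresh_interval to 10s (settings not shown here).
    index = "telemetry-" + datetime.datetime.utcnow().strftime("%Y.%m.%d")
    lines = []
    for point in datapoints:
        lines.append(json.dumps({"index": {"_index": index, "_type": "datapoint"}}))
        lines.append(json.dumps(point))
    body = "\n".join(lines) + "\n"
    # Fresh connection every time, no session reuse; publishing is single threaded.
    resp = requests.post(ES_ENDPOINT + "/_bulk", data=body)
    resp.raise_for_status()


while True:
    publish(collect_datapoints())
    time.sleep(10)
```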

Every once in a while, I am getting rejected executions because the queue is full (the queue size is set to 50).
I am using large instances with plenty of resources. I'm not going into detail about the hardware because the reported CPU, memory, and I/O utilization are all extremely low, so they don't seem to be a factor. The cluster isn't overloaded, yet the queues are filling up.
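For what it's worth, this is roughly how I'm watching the queues and rejections, just polling the node thread pool stats (the endpoint is a placeholder again):

```python
# Poll the bulk thread pool stats on each node to watch queue depth and rejections.
import requests

ES_ENDPOINT = "https://my-domain.us-east-1.es.amazonaws.com"  # placeholder

stats = requests.get(ES_ENDPOINT + "/_nodes/stats/thread_pool").json()
for node_id, node in stats["nodes"].items():
    bulk = node["thread_pool"]["bulk"]
    print(node["name"],
          "active:", bulk["active"],
          "queue:", bulk["queue"],
          "rejected:", bulk["rejected"])
```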

Horizontally scaling doesn't help; I get the same errors. I am trying to build a cluster that can handle tens of thousands of these small bulk requests, and I can't get it to properly handle 80.

I can understand how increasing the number of nodes and shards improves searches, but I'm not clear what changes need to be made to get a cluster that scales horizontally for writes in my case.
I could see more nodes helping if I/O were pegged, for example, but in my scenario I'm hitting some other bottleneck and I don't understand what it is. The load I'm generating shouldn't be an issue.

Thank you for the help