We are using the Couchbase transport plugin, which takes docs from Couchbase and indexes them into Elasticsearch.
I am guessing the plugin uses bulk indexing, as things get slightly better when increasing threadpool.bulk.queue_size (though it is still so slow that it is not usable).
The CB plugin does all the indexing. I never index directly in ES: we create/update in CB, then the plugin uses XDCR replication to replicate into ES.
Our shapes are all circles, nothing complex, and there are no specific index settings apart from the mapping. The difference is stark, as I have the same mapping running in the old and new versions (note that there is a different CB transport plugin version for each ES version).
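
For anyone who wants to try reproducing this without Couchbase, here is a rough sketch of the kind of bulk payload I would expect to trigger it. It is not what the plugin actually sends: the index name `test_index`, the type `doc`, the field name `location`, and the coordinates and radius are all made up for illustration, and the `_type` entry assumes an ES version that still uses mapping types. It only builds the newline-delimited body for the `_bulk` endpoint and does not send anything.

```python
import json


def circle_doc(lon, lat, radius):
    """Build a document holding one geo_shape circle.

    'location' is a hypothetical field name, not taken from our real mapping.
    """
    return {
        "location": {
            "type": "circle",
            "coordinates": [lon, lat],  # geo_shape uses [lon, lat] order
            "radius": radius,
        }
    }


def bulk_body(index, doc_type, docs):
    """Build the newline-delimited body for the _bulk endpoint.

    Each document becomes two lines: an action line and a source line.
    """
    lines = []
    for doc_id, doc in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


# Three made-up circles; real data would come from the Couchbase documents.
docs = [("doc-%d" % i, circle_doc(-0.12 + i * 0.01, 51.5, "100m")) for i in range(3)]
body = bulk_body("test_index", "doc", docs)
```

Posting `body` to `_bulk` on both the old and new ES versions, with the same mapping in place, should show whether the slowdown is in ES itself or in the plugin.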
The easiest way I can show you this is to have an online meeting or something similar, where I can show you the real deal on our test machines.