Why are indices slower to create the more indices you have?

I was indexing a list of a few hundred product builds that have about a gig of data each. I'm using one machine for this. I don't mind searches being a bit slow (e.g up to 30 seconds) and I don't mind indexing to take about 10 minutes for each build. But I had a few builds that took around 3 hours near the end of the indexing process. So I googled and found this github issue

And it's stated there that indexing does in fact become slower the more indices you have which I found peculiar. Do you guys have more information on why this is?

As was mentioned in the GH issue, can you provide more information about your cluster infrastructure?
How many indices have you created, how large are they?

It's one machine, 627 indices and each one is 2,000-10,000 docs and the size of an index folder on the disk is 200-400MB.

For more stats see the top of my _stats results:
http://pastebin.com/3v6jzpqg

Most of the documents are very similar. If I knew this amount of indices would be a challenge I could have made one index with all the documents. I have a sliding window of builds I'm tracking so I wanted to make it easy to garbage collect a build by just removing its index.

I'm also worried about how slow pagination is. Sometimes I don't want to "scan" or "scroll" but I just want result numbers 15-30. A search that takes 150ms for results 0-15 can take me 30 seconds for results 15-30 (out of 46 total results). But perhaps that's a separate topic.

Let me know if more information is needed. Thank you for your help!

More than 3000 shards on a single machine?
That's really too much.
You have a small number of docs, so 1st set 1 shard per index.

You did not tell how much memory you have.

It would be also great if you can get the output of GET /_nodes/hot_threads?threads=1000 on the master while you see slow index creation. It will help see what it waits on. It will also be good if you can set your logs on DEBUG level and share them.

hot_threads.txt at http://pastebin.com/KVF150Kk

The machine has 8GB of ram and I set the "maximum memory pool" to 2048MB.

By setting my logs to debug do you mean "index.indexing.slowlog.threshold.index.debug"?

Thanks. I can't see anything out of order (except for the node being busy doing shard stats, which is explainable by the number of shards).

I meant change all the logs to DEBUG level:

curl -XPUT localhost:9200/_cluster/settings -d '{
   "persistent" : {
       "logger._root" : "DEBUG"
   }
}'

Also can you describe in more detail what is slow exactly? Is it the time for an index creation API call to come back? Is it the time for the index to become yellow? Is it indexing slowness?