Why are indices slower to create the more indices you have?

ubershmekel · September 13, 2015, 8:27pm

I was indexing a list of a few hundred product builds that have about a gig of data each. I'm using one machine for this. I don't mind searches being a bit slow (e.g. up to 30 seconds), and I don't mind indexing taking about 10 minutes per build. But a few builds near the end of the indexing process took around 3 hours each. So I googled and found this GitHub issue.

It's stated there that indexing does in fact become slower the more indices you have, which I found peculiar. Do you guys have more information on why this is?

warkolm · September 13, 2015, 9:37pm

As was mentioned in the GH issue, can you provide more information about your cluster infrastructure?
How many indices have you created, and how large are they?

ubershmekel · September 13, 2015, 11:55pm

It's one machine with 627 indices. Each index holds 2,000-10,000 docs, and each index folder on disk is 200-400MB.

For more stats see the top of my _stats results:
http://pastebin.com/3v6jzpqg

Most of the documents are very similar. If I had known this number of indices would be a challenge, I could have made one index with all the documents. I have a sliding window of builds I'm tracking, so I wanted to make it easy to garbage collect a build by just removing its index.

I'm also worried about how slow pagination is. Sometimes I don't want to "scan" or "scroll" but I just want result numbers 15-30. A search that takes 150ms for results 0-15 can take me 30 seconds for results 15-30 (out of 46 total results). But perhaps that's a separate topic.

Let me know if more information is needed. Thank you for your help!
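For reference, the "results 15-30" request described above maps onto the `from` and `size` search parameters. The sketch below only shows how that page window is computed and passed. The index name `build-0001`, the `ES_URL` default, and the helper names are assumptions for illustration, and the real query is omitted (so this would match all documents).

```shell
#!/bin/sh
# Assumed endpoint; override ES_URL for your own node.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the query-string parameters for one page of results.
# $1 = zero-based page number, $2 = page size
page_params() {
  echo "from=$(( $1 * $2 ))&size=$2"
}

# Fetch one page from an index (hypothetical helper, not called here).
# $1 = index name, $2 = zero-based page number, $3 = page size
search_page() {
  curl -s "$ES_URL/$1/_search?$(page_params "$2" "$3")"
}

# Example: second page of 15 hits, i.e. results 15-30:
#   search_page build-0001 1 15
```

Timing the first and second page with the same query should help show whether the slowdown comes from the paging itself or from something else on the node.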

dadoonet · September 14, 2015, 5:03am

More than 3000 shards on a single machine?
That's really too much.
You have a small number of docs, so first set 1 shard per index.

You did not say how much memory you have.
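To make the shard arithmetic concrete: at the default of 5 primary shards per index in Elasticsearch versions of that era, 627 indices come to 3135 shards, while one shard per index comes to 627. A minimal sketch of creating a build index with a single shard follows. The index name `build-0001`, the `ES_URL` default, and the helper names are assumptions for illustration. Setting `number_of_replicas` to 0 is also an assumption, based on replicas having no second node to be allocated to on a single machine.

```shell
#!/bin/sh
# Assumed endpoint; override ES_URL for your own node.
ES_URL="${ES_URL:-http://localhost:9200}"

# Total primary shards for a number of indices.
# $1 = number of indices, $2 = primary shards per index
shard_count() {
  echo $(( $1 * $2 ))
}

# Index settings body: one primary shard, no replicas (assumption, see above).
one_shard_settings() {
  printf '%s' '{"settings":{"number_of_shards":1,"number_of_replicas":0}}'
}

# Create one index per build (hypothetical helper, not called here).
# $1 = index name
create_build_index() {
  # The Content-Type header is required by Elasticsearch 6.0 and later.
  curl -s -XPUT "$ES_URL/$1" \
    -H 'Content-Type: application/json' \
    -d "$(one_shard_settings)"
}

# Example:
#   shard_count 627 5        # shards at the old default
#   create_build_index build-0001
```

The shard count is fixed when an index is created, so existing indices would need to be reindexed into new single-shard indices to benefit.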

bleskes · September 14, 2015, 8:21am

It would also be great if you could get the output of GET /_nodes/hot_threads?threads=1000 on the master while you see slow index creation. It will help show what it is waiting on. It would also be good if you could set your logs to DEBUG level and share them.

ubershmekel · September 14, 2015, 4:23pm

hot_threads.txt at http://pastebin.com/KVF150Kk

The machine has 8GB of RAM and I set the "maximum memory pool" to 2048MB.

By setting my logs to debug, do you mean "index.indexing.slowlog.threshold.index.debug"?

bleskes · September 15, 2015, 9:11am

Thanks. I can't see anything out of order (except for the node being busy doing shard stats, which is explainable by the number of shards).

I meant change all the logs to DEBUG level:

curl -XPUT localhost:9200/_cluster/settings -d '{
   "persistent" : {
       "logger._root" : "DEBUG"
   }
}'

Also, can you describe in more detail what exactly is slow? Is it the time for an index creation API call to come back? Is it the time for the index to become yellow? Is it indexing slowness?
