We have a use case where we need to create around 4000 different indices in a cluster, each with a different mapping. We were benchmarking Elasticsearch 6.6.1 and observed the following.
We first created a single index, indexed around 32k documents into it, and ran getById for the stored documents. The 99.9th percentile latency was under 100ms.
We then created another 1500 indices (with one replica each and without any data) and re-ran the benchmark against the earlier index. The 99.9th percentile latency shot up to 700-800ms.
When we delete these 1500 empty indices, the latency goes back to under 100ms.
Note: our cluster contains 12 data nodes, each with a 1TB disk and 16GB of memory allocated to Elasticsearch.
Does the number of indices in a cluster affect getById calls in any way?
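For context, the empty indices are created roughly like this (a minimal sketch; the index name prefix is illustrative, and client is assumed to be a connected TransportClient):

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;

// Create N empty indices, each with one replica and no documents.
static void createEmptyIndices(TransportClient client, int count) {
    for (int i = 0; i < count; i++) {
        client.admin().indices().prepareCreate("category-" + i)
            .setSettings(Settings.builder()
                .put("index.number_of_replicas", 1))
            .get();
    }
}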
Sure, I'll fetch the same document and get back with the results.
I have just written a Dropwizard service and am using the TransportClient to fetch the document by ID. I am using our own load-generation tool to hit this getById API at 5000 QPS and then plotting the latency.
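The load loop itself looks roughly like the following (a sketch, since the tool is internal; ids holds the IDs of the 32k indexed documents, and getById is the service method pasted further down this thread):

import com.google.common.util.concurrent.RateLimiter;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Drive getById at a steady 5000 QPS with randomly chosen IDs.
void runLoad(String indexName, List<String> ids) {
    RateLimiter limiter = RateLimiter.create(5000.0); // target QPS
    while (!Thread.currentThread().isInterrupted()) {
        limiter.acquire(); // blocks just long enough to hold the rate
        String id = ids.get(ThreadLocalRandom.current().nextInt(ids.size()));
        getById(indexName, id); // latency is captured by the @Timed annotation
    }
}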
Our use case is as follows:
We need to store information about products. There are around 4000 different categories of products, so we intended to create 4000 indices, defining the required mappings for each category of products.
In the current scenario, I first created a single index, indexed around 32,000 documents into it, and ran getById benchmarks against it. The 99.9th percentile latency was around 100ms. Then I created another 1500 indices and put the mappings for those indices as well. I didn't index any documents into the 1500 indices and re-ran the benchmark against the earlier index. The latency for the same getById calls now increased to around 600-700ms.
I then deleted the 1500 indices and the latency came back to around 100ms. I am now adding 100 indices every half hour and checking at what point the latency starts to increase. I have attached a screenshot of the latency graph.
We deleted and re-created the 1500 indices, but this time we didn't put mappings on them. We then put the mapping on one test index, indexed around 30k documents into it again, and ran the benchmark against the test index. The results were good: the 99.9th percentile stayed around 100ms.
We then put the mappings on the remaining 1500 indices (without indexing any documents into those indices), and the latency for the test index shot up to 600-700ms again.
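Putting a mapping on an already-created index is done roughly like this (a sketch; the mappingJson argument stands in for each category's actual mapping):

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.xcontent.XContentType;

// Apply a mapping to an existing index; we use the index name as the type name.
static void putMapping(TransportClient client, String index, String mappingJson) {
    client.admin().indices().preparePutMapping(index)
        .setType(index)
        .setSource(mappingJson, XContentType.JSON)
        .get();
}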
Sharing the size of the cluster state:
With one index and its mapping set => 248KB
With 1500 indices (without mappings) => 3.7MB
With 1500 indices with mappings set => 114MB
Does the size of the cluster state have a role to play in the normal getById and index (put) latencies? Please note that the data nodes have 39GB of total RAM, out of which 16GB of heap space has been assigned to Elasticsearch.
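The cluster state size can be measured roughly like this (a sketch using the low-level REST client; the endpoint returns the full state as JSON, and the host and port are placeholders):

import java.io.IOException;
import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

// Fetch the full cluster state over HTTP and report its serialized size in bytes.
static long clusterStateBytes() throws IOException {
    try (RestClient rest = RestClient.builder(new HttpHost("localhost", 9200)).build()) {
        Response response = rest.performRequest(new Request("GET", "/_cluster/state"));
        return EntityUtils.toByteArray(response.getEntity()).length;
    }
}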
@dadoonet: We ran the benchmark with just one document as well, and the results are the same. As you increase the number of indices along with their mappings, the latencies start to take a hit.
It seems like it is down to the cluster state alone. Any further input here?
The title says that you are running getById and put, while your description states that you are only fetching data and not performing any updates. Which one is correct? Can you share the exact queries you are running? Are you specifying a single index name in your requests, or an index pattern?
When we ran benchmarks for put, those latencies degraded too (hence we included put in the title). In the current runs we are only issuing getById requests and nothing else.
I am specifying the exact indexName in the request.
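For completeness, the put calls in those earlier runs look roughly like this (a minimal sketch; the method name and JSON body are illustrative, and client is the same TransportClient used in the get code below):

import org.elasticsearch.common.xcontent.XContentType;

// Index (put) a document; the type again matches the index name, as in getById.
public void putById(String indexName, String id, String jsonDoc) {
    client.prepareIndex(indexName, indexName, id)
        .setSource(jsonDoc, XContentType.JSON)
        .get();
}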
Like I mentioned previously, I am running a Dropwizard service and using the transport client. Pasting the code below:
import com.codahale.metrics.annotation.ExceptionMetered;
import com.codahale.metrics.annotation.Timed;
import org.elasticsearch.action.get.GetRequest;

@Timed
@ExceptionMetered
public byte[] getById(String indexName, String id) {
    // Note: we use the index name as the type name as well
    GetRequest getRequest = new GetRequest(indexName, indexName, id);
    return client.get(getRequest).actionGet().getSourceAsBytes();
}
I am hitting the above API with the testIndexName, choosing an ID randomly from the list of 32k documents that we have indexed.
Switching to the REST client also didn't make any difference.
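The REST-based fetch is equivalent (a sketch using the high-level REST client; restClient is assumed to be an initialized RestHighLevelClient):

import java.io.IOException;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;

// Same getById lookup, issued over HTTP instead of the transport protocol.
public byte[] getByIdRest(RestHighLevelClient restClient, String indexName, String id)
        throws IOException {
    GetRequest getRequest = new GetRequest(indexName, indexName, id);
    return restClient.get(getRequest, RequestOptions.DEFAULT).getSourceAsBytes();
}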
We slowly ramped up the concurrency and the results are the same. As soon as we delete the other indices, or leave their mappings out, the latencies come down.