Do you have a single index with 3 primary shards and 1 replica in the cluster? If so, what is the size of this index in terms of shard size in GB as well as document count? What is the average size of the data in this binary field?
What type of storage are you using? Local SSD?
What load is the cluster under (read and write) when you see these long latencies?
This usage pattern is quite unusual. It looks to me like you are trying to use Elasticsearch as a key-value store for some binary data. Why are you using Elasticsearch this way?
Average size of the binary field data: 8 KB. The following is part of the index list.
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open messages-000370 DneyyWoHQp-T17_kjn9kCg 3 1 1471584 0 12gb 6gb
green open messages-000371 -h50Q3JiSrmsu0mmt7ZCMw 3 1 1394995 0 10.9gb 5.4gb
green open messages-000372 XQFIvqAvSeCRMe_GCNYZ5A 3 1 1380346 0 11gb 5.5gb
green open messages-000373 wTQC9L3dS_CiRMbC8rSJpQ 3 1 1655118 0 10.3gb 5.1gb
green open messages-000374 Wuft_rovRDqs1i8mX_Ygyw 3 1 1182124 0 10gb 5gb
When doing indexing and search we see long latencies; CPU usage is less than 20% and OS memory usage is roughly 18 GB. The hardware resources are reasonable but the performance is not good. The worst case is 30s+, and the normal situation is a few seconds (2-10s).
How many documents are indexed per second? What bulk size are you using? How many concurrent queries are you running when you see the large latencies?
Can you please describe the use case and why you are using Elasticsearch this way?
Why have you limited your shard size to 5GB? That seems quite small and will result in a large number of shards that need to be searched in every query.
How many documents are indexed per second? What bulk size are you using? How many concurrent queries are you running when you see the large latencies?
At least 40 messages/s at the entry points; the total must be more than that. Bulk action size: 5000, flushInterval: 5s. Only one query is running when we see the large latencies.
Can you please describe the use case and why you are using Elasticsearch this way?
Sure. The system is a kind of trace system, e.g. a message comes into the system and the system processes it via A->B->C->D->E. At each point, we trace the event and keep the message content so that the message can be reprocessed. It may not be the perfect design, but it is our current design. We may improve it by moving the message content out to another DB, but not at this stage.
Given the number of indices in the cluster I will assume your retention period is reasonably long, potentially up to 6 months.
Although I am not sure Elasticsearch is the ideal choice for this use case, I think there are a few things you could try in order to improve performance.
1. Limit the number of shards queried
When Elasticsearch indexes a document it routes the document to one of the shards based on the document ID by default. Your query searches for one specific document ID but queries all shards. You can try adding the document ID as a routing value with each search request. This should allow Elasticsearch to only search one shard (the one that could hold the document based on the ID) for each index.
This should work even if you change the number of primary shards.
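For illustration, assuming a hypothetical document ID "msg-12345", such a lookup could look roughly like this (since the default routing is the document ID itself, no change to how you index is required):

GET messages-*/_search?routing=msg-12345
{
  "query": {
    "ids": { "values": ["msg-12345"] }
  }
}

With the routing parameter set, only the one shard per index that "msg-12345" routes to is searched, instead of all primary shards.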
2. Increase shard size
You currently have a lot of very small shards, which can be slow to query. I would recommend increasing the shard size so you get fewer shards to query. At the same time I would also recommend you increase the number of primary shards as that will make the routing change described earlier more efficient. If you go to e.g. 6 primary shards per index only 1/6 of all shards would need to be searched instead of 1/3. This does not mean that you need to reindex - you can simply let the smaller indices with 3 primary shards age out over time.
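As a sketch of how the primary shard count could be changed for newly created indices (the template name and index pattern here are assumptions based on your index names; on older versions you would use the legacy _template API instead):

PUT _index_template/messages
{
  "index_patterns": ["messages-*"],
  "template": {
    "settings": {
      "index.number_of_shards": 6,
      "index.number_of_replicas": 1
    }
  }
}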
Larger shards likely mean that you will need to make each index cover a longer time period, but that should not be a problem if your retention period is quite long. Maybe change the rollover criteria to something like this:
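Something along these lines in the hot phase of your ILM policy (the policy name and the exact thresholds are just illustrative assumptions, and max_primary_shard_size requires a reasonably recent version; otherwise use max_size):

PUT _ilm/policy/messages-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "7d"
          }
        }
      }
    }
  }
}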
3. Forcemerge indices no longer written to
It may also be beneficial to add a step to ILM that forcemerges indices that have rolled over down to a single segment if you do not already have this in place.
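For illustration, a warm phase with a forcemerge action in the same ILM policy could look something like this (the min_age value is an assumption):

"warm": {
  "min_age": "1d",
  "actions": {
    "forcemerge": {
      "max_num_segments": 1
    }
  }
}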
About the shard size and shard count, are there any other considerations, such as node count, hard drive, or memory? Currently, in most cases we keep data for 14 days, maybe shorter.
We do a small amount of upserts to the data. I can add a forcemerge when moving from the hot to the warm stage. Will this affect upserts?
The performance issues only happen when messages come through, i.e. during indexing. I am wondering whether increasing the refresh interval and the query cache size would help. Any suggestions?
Thanks.
5GB is a quite small shard size. Given that you search by ID I would expect you to benefit from larger shards. 50GB shard size is not uncommon and something I would try to achieve.
A larger number of primary shards would make your routing more efficient, so it would probably be a worthwhile change. I would look to increase this gradually, e.g. initially go from 3 to 6. If you still achieve the larger shard size with this, you may later increase it to 9.
If you have a short retention period, e.g. 14 days, you need to make sure that you get a suitable number of indices to cover this period. In this case I would probably keep the rollover period at 1 day so you get at least 14 indices covering the retention period.
If you are updating existing data it does not make sense to forcemerge down to a single segment. Forcemerging down to a single segment is I/O intensive and is completely undone if you update or write to the index.
If that is the case I would look at storage I/O performance, especially at times when you are performing indexing and seeing slow queries. What does iostat -x show?
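For example, running the following while indexing is active prints extended device statistics every 5 seconds; the r_await, w_await and %util columns are the interesting ones:

iostat -x 5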
Those are very high (bad) values for r_await and w_await, which indicates that the performance of the storage you are using is likely the bottleneck.
I would concentrate on improving the storage you are using. If the storage is this slow, I do not think any of the other optimisations will make much difference.
Hi Christian,
Thanks for your help. We set up an environment with 30 GB + 6 shards on an SSD hard drive. The performance of queries while writing is still not ideal (20s+). I am thinking it might help if we can separate writes and reads. Does Elasticsearch support separating writes and reads, e.g. one node set up for writing and another node set up for reading, whose data is synced from the writing node?