I was wondering what impact setting eager_global_ordinals: true would have as soon as I enable it in production. That is, will it affect ongoing production traffic?
If so, is there any way to gauge this before enabling it in the mapping of my fields?
Secondly, the field on which I want to enable global ordinals is a multi-field. Can I enable it only on the keyword sub-field of the framework field?
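For reference, here is a minimal sketch of what a mapping update that sets eager_global_ordinals only on a keyword sub-field might look like, built as a Python dict that would be sent as the body of PUT <index>/_mapping. It assumes the parent framework field is of type text and that its sub-field is named keyword (the naming dynamic mapping uses). The helper name eager_ordinals_mapping is hypothetical. Any other non-default settings already on the parent or sub-field (for example an analyzer or ignore_above) would most likely need to be restated, otherwise the update may be rejected or may change them.

```python
import json


def eager_ordinals_mapping(field: str, subfield: str = "keyword",
                           parent_type: str = "text", eager: bool = True) -> dict:
    """Hypothetical helper: build a PUT _mapping body that sets
    eager_global_ordinals on one keyword sub-field of a multi-field.

    Assumption: the parent field's type must be restated, and any other
    existing non-default parameters (analyzer, ignore_above, ...) should be
    added here too, or the update may be rejected or alter them.
    """
    return {
        "properties": {
            field: {
                "type": parent_type,
                "fields": {
                    subfield: {
                        "type": "keyword",
                        "eager_global_ordinals": eager,
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Body for: PUT <index>/_mapping  (pass eager=False to switch it back off)
    print(json.dumps(eager_ordinals_mapping("framework"), indent=2))
```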
Two separate indices: one holds today's data, the other holds data for the past 59 days. Every day we merge the daily index into the larger index and start a new one the next day.
Total daily ingestion of records: 1.3 million
Searches go to both indices (since we need aggregations over the past 2 months)
We get roughly 1 request every 2 seconds.
Size of the larger index: 79GB; the smaller (daily) index doesn't exceed 1GB.
What exactly do you mean by "merge"? Merging in Elasticsearch is something different, and there's no feature that just combines two indices together like that. Perhaps you mean you use reindex to copy the records over?
Also, why do you do this? Why not just write everything into the large index to begin with? Indexing 15 records per second into an 80GB index should be no big deal.
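The 15 records per second figure follows from the 1.3 million records per day quoted above. A quick check, assuming ingestion is spread evenly over 24 hours (real traffic will have peaks above this average):

```python
RECORDS_PER_DAY = 1_300_000          # daily ingestion quoted in the question
SECONDS_PER_DAY = 24 * 60 * 60       # 86,400


def records_per_second(per_day: int) -> float:
    """Average indexing rate, assuming an even spread across the day."""
    return per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"{records_per_second(RECORDS_PER_DAY):.1f} records/s")
```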
What evidence do you have that eagerly-loading global ordinals will help? I wouldn't expect the default behaviour that computes global ordinals at search time to be too onerous in a small dataset. Do you have tight requirements on search latency that you sometimes exceed when a refresh occurs?
What evidence do you have that eagerly-loading global ordinals will help? I wouldn't expect the default behaviour that computes global ordinals at search time to be too onerous in a small dataset. Do you have tight requirements on search latency that you sometimes exceed when a refresh occurs?
We fetch roughly 8 aggregations at once. Of these 8, 2 don't have eager_global_ordinals: true set. Removing those 2 aggregations from my query speeds up the response by 1.5x to 2x (which is significant when response times reach 3 seconds).
Whereas removing any other 2 aggregations (which do have eager_global_ordinals set) barely changes the response time.
Plus, those 2 fields have low cardinality, so I was wondering whether building their global ordinals eagerly might help in my case.
Based on these findings, I want to enable it in production on the remaining 2 fields, but I'm trying to gauge what impact it might have on ongoing traffic the moment I enable it.
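To turn the 1.5x to 2x speedup into absolute time, take the 3-second response as the total: if removing part of a query speeds it up by a factor s, that part accounts for roughly total × (1 − 1/s). This assumes the two aggregations' cost simply adds to the rest of the query, which is an approximation since aggregations can share work. The helper name is hypothetical.

```python
def time_attributed(total_s: float, speedup: float) -> float:
    """Seconds saved when removing part of a query makes it `speedup` times faster."""
    return total_s * (1 - 1 / speedup)


if __name__ == "__main__":
    for speedup in (1.5, 2.0):
        # 3.0 s is the worst-case response time mentioned in the thread
        print(f"{speedup}x -> {time_attributed(3.0, speedup):.2f}s")
```

With a 3-second total this gives about 1.0s at 1.5x and 1.5s at 2x.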
I'd be a little surprised if building global ordinals takes a meaningful fraction of the 1-1.5s that these aggregations take - neither index is very large, and the larger index doesn't change much so should only be re-computing them once a day anyway.
I would also not expect it to make too much difference on your indexing, but it's hard to accurately predict a performance impact since there are so many variables. The simplest and most reliable way to get an answer is to run an experiment, ideally on your test cluster. It's easy enough to switch it off if you don't like what you see.
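One way to read the results of such an experiment is to run the same aggregation query repeatedly with the setting off and then on, collect the took value (in milliseconds) from each search response, and compare the distributions. A minimal sketch under those assumptions; how the samples are collected is up to you, and the function names are hypothetical. It needs at least two samples per run and Python 3.9 or later.

```python
import statistics


def latency_summary(took_ms: list[float]) -> dict:
    """Median and approximate 95th percentile of a list of latencies (ms)."""
    # quantiles(n=20) returns 19 cut points; the last one is ~p95.
    cuts = statistics.quantiles(took_ms, n=20)
    return {"p50": statistics.median(took_ms), "p95": cuts[18]}


def compare(before: list[float], after: list[float]) -> dict:
    """Change in each statistic; negative values mean faster after the change."""
    b, a = latency_summary(before), latency_summary(after)
    return {key: a[key] - b[key] for key in b}
```

Looking at p95 as well as the median matters here, because the cost of building global ordinals lazily tends to show up as occasional slow searches after a refresh rather than as a shift in the typical response.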