Phew, I think I finally found the root cause of the indexing slowdown: ICU4J transliteration, combined with the fact that after about 13.2M documents the data starts to contain a lot of Chinese text.
I asked in a new post how to avoid duplicate transliterations (I am assuming that using icu_transform in one property with multiple fields runs the transliteration multiple times on the same identical text): ICU transform filters slowing down indexing: how avoid duplicate transliterations?
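To make that assumption concrete, here is a minimal sketch of the kind of setup I mean, written as a Python dict for an index body. The property, sub-field, filter and analyzer names (title, prefix, raw, to_latin, latin_text, latin_prefix) are made up for illustration and are not my real mapping; the helper only counts how many analysis chains of one property include an icu_transform filter, which is where I suspect the repeated work comes from.

```python
# Hypothetical index body: one property ("title") with multi-fields,
# where more than one analyzer includes the same icu_transform filter.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "filter": {
                # Transliteration filter from the analysis-icu plugin.
                "to_latin": {
                    "type": "icu_transform",
                    "id": "Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC",
                },
                "prefixes": {"type": "edge_ngram", "min_gram": 1, "max_gram": 10},
            },
            "analyzer": {
                "latin_text": {
                    "type": "custom",
                    "tokenizer": "icu_tokenizer",
                    "filter": ["to_latin", "lowercase"],
                },
                "latin_prefix": {
                    "type": "custom",
                    "tokenizer": "icu_tokenizer",
                    "filter": ["to_latin", "lowercase", "prefixes"],
                },
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "latin_text",
                "fields": {
                    "prefix": {"type": "text", "analyzer": "latin_prefix"},
                    "raw": {"type": "keyword"},
                },
            }
        }
    },
}


def count_icu_transform_passes(body, prop):
    """Count the analysis chains of `prop` (main field plus sub-fields)
    whose analyzer includes a filter of type icu_transform."""
    analysis = body["settings"]["analysis"]
    # Names of all filters that perform transliteration.
    icu_filters = {
        name for name, cfg in analysis["filter"].items()
        if cfg.get("type") == "icu_transform"
    }
    # Names of all analyzers that use at least one of those filters.
    icu_analyzers = {
        name for name, cfg in analysis["analyzer"].items()
        if icu_filters & set(cfg.get("filter", []))
    }
    field = body["mappings"]["properties"][prop]
    chains = [field] + list(field.get("fields", {}).values())
    return sum(1 for f in chains if f.get("analyzer") in icu_analyzers)
```

For this made-up mapping the helper counts two chains (title and title.prefix), so if my assumption is right the same text would be transliterated twice per document.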
Thanks!
PS. @Christian_Dahlqvist feel free to continue with the good insights in the new post!