Hi, could anybody please confirm my hypothesis regarding sorted scroll?
- I assume that a normal sorted search:
- loads all the fields we sort on into memory in the case of fieldData; in the case of doc_values we are just not limited by heap (though it is hard to imagine sorting without using memory)
- returns all hits sorted
- I can't really imagine how sorted scrolling is implemented, though, because you have to sort, say, a billion documents over a doc_values field, at least partially. Until then scrolling cannot start, because you wouldn't know which document is really the first one to fetch, right? BUT scrolling starts returning sorted results immediately, although very slowly (800 docs/s), which is 10 times slower than scanning the same thing.
- What is the bottleneck here? I have about 7% IO wait, which rules out an issue with doc_values, and low network traffic, but CPU is at 98%, which looks as if there were a sorting thread in the background supplying the scrolling/searching thread with documents to retrieve. If that is the case, is this sorting thread the bottleneck? Meaning that reading 300M doc_values fields and sorting them is an IO- and computation-heavy operation that makes it 10 times slower?
- Is there any way to improve it? All fields in my documents are doc_values and it is a search/scroll-only node, so I assume I should split the available memory into something like 30% heap / 70% off-heap, right? That would leave a sufficient amount of OS RAM for the FS cache. Is there anything else I'm not aware of?
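
To make the "partial sort" idea from the scrolling question concrete, here is a minimal sketch of how I imagine a single scroll page could be produced without sorting everything up front. This is only my assumption, not the actual Elasticsearch implementation; `next_sorted_page` and the sample data are made up for illustration.

```python
import heapq
from typing import Iterable, List, Optional, Tuple

# (sort_value, doc_id) pair; doc_id breaks ties so the order is total
SortKey = Tuple[int, str]


def next_sorted_page(
    values: Iterable[SortKey],
    page_size: int,
    after: Optional[SortKey] = None,
) -> List[SortKey]:
    """Return the next page_size entries in sort order.

    Hypothetical helper: it scans every value once and keeps only the
    page_size smallest entries that come strictly after the last entry
    of the previous page, so no full sort is needed.
    """
    candidates = (pair for pair in values if after is None or pair > after)
    return heapq.nsmallest(page_size, candidates)


# Made-up sort values standing in for a doc_values column
docs = [(42, "a"), (7, "b"), (19, "c"), (7, "d"), (88, "e")]

page1 = next_sorted_page(docs, 2)
page2 = next_sorted_page(docs, 2, after=page1[-1])
print(page1)
print(page2)
```

If it works anything like this, the first page can come back right away, but every page costs a full pass over all the sort values, which would fit the high CPU and low IO wait I am seeing.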