I have a custom native scoring script with O(n) complexity. It takes a single parameter, a hash, and calculates a score for each document based on the Euclidean distance between the hash parameter and the hash stored in the document.
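For reference, the script is conceptually something like this. This is a simplified sketch against the ES 2.x native script API, not the exact code: the class name, the `hash` field name, and the factory wiring are placeholders.

```java
import java.util.List;

import org.elasticsearch.index.fielddata.ScriptDocValues;
import org.elasticsearch.script.AbstractDoubleSearchScript;

// Simplified sketch -- the real script is registered via a NativeScriptFactory
// and exposed under a script name that the query refers to.
public class HashDistanceScript extends AbstractDoubleSearchScript {

    private final long[] paramHash; // the hash passed in as a query parameter

    public HashDistanceScript(long[] paramHash) {
        this.paramHash = paramHash;
    }

    @Override
    public double runAsDouble() {
        // Reads the document's hash from doc_values -- this is the part that
        // has to hit disk (or the filesystem cache) for every matching document.
        List<Long> docHash = docFieldLongs("hash").getValues();

        // O(n) Euclidean distance between the parameter hash and the doc hash.
        double sum = 0;
        for (int i = 0; i < paramHash.length; i++) {
            double d = docHash.get(i) - paramHash[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}
```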
I have a 2-node Elasticsearch cluster with ~150 million documents in an index that has doc_values turned on.
When I run an exists query that targets about a third of these documents, with the scoring script applied, the results take about 5-7 minutes to come back. Whilst it's running, I can see the disk utilisation going nuts on netdata (https://github.com/firehol/netdata).
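The query is roughly this shape. It's a sketch rather than the exact request: the index name and hash values are placeholders, and the precise script syntax varies between ES versions.

```json
POST /my_index/_search
{
  "query": {
    "function_score": {
      "query": { "exists": { "field": "hash" } },
      "script_score": {
        "script": "hash_distance",
        "lang": "native",
        "params": { "hash": [12, 7, 31, 4] }
      }
    }
  }
}
```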
When I run it a second time with a different hash parameter, it takes approximately a third of the time, and the disk utilisation only goes nuts for the duration of the query.
When I run it a third time, again with a different hash parameter, it takes a few seconds and the disk utilisation spike is minimal.
I know from previous tests that after a while, the query will go back to taking a long time again.
I'm really curious to know what is going on here:
- Is it taking less time because the fielddata is being loaded into memory?
- Is it shard caching kicking in?
- If that's true, why doesn't the second query take a few seconds?
- What's causing the cool-down?
- At what point will the cool-down process kick in?
I'd love to know what Elasticsearch is doing internally to make this all work like it does. If anyone could provide insight, it'd not only be very useful, but also fascinating.
Many thanks for your time.