Performance impact of increasing position_increment_gap

Hi everyone,

For my current task, it would help to set a large position_increment_gap on some fields. Each of these fields may contain 100 values or more.

My question is:
How severely does increasing position_increment_gap affect performance (i.e. search speed and memory usage), for example when raising it to 500, 1000 or higher?

I observed that it increases the index size, but I couldn't find any other information on the performance impact.

I'm fairly new to Elasticsearch and this forum, so if this is very obvious or the information is easy to find, please forgive me. If this is the case I would be very thankful if you could tell me where I can find more information on this topic.

Thanks in advance for your help!

Hi Phillip.
I don't think it should have much of an impact. As you've already discovered, bigger numbers might require more bytes to store, but the memory cost and the speed of matching shouldn't be adversely affected. When doing proximity searches for terms A and B, Lucene reads the sequence of these words' positions in each doc and checks whether they fall within the slop distance the query allows. If these position integers are bigger, I don't expect that to substantially change the calculations being performed.
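To make the point about positions concrete, here is a minimal Python sketch of how positions could be assigned across the values of a multi-valued field, and of a naive distance check between two terms. It is a simplified model, not Lucene's actual sloppy-phrase matching: the function names `assign_positions` and `min_distance` are made up for illustration, and whitespace splitting stands in for a real analyzer.

```python
def assign_positions(values, gap):
    """Assign a position to every term of a multi-valued field.

    Simplified model (assumption): each term advances the position by 1,
    and `gap` is added once between consecutive values.
    """
    postings = []
    position = -1
    for i, value in enumerate(values):
        if i > 0:
            position += gap  # the position_increment_gap between values
        for term in value.lower().split():
            position += 1
            postings.append((term, position))
    return postings


def min_distance(postings, term_a, term_b):
    """Smallest position difference between any occurrence of the two terms.

    Returns None if either term is missing. The number of comparisons
    depends on how many positions exist, not on how large they are.
    """
    pos_a = [p for t, p in postings if t == term_a]
    pos_b = [p for t, p in postings if t == term_b]
    return min((abs(a - b) for a in pos_a for b in pos_b), default=None)


names = ["John Abraham", "Lincoln Smith"]
for gap in (0, 100, 1000):
    postings = assign_positions(names, gap)
    print(gap, postings, min_distance(postings, "abraham", "lincoln"))
```

In this model a larger gap only changes the integers being compared; the number of positions read per document stays the same, which is why matching speed is not expected to change much.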
Integer sequences are encoded on disk using "variable int gap-encoding": we only record the difference between one number and the next, and minimise the number of bytes required to store that diff. A bigger position_increment_gap means bigger diffs, and therefore more bytes to write and read.
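The effect on index size can be estimated with a small Python sketch of gap (delta) encoding combined with a classic variable-length integer that stores 7 bits per byte. This is an assumption made for illustration: the on-disk format of a given Lucene version may differ (for example by packing blocks of integers), and the helper names `vint_size` and `encoded_size` are hypothetical.

```python
def vint_size(n):
    """Bytes needed for a non-negative int at 7 payload bits per byte."""
    size = 1
    while n >= 0x80:
        n >>= 7
        size += 1
    return size


def encoded_size(positions):
    """Total bytes when each position is stored as a diff from the previous one."""
    total = 0
    prev = 0
    for p in positions:
        total += vint_size(p - prev)
        prev = p
    return total


def positions_one_term_per_value(num_values, gap):
    """Positions of a term that occurs once in each single-token value (assumed layout)."""
    return [i * (gap + 1) for i in range(num_values)]


for gap in (100, 500, 1000, 20000):
    size = encoded_size(positions_one_term_per_value(100, gap))
    print(f"gap={gap}: {size} bytes for 100 positions")
```

Under these assumptions, diffs up to 127 fit in one byte, diffs up to 16383 in two, and larger ones need three or more. Going from a gap of 100 to 500 therefore roughly doubles the bytes for positions that straddle value boundaries, while going from 500 to 1000 costs nothing extra.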

Benchmarking will reveal the true impact but I would have thought it would be negligible.

Thank you very much, Mark!
Once my implementation is more mature I might perform some performance tests on this topic.
If I do I'll post the results here.
