About retrieval depth & ranking

Hi all,
I have two questions about retrieval and scoring:

  1. How deep does ES go when retrieving documents, even without scoring? The information I have found so far says it retrieves all of them. Please help me confirm this mechanism. When I was developing a web search engine, the retriever would normally stop early once it decided there were enough good candidate documents for the search, for example just the top 10 out of 10,000 docs.

  2. Is the ranking of search results stable for the same query on a static index? Most of the time it is stable, but sometimes ES returns different results. My guess is that this is caused by some bad shards, but I am not sure. Any help?

Thanks all ^.^

https://www.elastic.co/guide/en/elasticsearch/guide/current/distributed-search.html may help clarify this.

Thanks Mark :)
The chapter explains how the "dispatcher & gatherer" works, but I want to know how the gatherer knows when its own private priority queue is full. (My guess is that it can't retrieve exactly from+size docs, because pagination couldn't be stable in that case. It also doesn't seem to retrieve all docs, since the book says "deep paging is a problem", and sorting wouldn't be a problem if all docs were retrieved, I think.) https://www.elastic.co/guide/en/elasticsearch/guide/current/pagination.html

It just grabs the requested number of docs (default 10) from each shard.

So if you have 5 shards, each provides its top 10, and the reduce phase then takes that total of 50 and returns the top 10 from it.
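
To make the reduce step concrete, here is a rough sketch in Python. It is only an illustration of the idea, not Elasticsearch's actual code: the function name `merge_shard_results` and the `(score, doc_id)` tuple shape are made up for this example, and it assumes each shard has already returned its own top from + size hits.

```python
import heapq

def merge_shard_results(shard_results, from_=0, size=10):
    """Merge per-shard hit lists into one global page of results.

    shard_results: list of lists of (score, doc_id) tuples, one list per shard
    (hypothetical shape, chosen for this sketch).
    """
    # Flatten the hits from every shard, e.g. 5 shards x 10 hits = 50 candidates.
    all_hits = [hit for hits in shard_results for hit in hits]
    # Keep only the best from_ + size candidates by score, highest first.
    top = heapq.nlargest(from_ + size, all_hits, key=lambda hit: hit[0])
    # Drop the first from_ hits to get the requested page.
    return top[from_:from_ + size]

# Usage: 5 shards, each returning 10 hits with distinct scores.
shards = [
    [(1.0 - 0.01 * (s + 5 * i), f"s{s}-d{i}") for i in range(10)]
    for s in range(5)
]
page = merge_shard_results(shards, from_=0, size=10)
```

Note that for a deeper page every shard has to supply from + size hits before the merge, which is why the number of candidates grows with the page depth.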

Yes, I understand that part. I'd like to know how a shard chooses its top 10 results, I mean the process of building this priority queue (from + size, 10 by default).

Ah right, sorry! So that's this part - https://www.elastic.co/guide/en/elasticsearch/guide/current/sorting.html

Basically it scores anything that matches the query/filter.
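
As a simplified illustration of "score everything that matches, keep only the best from + size", here is a small Python sketch using a bounded min-heap. This is an assumption-laden toy and not how Lucene or Elasticsearch is actually implemented: `shard_top_k`, `matching_docs` and `score` are hypothetical names invented for the example.

```python
import heapq

def shard_top_k(matching_docs, score, k=10):
    """Score every matching doc, but keep only the k best in memory.

    matching_docs: iterable of doc ids that matched the query (hypothetical).
    score: function mapping a doc id to a numeric relevance score (hypothetical).
    """
    heap = []  # min-heap of (score, doc_id); heap[0] is the weakest kept hit
    for doc_id in matching_docs:
        s = score(doc_id)
        if len(heap) < k:
            # Queue not full yet: every hit is accepted.
            heapq.heappush(heap, (s, doc_id))
        elif s > heap[0][0]:
            # Queue is full: replace the weakest hit only if this one beats it.
            heapq.heapreplace(heap, (s, doc_id))
    # Return hits ordered from best to worst.
    return sorted(heap, reverse=True)

# Usage: 100 matching docs with made-up scores; only the 10 best survive.
scores = {i: (i * 7) % 100 for i in range(100)}
top = shard_top_k(range(100), scores.get, k=10)
```

In this sketch the queue is never "fulfilled" in the sense of stopping early: all matches are visited, and the queue size only bounds how many hits are kept.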

Thanks! This helps me a lot :)