Search API min_score ==> what is the definition of score?


(mp2893) #1

Hi, I am a beginner with ES.

I plan to store several million news documents using ES.
Now if a user queries something like "obama", hundreds of thousands of
documents will be returned to the user. This will definitely take a
lot of time. (If I am wrong about this, please correct me.)

So I wanted to limit the number of documents to be searched and
returned.

Obviously I want the most relevant documents to be returned, and I
came across the "min_score" option:
http://www.elasticsearch.org/guide/reference/api/search/min-score.html

According to the example, it looks like I get to choose the score. But I
am wondering what the definition of this "score" is.
Is it the degree of relevance of the document to the query (like
cosine similarity)?
Or is it some kind of percentage of all the documents that are
hit?

Please help a newbie out.
Thanks.
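The thread never spells out what the score is. As a rough sketch, assuming the classic Lucene TF-IDF similarity that Elasticsearch used by default in this era (later versions switched the default to BM25), a one-term query with no boosts scores a document roughly as tf * idf * lengthNorm. The helper names below are hypothetical, and the sketch ignores coord, field/query boosts, and Lucene's lossy one-byte encoding of norms, so real scores will differ somewhat. The point it illustrates: the score is an unbounded relevance value that depends on corpus statistics, not a percentage of matching documents and not a cosine similarity in [0, 1].

```python
import math


def classic_idf(num_docs: int, doc_freq: int) -> float:
    # Classic Lucene idf: 1 + ln(numDocs / (docFreq + 1)); rarer terms weigh more.
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def classic_tf(freq: int) -> float:
    # Classic Lucene tf: square root of how often the term occurs in the field.
    return math.sqrt(freq)


def length_norm(field_terms: int) -> float:
    # Classic Lucene length norm: shorter fields score higher.
    # (Real Lucene stores this lossily in one byte; this sketch keeps full precision.)
    return 1.0 / math.sqrt(field_terms)


def single_term_score(freq: int, field_terms: int, num_docs: int, doc_freq: int) -> float:
    # Hypothetical helper: practical scoring formula for a one-term query, no boosts.
    # score = queryNorm * tf * idf^2 * norm; for one term queryNorm = 1 / idf,
    # so the score reduces to tf * idf * norm.
    idf = classic_idf(num_docs, doc_freq)
    query_norm = 1.0 / idf
    return query_norm * classic_tf(freq) * idf * idf * length_norm(field_terms)


if __name__ == "__main__":
    # Same document shape, but the term is rare in one corpus and common in another.
    rare = single_term_score(freq=4, field_terms=4, num_docs=1000, doc_freq=9)
    common = single_term_score(freq=4, field_terms=4, num_docs=1000, doc_freq=899)
    print(f"rare term:   {rare:.3f}")
    print(f"common term: {common:.3f}")
```

Both scores in this example come out above 1, and the same document scores very differently depending on how common the term is in the index, which is why a fixed min_score threshold is hard to choose in advance.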


(David Pilato) #2

> I plan to store several million news documents using ES.
>
> Now if a user queries something like "obama", hundreds of thousands of
> documents will be returned to the user. This will definitely take a
> lot of time. (If I am wrong about this, please correct me.)

No. By default, only the 10 most relevant documents are returned.
It will be very fast.

Just try to play with it and you will see how easy ES is.
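The default David describes corresponds to the `size` request parameter (10 unless set), with `from` as the offset for paging. A minimal sketch that builds such a request body as a plain dict, assuming a hypothetical `content` field holding the news text and the `match` query form used by later Elasticsearch versions (older releases named their query types differently). The helper `build_search_body` is hypothetical; `min_score` is included only as an optional top-level body parameter.

```python
import json
from typing import Any, Dict, Optional


def build_search_body(
    text: str,
    size: int = 10,
    from_: int = 0,
    min_score: Optional[float] = None,
) -> Dict[str, Any]:
    # Hypothetical helper: "content" is an assumed field name for the news text.
    body: Dict[str, Any] = {
        "query": {"match": {"content": text}},
        # "from" and "size" page through the ranked hits; size defaults to 10.
        "from": from_,
        "size": size,
    }
    # Only send min_score when a score threshold is actually wanted.
    if min_score is not None:
        body["min_score"] = min_score
    return body


if __name__ == "__main__":
    # First and second page of results for "obama".
    print(json.dumps(build_search_body("obama"), indent=2))
    print(json.dumps(build_search_body("obama", from_=10), indent=2))
```

Only the requested page is sent back, so a query matching hundreds of thousands of documents still returns a small response.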

HTH

--
David Pilato
http://dev.david.pilato.fr/
Twitter : @dadoonet


(Clinton Gormley) #3

Hiya

> I plan to store several million news documents using ES.
> Now if a user queries something like "obama", hundreds of thousands of
> documents will be returned to the user. This will definitely take a
> lot of time. (If I am wrong about this, please correct me.)

By default, ElasticSearch will return you the 10 most relevant docs,
which sounds like exactly what you want :wink:

clint


(mp2893) #4

Thanks for the quick reply.
If the "size" option does what I want, then what is the "min_score" option for?
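The question is left open in the thread. A conceptual sketch of the difference, assuming the documented behavior that `min_score` excludes documents whose score is below the threshold: `size` only caps how many top-ranked hits come back, while `min_score` removes low-scoring hits entirely, so the total hit count shrinks too. The function `top_hits` and the sample scores are hypothetical, and this models the semantics only, not how Elasticsearch executes a search.

```python
from typing import List, Optional, Tuple

Hit = Tuple[str, float]


def top_hits(scored: List[Hit], size: int = 10,
             min_score: Optional[float] = None) -> Tuple[int, List[Hit]]:
    # Conceptual model only, not Elasticsearch's implementation.
    if min_score is not None:
        # min_score drops every hit scoring below the threshold,
        # so it also reduces the total hit count.
        scored = [(doc, s) for doc, s in scored if s >= min_score]
    total = len(scored)
    # size only limits how many of the best-ranked hits are returned.
    ranked = sorted(scored, key=lambda hit: hit[1], reverse=True)
    return total, ranked[:size]


if __name__ == "__main__":
    hits = [("a", 3.0), ("b", 0.2), ("c", 1.5), ("d", 0.9)]
    print(top_hits(hits, size=2))         # → (4, [('a', 3.0), ('c', 1.5)])
    print(top_hits(hits, min_score=1.0))  # → (2, [('a', 3.0), ('c', 1.5)])
```

Both calls return the same two documents here, but only the `min_score` call changes the reported total. Because raw scores depend on the index, `size` is usually the simpler way to keep responses small.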

Ed

2012/2/11 Clinton Gormley clint@traveljury.com
