Need help with function scoring

POST: http://localhost:9200/myindex/type/_search
{
"query": {
"function_score": {
"functions": [
{
"gauss": {
"b": {
"origin": "0",
"scale": "1000"
}
}, "weight": 2
}
],
"query": {
"match": {"b":656}
},
"score_mode": "multiply"
}
}
}

I have two fields in my document: a (string) and b (number).
I have been trying various ways to change the scoring pattern.

Requirement: I need the score to be based only on coord, not on tf-idf.
Example: suppose my query is "location":"bangalore" OR "location":"chennai" OR "location":"mumbai".
A document that matches all of the terms should come first, followed by documents with fewer matches.

So the scoring should respect coord only and not tf-idf. How can I achieve this?
From the Lucene documentation, this is the scoring logic:
score(q,d) = coord(q,d) · queryNorm(q) · ∑ (t in q) ( tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) )

Hi,
If your list of locations is static, you can use a hand-made field, location_score, and assign points when a location matches.
In your case, if your document has the location "bangalore", set location_score to 2.
That is, apply the scoring logic before saving the document, so that at query time you only need to sort.

Thanks for your help.
In my scenario, it is the number of matches that I am taking into consideration.

If my document has the locations bangalore, chennai, mumbai, then there are 3 matches, so it should come at the top.

If my document has the locations bangalore, chennai, then there are 2 matches, so it should come below the one with 3 matches.
Is there a way to do this?

Hi,

I don't know enough about your data, but I guess you have documents like these:
{"id": 1, "location": ["bangalore", "chennai"], ...}
{"id": 2, "location": ["bangalore", "bollywood"], ...}
{"id": 3, "location": ["bollywood"], ...}

Before saving each document, you need to check its locations and add points, which gives:

{"id": 1, "location": ["bangalore", "chennai"], "location_score": 4} <--- 4 because each correct location is worth 2 points
{"id": 2, "location": ["bangalore", "bollywood"], "location_score": 2} <--- 2 because there is only one good location
{"id": 3, "location": ["bollywood"], "location_score": 0} <--- 0 because bollywood is not in the list of good locations

So if you sort by "location_score" you'll get your data in the correct order.
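
The pre-computation suggested above could be sketched as follows in Python. The names GOOD_LOCATIONS, POINTS_PER_MATCH and with_location_score are made up for illustration, and the contents of the "good" list are an assumption; only the 2-points-per-match rule and the three sample documents come from the post.

```python
# Hypothetical static list of "good" locations (an assumption for this sketch).
GOOD_LOCATIONS = {"bangalore", "chennai", "mumbai"}
POINTS_PER_MATCH = 2  # the post awards 2 points per correct location


def with_location_score(doc):
    """Return a copy of doc with a precomputed location_score field."""
    # Use a set so a location repeated in the document is only counted once.
    matches = len(set(doc["location"]) & GOOD_LOCATIONS)
    return {**doc, "location_score": matches * POINTS_PER_MATCH}


docs = [
    {"id": 1, "location": ["bangalore", "chennai"]},
    {"id": 2, "location": ["bangalore", "bollywood"]},
    {"id": 3, "location": ["bollywood"]},
]
scored = [with_location_score(d) for d in docs]
for d in scored:
    print(d["id"], d["location_score"])
```

This only works when the list of good locations is known at index time; it cannot follow a list that changes with every query.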

My query is "location":"bangalore" OR "location":"chennai" OR "location":"mumbai"

I can't put a new field (location_score) in my document, because the number of matches I am talking about depends on the query.
It is the number of conditions in the query that the document matches.

If my documents are like:
Doc1: { location: ["bangalore","chennai","mumbai","london"] } // for the given query there are 3 matches

Doc2: { location: ["bangalore","chennai","berlin","paris"] } // for the given query there are 2 matches

then the scoring should be based on the number of matches, so Doc1 will come above Doc2.

If my query were "location":"bangalore" OR "location":"chennai" OR "location":"berlin",
then Doc2 would come on top.

This is the requirement.
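
To make the desired ordering concrete, here is a small sketch in plain Python, independent of Elasticsearch. The function names match_count and rank are hypothetical; the documents and queries are the ones from this post.

```python
def match_count(query_terms, doc_locations):
    """Number of distinct query terms present in the document's locations."""
    return len(set(query_terms) & set(doc_locations))


def rank(query_terms, docs):
    """Return document names ordered by descending match count.

    sorted() is stable, so documents with equal counts keep their input order.
    """
    return sorted(docs, key=lambda name: -match_count(query_terms, docs[name]))


docs = {
    "Doc1": ["bangalore", "chennai", "mumbai", "london"],
    "Doc2": ["bangalore", "chennai", "berlin", "paris"],
}

# Doc1 matches 3 terms and Doc2 matches 2, so Doc1 is listed first.
print(rank(["bangalore", "chennai", "mumbai"], docs))
# Doc2 matches 3 terms and Doc1 matches 2, so Doc2 is listed first.
print(rank(["bangalore", "chennai", "berlin"], docs))
```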

Hi,

Does this match your requirements?

Create the documents:

curl -XPOST 127.0.0.1:9200/test_city/city_score/1 -d '{"id":1, "location":["bangalore", "paris", "mumbai"]}'
curl -XPOST 127.0.0.1:9200/test_city/city_score/2 -d '{"id":2, "location":["bangalore", "paris"]}'

Terms query with sort on _score:

curl -XPOST 127.0.0.1:9200/test_city/city_score/_search -d '{"query":{"terms":{"location":["bangalore","mumbai"]}}, "sort":"_score"}'

It returns:
{"took":10,"timed_out":false,"_shards":{"total":1,"successful":1,"failed":0},"hits":{"total":2,"max_score":0.581694,"hits":[{"_index":"test_city","_type":"city_score","_id":"1","_score":0.581694, "_source" : {"id":1, "location":["bangalore", "paris", "mumbai"]}},{"_index":"test_city","_type":"city_score","_id":"2","_score":0.09494676, "_source" : {"id":2, "location":["bangalore", "paris"]}}]}}

Doc 1 has a better score (0.581694) than doc 2 (0.09494676).

This will not work every time.

Doc1 : "location": ["bangalore", "paris", "mumbai", "kolkata"]
Doc2 : "location": ["bangalore", "paris", "mumbai"]
Doc3 : "location": ["bangalore", "paris", "mumbai", "calicut"]
Doc4 : "location": ["bangalore", "paris"]
Doc5 : "location": ["bangalore", "paris", "bangalore", "bangalore"]
Doc6 : "location": ["bangalore", "mumbai", "kolkata"]

I have inserted this data.

I queried with this: {"query":{"terms":{"location":["bangalore","mumbai","calicut","kolkata","paris"] } }, "sort":"_score"}

Here a document with 3 matches came above one with 4 matches:
Doc2 came above Doc3.

Result I got:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 5,
"max_score": 1,
"hits": [
{
"_index": "test_city",
"_type": "city_score",
"_id": "5",
"_score": 1,
"_source": {
"location": [
"bangalore",
"paris",
"mumbai",
"kolkata"
]
}
},
{
"_index": "test_city",
"_type": "city_score",
"_id": "1",
"_score": 1,
"_source": {
"id": 1,
"location": [
"bangalore",
"paris",
"mumbai"
]
}
},
{
"_index": "test_city",
"_type": "city_score",
"_id": "6",
"_score": 1,
"_source": {
"location": [
"bangalore",
"paris",
"mumbai",
"calicut"
]
}
},
{
"_index": "test_city",
"_type": "city_score",
"_id": "2",
"_score": 1,
"_source": {
"id": 2,
"location": [
"bangalore",
"paris"
]
}
},
{
"_index": "test_city",
"_type": "city_score",
"_id": "7",
"_score": 1,
"_source": {
"location": [
"bangalore",
"paris",
"bangalore",
"bangalore"
]
}
}
]
}
}

There's something strange in your result: all your _score values are equal to 1.
In my test the scores are different: "_score":0.581694 and "_score":0.09494676.

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-constant-score-query.html

I have used the constant_score query, which says it will ignore tf-idf and consider only coord.

I need to ignore tf-idf somehow and consider only coord.
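
One approach that is sometimes suggested for this kind of requirement, and is not confirmed anywhere in this thread, is to give each term its own constant_score clause inside a bool "should". Wrapping the whole query in a single constant_score gives every hit the same score, which would be consistent with the identical _score of 1 above. With one clause per term, each matching clause contributes the same fixed amount, so the total should grow with the number of matching terms. The sketch below only builds the request body in Python; the helper name count_matches_query is hypothetical, and it assumes the location values are indexed as single exact terms.

```python
import json


def count_matches_query(field, values):
    """Build a search body where each value is its own constant-score clause."""
    should = [
        # Every clause scores the same fixed boost when it matches,
        # regardless of term frequency or how rare the term is.
        {"constant_score": {"filter": {"term": {field: value}}, "boost": 1}}
        for value in values
    ]
    return {"query": {"bool": {"should": should}}}


body = count_matches_query("location", ["bangalore", "chennai", "mumbai"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request. On versions that still apply coord and queryNorm, the absolute scores will not be 1, 2, 3, but as far as I understand they still increase with the number of matched clauses.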