Unexpected scores when doing a fuzzy search


(Mike Lewin) #1

Hello, I'm new to Elasticsearch and have found that fuzzy searches don't
return results in the order I expect: in my example below, I'd expect the two
"Jon Smyth" documents to share the highest score. I've also noticed some other
strange behaviour:

  • identical documents have different scores
  • re-running the script below (dropping and re-importing the data) gives
    different scores each time
  • when I run the search query with ?pretty=true, every result has
    score=1

Can anybody help?

thanks
Mike

elasticsearch-0.90.3
java 1.6.0_24
OS: CentOS 6
1 node with 3 very small indices
gateway: local

script:
#!/bin/sh
# Reproduces the fuzzy-search scoring issue against a local Elasticsearch 0.90.x node.
# printf is used instead of echo "\n...", because /bin/sh's echo may print "\n" literally.

set -x

ES=http://localhost:9200

# Index one "people" document: $1 = name, $2 = age
add_person() {
  printf '\nadding document\n'
  curl -XPOST "$ES/test/people" -d "{\"name\": \"$1\", \"age\": \"$2\"}"
}

printf '\ndeleting index\n'
curl -XDELETE "$ES/test/"

printf '\ncreating index\n'
curl -XPUT "$ES/test/" -d '
{
  "mappings": {
    "people": {
      "properties": {
        "name": { "type": "string" },
        "age":  { "type": "integer" }
      }
    }
  }
}'

add_person "John Smith" 21
add_person "John Smith" 21
add_person "Jon Smyth" 22
add_person "Jon Smyth" 22
add_person "Jo Smith" 23
add_person "Jo" 23

printf '\nrefreshing index\n'
curl -XPOST "$ES/test/_refresh"

printf '\nquerying data\n'
curl -XGET "$ES/test/people/_search?q=Jon~"

sample output:

{"took":9,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},
 "hits":{"total":6,"max_score":0.4,"hits":[
  {"_index":"test","_type":"people","_id":"MlAA7J7GR-CmYPwH9v4rbg","_score":0.4,"_source":{"name":"John Smith","age":"21"}},
  {"_index":"test","_type":"people","_id":"gPFQyQB5QmuiIHo5PjUkGQ","_score":0.375,"_source":{"name":"Jo","age":"23"}},
  {"_index":"test","_type":"people","_id":"sI77qIhQTw-QfisxBxEbbw","_score":0.15342641,"_source":{"name":"Jon Smyth","age":"22"}},
  {"_index":"test","_type":"people","_id":"lHr29sSnQl6CfzIegDF6zA","_score":0.15342641,"_source":{"name":"Jon Smyth","age":"22"}},
  {"_index":"test","_type":"people","_id":"wNl6oNeMRGG4LIo3svsHIA","_score":0.15342641,"_source":{"name":"Jo Smith","age":"23"}},
  {"_index":"test","_type":"people","_id":"K0jHhzo5RQONwZnmwaI0tg","_score":0.1534264,"_source":{"name":"John Smith","age":"21"}}
 ]}}

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ee952578-55b1-49f0-995b-d16a62ff2c53%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Mike Lewin) #2

A quick update: I looked around some more and found some related
posts:
https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/fuzzy/elasticsearch/IqwZLJHDoB4/U04G6Q7o_3UJ
https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/fuzzy/elasticsearch/KKeXEo2LiCo/8IQkqqq7NRYJ

I did find that if I search for several terms at once, ["Jon~2", "Jon~1",
"Jon"], the scores look more like what I'd expect.
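For reference, here is one way to send those three terms together: a single query_string query against the same index, searching the default _all field just as the original `q=Jon~` does. This is a sketch assuming that's roughly what was run, since the exact request wasn't posted. Because query_string ORs its clauses by default, a document containing exactly jon can match all three clauses, while john and jo (one edit away) match only the two fuzzy ones, which would fit the improved ordering.

```shell
#!/bin/sh
# Sketch (assumed request, not the exact one from the post): send the three
# terms as one query_string query to the "test" index built by the script above.
ES=${ES:-http://localhost:9200}

# Jon~2 and Jon~1 are fuzzy terms (up to 2 and 1 edits); plain Jon is an exact term.
# query_string ORs the clauses by default, so an exact "jon" can match all three.
BODY='{
  "query": {
    "query_string": {
      "query": "Jon~2 Jon~1 Jon"
    }
  }
}'

curl -s -XGET "$ES/test/people/_search?pretty=true" -d "$BODY" ||
  echo "request failed: is Elasticsearch running at $ES?" >&2
```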

It seems Kimchy's advice was to use a rewrite method, but I'm not sure of
the correct syntax. I added "rewrite": "top_terms_100" after "query": "Jon~",
but it had no effect.
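The exact request wasn't posted, so the following is a reconstruction (an assumption) of what's described: "rewrite" placed as a sibling of "query" inside a query_string query. It also sets "explain": true, a standard search option that returns a per-hit breakdown of how each score was computed, including which expanded terms contributed. Comparing that output with and without the "rewrite" line is one way to check whether the setting reaches the fuzzy term at all; whether query_string in 0.90.3 applies "rewrite" to fuzzy terms is not confirmed here.

```shell
#!/bin/sh
# Reconstruction (assumed, not the exact request from the post) of a
# query_string search with "rewrite" placed next to "query".
ES=${ES:-http://localhost:9200}

BODY='{
  "explain": true,
  "query": {
    "query_string": {
      "query": "Jon~",
      "rewrite": "top_terms_100"
    }
  }
}'

# "explain": true returns a score breakdown per hit; rerun without the
# "rewrite" line and compare the two breakdowns.
curl -s -XGET "$ES/test/people/_search?pretty=true" -d "$BODY" ||
  echo "request failed: is Elasticsearch running at $ES?" >&2
```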

I'd be grateful to hear people's opinions on the other bugs I mentioned, in
particular:

  • identical documents have different scores
  • re-running the script below (dropping and reimporting the data) gives
    different scores each time

Can anyone reproduce these bugs with the script I provided?
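One thing worth checking with the script above (a hypothesis, not a confirmed diagnosis): the sample output reports 5 shards, and by default each shard scores its hits using term statistics, such as document frequency, from only the documents it holds. Documents added without an ID get a random one, so identical documents can land on different shards, and each re-import distributes them differently. The sketch below runs the same `q=Jon~` search twice, once with the default search type and once with search_type=dfs_query_then_fetch, which gathers term statistics from all shards before scoring. If per-shard statistics are the cause, that should remove at least the document-frequency part of the variation; fuzzy expansion itself still happens per shard, so it may not remove all of it.

```shell
#!/bin/sh
# Sketch: rerun the query from the script with and without distributed term
# statistics. Assumes the "test" index from the script above on a local node.
ES=${ES:-http://localhost:9200}

# Default search type: each shard scores hits with its own statistics.
URL_DEFAULT="$ES/test/people/_search?q=Jon~&pretty=true"
# dfs_query_then_fetch collects term statistics from all shards first, then scores.
URL_DFS="$ES/test/people/_search?q=Jon~&search_type=dfs_query_then_fetch&pretty=true"

for url in "$URL_DEFAULT" "$URL_DFS"; do
  printf '\n%s\n' "$url"
  curl -s -XGET "$url" || echo "request failed: is Elasticsearch running at $ES?" >&2
done
```

For a test index this small, creating it with "settings": {"number_of_shards": 1} alongside "mappings" in the create-index body is another way to take per-shard statistics out of the picture entirely.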

thanks
Mike

On Thursday, December 5, 2013 2:05:15 AM UTC, Mike Lewin wrote:

> [quoted text of #1 trimmed]
