Order by name doesn't work as expected


(Nikolay Chankov) #1

Hi guys,

For some reason, sorting by name and then _score is not working as I would
expect. I've prepared a simple example to explain what I mean.
There are 2 records: john doe and jane doe. If there is no email in the
index, their scores are the same and the order is correct: jane goes before
john. But if john's record has an email which contains doe (the search
phrase), john's _score is higher and the order is wrong.
I've noticed that in the results the "sort" node is [ "doe", 0.6328839 ], [
"doe", 0.48819983 ] rather than [ "john doe", 0.6328839 ], [ "jane doe",
0.48819983 ]. If the order is name:desc, the "sort" node is [ "jane",
0.6328839 ], [ "john", 0.48819983 ].

This happens when I use query:{...}. If the query is missing, the results get
the same weight and the sort works as expected.

Do I need to sort in some special way to get the desired order, or is it a
bug?

Thanks in advance.

Here is a script that reproduces this behavior. I am using 0.90.5, if it
matters (tested 0.90.8 with the same effect). BTW, if the name has no
space, e.g. johndoe, janedoe, the order is correct.

curl -XDELETE 'http://localhost:9200/test_search'
curl -XPUT 'http://localhost:9200/test_search/' -d '
{
  "mappings" : {
    "record" : {
      "properties" : {
        "object" : {
          "type" : "string"
        },
        "id" : {
          "type" : "integer"
        },
        "name" : {
          "type" : "string",
          "boost" : 6
        },
        "email" : {
          "type" : "string",
          "boost" : 5
        }
      }
    }
  }
}
'
curl -XPUT 'http://localhost:9200/test_search/record/1' -d '{
"object" : "User",
"id" : 1,
"name" : "john doe",
"email" : "doe@doe.com"
}'
curl -XPUT 'http://localhost:9200/test_search/record/2' -d '{
"object" : "User",
"id" : 2,
"name" : "jane doe",
"email" : "j@d.com"
}'

curl -XGET 'http://localhost:9200/test_search/_search?pretty=true' -d '{"query":{"filtered":{"query":{"queryString":{"query":"doe"}}}},"sort":[{"name":"asc"},"_score"]}'

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f8fbba45-186a-42b2-86be-970ef65e60a5%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Ivan Brusic) #2

The "name" field is an analyzed string field, so it can generate numerous
tokens depending on the text. In your case, the name is being split into
two tokens:

http://localhost:9200/_analyze?text=john%20doe

{"tokens": [
  {"token": "john", "start_offset": 0, "end_offset": 4, "type": "", "position": 1},
  {"token": "doe", "start_offset": 5, "end_offset": 8, "type": "", "position": 2}
]}

Very likely, the sort is using the "doe" token, making the two documents
equal. That equality is why the order changes, since something else is
breaking the tie (presumably the secondary _score sort, or it is arbitrary).
Sorting does not work well on analyzed fields, so you should either:

  1. Set the field as not_analyzed, which might affect searching against it.
  2. Use a keyword tokenizer. The same caveat applies, but you can apply more
    filters, such as lowercase and ascii/icu folding.
  3. Use the multi field type [1]. This type lets you index the value into
    different fields: one to search on and one to sort on.

Also note that scoring is disabled when you sort against a field. You would
need to enable score tracking. [2]
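
To make the tie concrete, here is a rough sketch that can be run without a cluster. It is only an approximation: the `tokens` function stands in for the analyzer by lowercasing and splitting on spaces, and `sort_key` assumes that an ascending sort compares the smallest term of the field and a descending sort the largest. Both helper names are made up for this illustration, and the assumption is based on the "sort" values reported in the first post rather than on the Elasticsearch source.

```shell
#!/bin/sh
# Rough stand-in for the analyzer: lowercase the text and emit one token per line.
tokens() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -s ' ' '\n'
}

# Assumed behaviour (not taken from the Elasticsearch source):
# asc compares the smallest term of the field, desc the largest.
sort_key() {
  order=$1
  text=$2
  if [ "$order" = "desc" ]; then
    tokens "$text" | LC_ALL=C sort | tail -n 1
  else
    tokens "$text" | LC_ALL=C sort | head -n 1
  fi
}

# Names with a space collapse to the same ascending key; names without one do not.
for name in "john doe" "jane doe" "johndoe" "janedoe"; do
  printf '%s -> asc key: %s, desc key: %s\n' \
    "$name" "$(sort_key asc "$name")" "$(sort_key desc "$name")"
done
```

Under these assumptions both "john doe" and "jane doe" get the ascending key "doe", so the name sort cannot separate them, while "johndoe" and "janedoe" keep distinct keys. That is consistent with the observation that names without a space sort correctly.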

[1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-multi-field-type.html
[2] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-sort.html#_track_scores
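
For the multi field option, a minimal sketch against the test_search index from the first post might look like the script below. It assumes the 0.90.x `multi_field` mapping syntax. The sub-field name `untouched` and the `ES_URL` variable are choices made for this example, not something from the thread. With `ES_URL` unset the script only prints the two payloads, so it is safe to run without a cluster.

```shell
#!/bin/sh
# ES_URL is a hypothetical variable, e.g. http://localhost:9200.
# Leave it unset to print the payloads instead of sending them.
ES_URL="${ES_URL:-}"

# "name" stays analyzed for searching; "name.untouched" (assumed sub-field name)
# keeps the whole value as a single term for sorting.
MAPPING='{
  "mappings": {
    "record": {
      "properties": {
        "object": { "type": "string" },
        "id": { "type": "integer" },
        "name": {
          "type": "multi_field",
          "fields": {
            "name": { "type": "string", "boost": 6 },
            "untouched": { "type": "string", "index": "not_analyzed" }
          }
        },
        "email": { "type": "string", "boost": 5 }
      }
    }
  }
}'

# Same query as in the first post, but sorted on the not_analyzed sub-field.
SEARCH='{
  "query": { "filtered": { "query": { "queryString": { "query": "doe" } } } },
  "sort": [ { "name.untouched": "asc" }, "_score" ],
  "track_scores": true
}'

if [ -n "$ES_URL" ]; then
  curl -XDELETE "$ES_URL/test_search"
  curl -XPUT "$ES_URL/test_search/" -d "$MAPPING"
  curl -XPUT "$ES_URL/test_search/record/1?refresh=true" \
    -d '{"object":"User","id":1,"name":"john doe","email":"doe@doe.com"}'
  curl -XPUT "$ES_URL/test_search/record/2?refresh=true" \
    -d '{"object":"User","id":2,"name":"jane doe","email":"j@d.com"}'
  curl -XGET "$ES_URL/test_search/_search?pretty=true" -d "$SEARCH"
else
  printf '%s\n%s\n' "$MAPPING" "$SEARCH"
fi
```

Since `name.untouched` is not analyzed, the sort values should be the full names, so "jane doe" should come before "john doe" regardless of score. Searches against `name` still go through the analyzed field. Later Elasticsearch versions changed this mapping syntax, so treat it as specific to the 0.90.x line used in this thread.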

Cheers,

Ivan



(Nikolay Chankov) #3

Thank you, it's working.

On Thursday, January 2, 2014 4:20:17 PM UTC, Nikolay Chankov wrote:


