ES not returning expected documents when using sort

Using a simple {"sort":[{"first_name":"asc"}], "size":20, "from":0} is
returning inconsistent results.

When I rebuild the index, everything is returned correctly. However, as
soon as I update a document, it no longer gets returned (using 0.90.0
stable).

Here is a full gist of the problem and mapping:

https://gist.github.com/sarmiena/d945848fd683f39d212c

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

hey, does this also happen if you use the _id as a secondary sort? If you
update the document and all docs have the same sort value it might depend
on what shard returns first. Can you try this as a tie-breaker?
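
To illustrate the tie-breaker idea, here is a toy model in Ruby. It is only a sketch of the reasoning, not Elasticsearch's actual merge code: the Hit struct, the shard lists, and both merge functions are made up for illustration. When every hit has the same sort value, a merge on the value alone keeps arrival order, so the first page changes with the order in which shards answer; adding the id as a second key makes the page stable.

```ruby
# Hypothetical hit record: a document id plus the value it is sorted on.
Hit = Struct.new(:id, :sort_value)

# Merge per-shard hit lists by sort value only. Ties keep arrival order,
# so the result depends on which shard's list comes first.
def merge_by_value(shard_results)
  shard_results.flatten(1).each_with_index
               .sort_by { |hit, index| [hit.sort_value, index] }
               .map { |hit, _index| hit.id }
end

# Same merge, but ties are broken by id, so arrival order no longer matters.
def merge_with_tie_breaker(shard_results)
  shard_results.flatten(1).sort_by { |hit| [hit.sort_value, hit.id] }.map(&:id)
end

shard_a = [Hit.new("a1", "smith"), Hit.new("a2", "smith")]
shard_b = [Hit.new("b1", "smith")]

# First page of size 2, with the shards answering in either order.
p merge_by_value([shard_a, shard_b]).first(2)
p merge_by_value([shard_b, shard_a]).first(2)
p merge_with_tie_breaker([shard_a, shard_b]).first(2)
p merge_with_tie_breaker([shard_b, shard_a]).first(2)
```

In this model the first two calls return different pages, while the last two return the same one.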

On Wednesday, May 22, 2013 11:48:53 PM UTC+2, Aldo Sarmiento wrote:

Using a simple {"sort":[{"first_name":"asc"}], "size":20, "from":0} is
returning inconsistent results.

When I rebuild the index, everything is returned correctly. However, as
soon as I update a document, it no longer gets returned (using 0.90.0
stable).

Here is a full gist of the problem and mapping:

https://gist.github.com/sarmiena/d945848fd683f39d212c


I've updated my query to be:

curl -X GET 'http://localhost:9200/development::application-contacts/contact/_search?pretty' -d '
{
  "sort": [
    {
      "first_name_sort": "asc"
    },
    "_id"
  ],
  "size": 20,
  "from": 0
}
'

However, the record still doesn't get returned. I wonder if there is
something wrong with the mapping. Not sure if that would affect the sort
order, though.

On Thursday, May 23, 2013 12:24:39 AM UTC-7, simonw wrote:

hey, does this also happen if you use the _id as a secondary sort? If you
update the document and all docs have the same sort value it might depend
on what shard returns first. Can you try this as a tie-breaker?

OK, I've stripped the problem down and narrowed it to a minimal
reproduction:

Problem

elasticsearch sorting while using "size" and "from" not returning expected results after update


Create some records (using Ruby to create 100 records)

(1).upto(100) do |i|
  # Index one document per id; the "user" field holds the id as a string.
  system(%Q(curl -XPUT 'http://localhost:9200/twitter/tweet/#{i}' -d '{ "user" : "#{i}" }'))
end


View initial search/sort results

curl -X GET 'http://localhost:9200/twitter/tweet/_search?pretty' -d '
{
"sort": [
{
"user": "asc"
}
],
"size": 10,
"from": 0
}
'

{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 100,
"max_score" : null,
"hits" : [ {
"_index" : "twitter",
"_type" : "tweet",
"_id" : "1",
"_score" : null, "_source" : { "user" : "1"},
"sort" : [ "1" ]
}, {
"_index" : "twitter",
"_type" : "tweet",
"_id" : "10",
"_score" : null, "_source" : { "user" : "10"},
"sort" : [ "10" ]
}, {
"_index" : "twitter",
"_type" : "tweet",
"_id" : "100",
"_score" : null, "_source" : { "user" : "100"},
"sort" : [ "100" ]
}, {
"_index" : "twitter",
"_type" : "tweet",
"_id" : "11",
"_score" : null, "_source" : { "user" : "11"},
"sort" : [ "11" ]
}, {
"_index" : "twitter",
"_type" : "tweet",
"_id" : "12",
"_score" : null, "_source" : { "user" : "12"},
"sort" : [ "12" ]
}, {
"_index" : "twitter",
"_type" : "tweet",
"_id" : "13",
"_score" : null, "_source" : { "user" : "13"},
"sort" : [ "13" ]
}, {
"_index" : "twitter",
"_type" : "tweet",
"_id" : "14",
"_score" : null, "_source" : { "user" : "14"},
"sort" : [ "14" ]
}, {
"_index" : "twitter",
"_type" : "tweet",
"_id" : "15",
"_score" : null, "_source" : { "user" : "15"},
"sort" : [ "15" ]
}, {
"_index" : "twitter",
"_type" : "tweet",
"_id" : "16",
"_score" : null, "_source" : { "user" : "16"},
"sort" : [ "16" ]
}, {
"_index" : "twitter",
"_type" : "tweet",
"_id" : "17",
"_score" : null, "_source" : { "user" : "17"},
"sort" : [ "17" ]
} ]
}
}


Now let's update the first record

curl -X POST "http://localhost:9200/twitter/tweet/1" -d '
{"user":"1"}
'


And now for the problem (notice that the previous first record is now missing):

curl -X GET 'http://localhost:9200/twitter/tweet/_search?pretty' -d '
{
"sort": [
{
"user": "asc"
}
],
"size": 10,
"from": 0
}
'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 100,
"max_score" : null,
"hits" : [ {
"_index" : "twitter",
"_type" : "tweet",
"_id" : "10",
"_score" : null, "_source" : { "user" : "10"},
"sort" : [ "10" ]
}, {
"_index" : "twitter",
"_type" : "tweet",
"_id" : "100",
"_score" : null, "_source" : { "user" : "100"},
"sort" : [ "100" ]
}, {
"_index" : "twitter",
"_type" : "tweet",
"_id" : "11",
"_score" : null, "_source" : { "user" : "11"},
"sort" : [ "11" ]
}, {
"_index" : "twitter",
"_type" : "tweet",
"_id" : "12",
"_score" : null, "_source" : { "user" : "12"},
"sort" : [ "12" ]
}, {
"_index" : "twitter",
"_type" : "tweet",
"_id" : "13",
"_score" : null, "_source" : { "user" : "13"},
"sort" : [ "13" ]
}, {
"_index" : "twitter",
"_type" : "tweet",
"_id" : "14",
"_score" : null, "_source" : { "user" : "14"},
"sort" : [ "14" ]
}, {
"_index" : "twitter",
"_type" : "tweet",
"_id" : "15",
"_score" : null, "_source" : { "user" : "15"},
"sort" : [ "15" ]
}, {
"_index" : "twitter",
"_type" : "tweet",
"_id" : "16",
"_score" : null, "_source" : { "user" : "16"},
"sort" : [ "16" ]
}, {
"_index" : "twitter",
"_type" : "tweet",
"_id" : "17",
"_score" : null, "_source" : { "user" : "17"},
"sort" : [ "17" ]
}, {
"_index" : "twitter",
"_type" : "tweet",
"_id" : "18",
"_score" : null, "_source" : { "user" : "18"},
"sort" : [ "18" ]
} ]
}
}
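
For anyone repeating this, the two pages can also be compared mechanically rather than by eye. The following Ruby sketch is only an illustration: hit_ids, page_diff, and fake_response are hypothetical helper names, and the two sample bodies are trimmed stand-ins that keep only the "_id" of each hit from the responses above.

```ruby
require "json"

# Extract the _id of every hit from a _search response body (a JSON string).
def hit_ids(body)
  JSON.parse(body).fetch("hits").fetch("hits").map { |hit| hit.fetch("_id") }
end

# Report which ids left the page and which entered it between two responses.
def page_diff(before_body, after_body)
  before = hit_ids(before_body)
  after = hit_ids(after_body)
  { dropped: before - after, added: after - before }
end

# Build a trimmed response body containing only the fields used here.
def fake_response(ids)
  JSON.generate("hits" => { "total" => 100, "hits" => ids.map { |id| { "_id" => id } } })
end

before = fake_response(%w[1 10 100 11 12 13 14 15 16 17])
after = fake_response(%w[10 100 11 12 13 14 15 16 17 18])

# Document "1" should show up as dropped and "18" as added.
p page_diff(before, after)
```

Against a live node, the bodies returned by the two curl calls above could be passed to page_diff in place of the stand-ins.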

On Thursday, May 23, 2013 1:08:50 AM UTC-7, Aldo Sarmiento wrote:

I've updated my query to be:

curl -X GET 'http://localhost:9200/development::application-contacts/contact/_search?pretty' -d '
{
  "sort": [
    {
      "first_name_sort": "asc"
    },
    "_id"
  ],
  "size": 20,
  "from": 0
}
'

However, the record still doesn't get returned. I wonder if there is
something wrong with the mapping. Not sure if that would affect the sort
order, though.

Hi Simon

You can't sort on _id out of the box, as it is not indexed.

Clint
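
As a possible follow-up to the point about _id not being indexed, the sketch below builds a mapping that asks for _id to be indexed, plus a search body that uses _id as the tie-breaker. This is an assumption, not something confirmed in this thread: it relies on the "index": "not_analyzed" attribute described for the _id field in the mapping documentation of that era, it has not been verified against 0.90.0, and the helper names are hypothetical. The "tweet" type and "user" field come from the reproduction earlier in the thread.

```ruby
require "json"

# Mapping that asks Elasticsearch to index _id for the given type.
# Assumption: the _id field accepts "index": "not_analyzed" on this version.
def id_indexed_mapping(type)
  { type => { "_id" => { "index" => "not_analyzed" } } }
end

# Search body that sorts on a field and uses _id as the tie-breaker.
def sorted_search_body(field, size: 10, from: 0)
  {
    "sort" => [{ field => "asc" }, { "_id" => "asc" }],
    "size" => size,
    "from" => from
  }
end

# Print both JSON bodies so they can be pasted into curl -d '...'.
puts JSON.pretty_generate(id_indexed_mapping("tweet"))
puts JSON.pretty_generate(sorted_search_body("user"))
```

If this works as assumed, the mapping would have to be in place before the documents are indexed, so existing documents would likely need to be reindexed.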

Hi Aldo

This looks like a bug. I've opened an issue

clint

On 23 May 2013 11:23, Clinton Gormley clint@traveljury.com wrote:

Hi Simon

You can't sort on _id out of the box, as it is not indexed.

Clint
