We observed the following issue in production.
QUERY: Deep pagination with a date filter and a sort on a single field. The index has 1 primary shard and 2 replicas (3 shard copies in total). ES version 5.1.1.
ISSUE: Duplicates are returned across pages.
e.g. With a page size of 100, Page 1 and Page 2 return some of the same documents. We can reproduce the issue only intermittently, which suggests to me that the shard copies return results in different sort orders. Please note that no documents were added to the index during this time.
QUERY for Page 40:
{
  "from" : 4000,
  "size" : 100,
  "query" : {
    "constant_score" : {
      "filter" : {
        "bool" : {
          "must" : [
            {
              "range" : {
                "endDate" : {
                  "from" : "2016-01-01T00:00:00.000Z",
                  "to" : "2017-01-01T00:00:00.000Z",
                  "include_lower" : true,
                  "include_upper" : true,
                  "boost" : 1.0
                }
              }
            },
            {
              "bool" : {
                "must" : [
                  {
                    "range" : {
                      "indexDate" : {
                        "from" : null,
                        "to" : null,
                        "include_lower" : true,
                        "include_upper" : true,
                        "boost" : 1.0
                      }
                    }
                  },
                  {
                    "bool" : {
                      "must_not" : [
                        {
                          "exists" : {
                            "field" : "mandatoryField",
                            "boost" : 1.0
                          }
                        }
                      ],
                      "disable_coord" : false,
                      "adjust_pure_negative" : true,
                      "boost" : 1.0
                    }
                  }
                ],
                "disable_coord" : false,
                "adjust_pure_negative" : true,
                "boost" : 1.0
              }
            }
          ],
          "disable_coord" : false,
          "adjust_pure_negative" : true,
          "boost" : 1.0
        }
      },
      "boost" : 1.0
    }
  },
  "_source" : {
    "includes" : [
      "documentId"
    ],
    "excludes" : [ ]
  },
  "sort" : [
    {
      "recordDate" : {
        "order" : "asc"
      }
    }
  ],
  "ext" : { }
}
QUERY for Page 41:
{
  "from" : 4100,
  "size" : 100,
  "query" : {
    "constant_score" : {
      "filter" : {
        "bool" : {
          "must" : [
            {
              "range" : {
                "endDate" : {
                  "from" : "2016-01-01T00:00:00.000Z",
                  "to" : "2017-01-01T00:00:00.000Z",
                  "include_lower" : true,
                  "include_upper" : true,
                  "boost" : 1.0
                }
              }
            },
            {
              "bool" : {
                "must" : [
                  {
                    "range" : {
                      "indexDate" : {
                        "from" : null,
                        "to" : null,
                        "include_lower" : true,
                        "include_upper" : true,
                        "boost" : 1.0
                      }
                    }
                  },
                  {
                    "bool" : {
                      "must_not" : [
                        {
                          "exists" : {
                            "field" : "mandatoryField",
                            "boost" : 1.0
                          }
                        }
                      ],
                      "disable_coord" : false,
                      "adjust_pure_negative" : true,
                      "boost" : 1.0
                    }
                  }
                ],
                "disable_coord" : false,
                "adjust_pure_negative" : true,
                "boost" : 1.0
              }
            }
          ],
          "disable_coord" : false,
          "adjust_pure_negative" : true,
          "boost" : 1.0
        }
      },
      "boost" : 1.0
    }
  },
  "_source" : {
    "includes" : [
      "documentId"
    ],
    "excludes" : [ ]
  },
  "sort" : [
    {
      "recordDate" : {
        "order" : "asc"
      }
    }
  ],
  "ext" : { }
}
The responses for pages 40 and 41 will contain common documents. We had to run these queries in a loop to reproduce it.
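For anyone trying to reproduce this, here is a minimal sketch of the overlap check, assuming each page's response has already been parsed into a Python dict with the usual `hits.hits[]._id` layout. The helper names `hit_ids` and `common_ids` are hypothetical, and the two sample responses are made up and trimmed purely for illustration; they are not our production data.

```python
from typing import Dict, List, Set


def hit_ids(response: Dict) -> List[str]:
    """Return the _id of every hit in a search response, in returned order."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


def common_ids(page_a: Dict, page_b: Dict) -> Set[str]:
    """Return the document IDs that appear in both pages."""
    return set(hit_ids(page_a)) & set(hit_ids(page_b))


# Made-up, trimmed responses for illustration only (not real production data).
page_40 = {"hits": {"hits": [{"_id": "a"}, {"_id": "b"}, {"_id": "c"}]}}
page_41 = {"hits": {"hits": [{"_id": "c"}, {"_id": "d"}, {"_id": "e"}]}}

print(sorted(common_ids(page_40, page_41)))  # ['c']
```

A non-empty result on any iteration of the loop means the two pages overlapped on that run.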
WHAT WE TRIED:
- After reducing the number of replicas to 0, we could no longer reproduce the issue.
- We then increased the number of replicas to 2 again and still couldn't reproduce it.
This leads us to believe that the shard copies somehow end up with a discrepancy when an index is built over a period of time with more than 1 replica. Please let me know if this is a known issue, because this is a pretty serious bug.
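One way we could test the "different sort order per copy" idea is to run the same page twice and compare the order of the returned IDs, and also to look for hits that share the same `recordDate` sort value. This is only a sketch under assumptions: it assumes the parsed response carries a `sort` array on each hit, that ties on `recordDate` are where the copies could disagree (we have not confirmed this), and the helper names and sample values below are hypothetical and made up.

```python
from collections import defaultdict
from typing import Dict, List


def tied_groups(response: Dict) -> Dict[object, List[str]]:
    """Group hit IDs by first sort value; keep values shared by 2+ hits."""
    groups = defaultdict(list)
    for hit in response["hits"]["hits"]:
        groups[hit["sort"][0]].append(hit["_id"])
    return {value: ids for value, ids in groups.items() if len(ids) > 1}


def order_differs(run_a: Dict, run_b: Dict) -> bool:
    """True if two runs of the same page returned IDs in a different order."""
    ids_a = [hit["_id"] for hit in run_a["hits"]["hits"]]
    ids_b = [hit["_id"] for hit in run_b["hits"]["hits"]]
    return ids_a != ids_b


# Made-up responses for the same page from two runs (illustration only).
run_1 = {"hits": {"hits": [
    {"_id": "a", "sort": [1000]},
    {"_id": "b", "sort": [2000]},
    {"_id": "c", "sort": [2000]},
]}}
run_2 = {"hits": {"hits": [
    {"_id": "a", "sort": [1000]},
    {"_id": "c", "sort": [2000]},
    {"_id": "b", "sort": [2000]},
]}}

print(order_differs(run_1, run_2))  # True
print(tied_groups(run_1))           # {2000: ['b', 'c']}
```

If the documents that move between runs always fall inside a tied group, that would point at tie ordering rather than at missing or extra documents on a copy.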