Hello Folks --
I'm researching an interesting situation that one of our users pointed out. We have a fairly simple query that searches on a couple of fields, and we use size/from to let users step through the pages. Sometimes the last page of results contains results that were already included on a previous page. I turned on explain for the query, and in those cases the same document comes back from different nodes:
{
  "_shard": "[index-changed][1]",
  "_node": "PKznqHx5QfCRke-rsxzn-w",
  "_index": "index-changed",
  "_type": "_doc",
  "_id": "633637",
  "_score": 12.777646,
and
{
  "_shard": "[index-changed][1]",
  "_node": "TzNHbXfHSvulWCJP6pXKow",
  "_index": "index-changed",
  "_type": "_doc",
  "_id": "633637",
  "_score": 12.64538
I understand that this is somewhat expected, and that I could use the preference query parameter to have the search hit the same shard copies each time. However, it doesn't seem to make a difference: I still get results from both nodes, with slightly different scores. What am I missing? How can I eliminate the duplicates and get consistent results when paging?
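To make the symptom concrete, here is a minimal Python sketch of what I think is happening. It is not Elasticsearch code. It only assumes that each page may be served by a different copy of the shard and that the copies score the same document slightly differently. Only the two scores for 633637 come from the explain output above; the other ids and scores, and the helper names `page` and `paged_ids`, are made up for illustration.

```python
# Hypothetical per-copy scores for the same documents on one shard.
# Only the two scores for 633637 are real (from the explain output);
# the other ids and scores are invented for this sketch.
COPY_A = {"633637": 12.777646, "700001": 12.70, "700002": 11.90}
COPY_B = {"633637": 12.64538, "700001": 12.70, "700002": 11.90}


def page(scores, from_, size):
    """Return the doc ids of one from/size page, ranked by score descending."""
    ranked = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ranked[from_:from_ + size]


def paged_ids(copies, size, pages):
    """Fetch consecutive pages, alternating between the given shard copies."""
    seen = []
    for n in range(pages):
        copy = copies[n % len(copies)]
        seen.extend(page(copy, n * size, size))
    return seen


# Pages served by alternating copies: 633637 shows up on two pages
# and 700001 is never shown.
mixed = paged_ids([COPY_A, COPY_B], size=1, pages=3)

# Pages always served by the same copy: every document appears exactly once.
pinned = paged_ids([COPY_A], size=1, pages=3)

print(mixed)
print(pinned)
```

In this toy model, pinning every page to one copy removes the duplicate, which is what I expected the preference parameter to do for me.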
Thanks!
EDIT:
Here's the query:
{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "search text",
          "fields": [
            "field1",
            "field1.english",
            "field2^0.1"
          ],
          "type": "phrase",
          "slop": 10
        }
      },
      "must_not": [
        {
          "range": {
            "ends_at": {
              "lt": "2021-01-12T17:12:44.107Z"
            }
          }
        }
      ],
      "should": [
        {
          "distance_feature": {
            "field": "updated_at",
            "pivot": "90d",
            "origin": "now",
            "boost": 0.5
          }
        },
        {
          "terms": {
            "type": [
              "TYPE2",
              "TYPE2",
              "TYPE3"
            ],
            "boost": 0.5
          }
        }
      ],
      "filter": [
        {
          "term": {
            "type": {
              "value": "TYPE1"
            }
          }
        }
      ]
    }
  },
  "size": 11,
  "from": 30
}