Duplicate results when paging

Hello Folks --

I'm looking into a situation that one of our users pointed out. We have a fairly simple query that searches on a couple of fields, and we use size/from to let users step through the pages. Sometimes the last page of results contains documents that were already included on a previous page. I turned on explain for the query and saw that in those cases the same document is returned by different nodes:

    {
        "_shard": "[index-changed][1]",
        "_node": "PKznqHx5QfCRke-rsxzn-w",
        "_index": "index-changed",
        "_type": "_doc",
        "_id": "633637",
        "_score": 12.777646,

and

    {
        "_shard": "[index-changed][1]",
        "_node": "TzNHbXfHSvulWCJP6pXKow",
        "_index": "index-changed",
        "_type": "_doc",
        "_id": "633637",
        "_score": 12.64538

I understand that this is somewhat expected, and that I could use the preference query param to hit the same shard copies on every request. However, it doesn't seem to make a difference: I still get results from both nodes, with slightly different scores. What am I missing? How can I eliminate duplicates and get consistent results when paging?
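For reference, here is a minimal sketch of how the preference param is typically attached to a paged search, so that every page of one user's session carries the same custom string. The host, the session string, and the helper name `build_search_url` are assumptions for illustration; only the index name comes from this post. It shows the request shape only and is not a confirmed fix for the behavior described above.

```python
from urllib.parse import urlencode


def build_search_url(host: str, index: str, session_id: str) -> str:
    """Return a _search URL that carries a custom preference string.

    Hypothetical helper: host and session_id are placeholders, not values
    taken from the original post.
    """
    # A custom preference value must not start with "_"; the same string
    # has to be reused for every page of the same paging session.
    if not session_id or session_id.startswith("_"):
        raise ValueError("session_id must be non-empty and must not start with '_'")
    query_string = urlencode({"preference": session_id})
    return f"{host.rstrip('/')}/{index}/_search?{query_string}"


url = build_search_url("http://localhost:9200", "index-changed", "user-42-session")
print(url)
```

The request body (including size and from) stays unchanged; only the query string differs. If the string changes between pages, for example because it is generated per request, the requests may again be served by different copies.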

Thanks!

EDIT:
Here's the query:

    {
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": "search text",
                        "fields": [
                            "field1",
                            "field1.english",
                            "field2^0.1"
                        ],
                        "type": "phrase",
                        "slop": 10
                    }
                },
                "must_not": [
                    {
                        "range": {
                            "ends_at": {
                                "lt": "2021-01-12T17:12:44.107Z"
                            }
                        }
                    }
                ],
                "should": [
                    {
                        "distance_feature": {
                            "field": "updated_at",
                            "pivot": "90d",
                            "origin": "now",
                            "boost": 0.5
                        }
                    },
                    {
                        "terms": {
                            "type": [
                                "TYPE2",
                                "TYPE2",
                                "TYPE3"
                            ],
                            "boost": 0.5
                        }
                    }
                ],
                "filter": [
                    {
                        "term": {
                            "type": {
                                "value": "TYPE1"
                            }
                        }
                    }
                ]
            }
        },
        "size": 11,
        "from": 30
    }
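As a stopgap on the client side, duplicates across pages can be hidden by remembering which `_id` values have already been shown. The sketch below is only an illustration under assumptions: the helper name `unique_hits` and the sample pages are made up (apart from the `_id` and scores quoted above), and each hit is assumed to be a dict carrying an `_id` key, as in the explain output.

```python
from typing import Dict, Iterable, Iterator, List, Set

Hit = Dict[str, object]


def unique_hits(pages: Iterable[List[Hit]]) -> Iterator[Hit]:
    """Yield each hit once, keyed by its _id, in first-seen order."""
    seen: Set[str] = set()
    for page in pages:
        for hit in page:
            doc_id = str(hit["_id"])
            if doc_id in seen:
                continue  # already shown on an earlier page
            seen.add(doc_id)
            yield hit


# Made-up pages: the second one repeats _id 633637 with a different score.
page_a = [{"_id": "633637", "_score": 12.777646}, {"_id": "100001", "_score": 12.7}]
page_b = [{"_id": "633637", "_score": 12.64538}, {"_id": "100002", "_score": 12.6}]
ids = [hit["_id"] for hit in unique_hits([page_a, page_b])]
print(ids)  # ['633637', '100001', '100002']
```

This only hides repeats. It does not make the scores consistent, and a document that shifts position between requests can still be skipped entirely.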

Is the index static, in that there are no new documents or changes coming into it?

No. Documents could be added to the index while we're searching. I'm not sure if any were added when I was testing. What happens if it is static vs not static?
