Inconsistent Results/Doc Scores on Query

I'm running the same query back to back on an index, and the scores and results differ between runs. This is what explain shows:

    {
        "value": 22037,
        "description": "n, number of documents containing term",
        "details": []
    },
    {
        "value": 23498,
        "description": "N, total number of documents with field",
        "details": []
    }

vs:

    "details": [
        {
            "value": 22395,
            "description": "n, number of documents containing term",
            "details": []
        },
        {
            "value": 23744,
            "description": "N, total number of documents with field",
            "details": []
        }

What should these values represent? I have ~81k documents containing that field. I tried calling the refresh API, and the results still come back mixed.

How can I maintain consistency here? Shouldn't these values be cached to some extent, or is caching potentially part of the problem?

If you have replicas, you may be round-robining between them.
Although they may be identical in terms of the documents they hold, they reorganise that content via background housekeeping (aka merges) at different times. Merging purges deleted documents, and that can change the scores produced. IDF, or inverse document frequency, is a scoring factor based on a term's popularity, and that popularity count includes deleted docs that have not yet been merged away.
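
To make the effect concrete, here is a rough sketch that plugs the two pairs of n and N from the explain output above into an IDF formula. It assumes the index uses the default BM25 similarity, whose IDF is log(1 + (N - n + 0.5) / (n + 0.5)); the explain excerpt does not show the formula itself, so treat that as an assumption. The function name `bm25_idf` is made up for this illustration.

```python
import math


def bm25_idf(n: int, N: int) -> float:
    """IDF as used by BM25 (assumed here to be the similarity in play).

    n: number of documents containing the term
    N: total number of documents with the field
    Both counts still include deleted docs that have not been merged away.
    """
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


# Values copied from the two explain outputs in the question.
first_idf = bm25_idf(n=22037, N=23498)
second_idf = bm25_idf(n=22395, N=23744)

print(f"first response:  idf = {first_idf:.4f}")
print(f"second response: idf = {second_idf:.4f}")
```

The two results differ slightly (roughly 0.064 versus 0.059), which is enough to shift scores and reorder documents that were close to each other.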

How can I maintain consistency
Generally, sending the same users to the same choice of replica helps avoid these subtle differences in scoring. This can be done using the preference parameter and setting it to something like the user's session ID.
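
As a minimal sketch of that idea, the snippet below builds a search URL that carries the session ID in the `preference` query parameter. The host, index name, session ID and the helper `build_search_url` are placeholders invented for the example; only the `preference` parameter itself comes from the advice above. Custom preference strings must not start with an underscore, since those values are reserved.

```python
from urllib.parse import quote, urlencode


def build_search_url(base_url: str, index: str, session_id: str) -> str:
    """Return a _search URL that pins this session to one set of shard copies.

    Hypothetical helper: requests carrying the same preference string are
    routed to the same shard copies while the cluster state is unchanged.
    """
    if session_id.startswith("_"):
        # Values beginning with "_" are reserved for built-in preferences.
        raise ValueError("custom preference strings must not start with '_'")
    query = urlencode({"preference": session_id})
    return f"{base_url.rstrip('/')}/{quote(index, safe='')}/_search?{query}"


# Placeholder host, index and session ID for illustration only.
url = build_search_url("http://localhost:9200", "my-index", "session-abc123")
print(url)
```

Send the usual query body to that URL. Repeated searches from the same session should then see the same term statistics, as long as the shard copies themselves do not change.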

After more than a week of inconsistency, we went ahead and forced the merge.

Doing a forcemerge fixed the problem. Thank you.

/_forcemerge/?only_expunge_deletes=true

Is there a way to easily identify that a _forcemerge may be necessary on an index?

What are the risks in manually forcing this merge?

We only advise force merging if you are no longer adding docs to an index. It's an expensive operation - essentially rewriting the index files all over again. If you have ongoing indexing, the index is constantly merging anyway to help defragment the data.
If you have ongoing changes, it's generally better to route users to the same choice of replica to avoid these scoring changes.