Elasticsearch probability depends on document weight


(Dmitry Ermichev) #1

I want to write a query that returns random documents, where the probability of each document being returned depends on its weight. In my index each document has a weight from 1 to N, so a document with weight 1 must appear in the results half as often as a document with weight 2.
For example, I have 3 documents (one with weight 2, two with weight 1), so the probabilities of the documents appearing must be 50%, 25%, 25%. Example:

[
    {
        "_index": "we_recommend_on_main",
        "_type": "we_recommend_on_main",
        "_id": "5-0",
        "_score": 1.1245852,
        "_source": {
            "id_map_placement": 6151,
            "image": "/upload/banner1",
            "weight": 2
        }
    },
    {
        "_index": "we_recommend_on_main",
        "_type": "we_recommend_on_main",
        "_id": "8-0",
        "_score": 0.14477867,
        "_source": {
            "id_map_placement": 6151,
            "image": "/upload/banner1",
            "weight": 1
        }
    },
    {
        "_index": "we_recommend_on_main",
        "_type": "we_recommend_on_main",
        "_id": "8-1",
        "_score": 0.0837487,
        "_source": {
            "id_map_placement": 6151,
            "image": "/upload/banner2",
            "weight": 1
        }
    }
]

I found a solution that uses a search like this:

{
    "size": 1,
    "query": {
        "function_score": {
            "functions": [
                {
                    "random_score": {}
                },
                {
                    "field_value_factor": {
                        "field": "weight",
                        "modifier": "none",
                        "missing": 1
                    }
                }
            ],
            "score_mode": "multiply",
            "boost_mode": "replace"
        }
    },
    "sort": [
        {
            "_score": "desc"
        }
    ]
}

That solution comes from the issue https://github.com/elastic/elasticsearch/issues/7783#issuecomment-64880008, but it doesn't work as I expected.
After I ran this query 10000 times, the result is

{
    "5-0": 6730,
    "8-1": 1613,
    "8-0": 1657
}

But I expected

{
    "5-0": 5000,
    "8-1": 2500,
    "8-0": 2500
}
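
The gap between the two tallies can be reproduced outside Elasticsearch. Below is a minimal Python sketch, assuming that `random_score` yields an independent uniform value in [0, 1) for each document and that the top hit is the document with the largest product of random value and weight. The function name `simulate_multiply` and the trial count are illustrative and not part of the original query.

```python
import random


def simulate_multiply(weights, trials=100_000, seed=0):
    """Estimate how often each document ranks first when
    score = uniform_random * weight (the multiply scheme)."""
    rng = random.Random(seed)
    wins = {doc_id: 0 for doc_id in weights}
    for _ in range(trials):
        # One independent uniform draw per document, scaled by its weight.
        scores = {doc_id: rng.random() * w for doc_id, w in weights.items()}
        # size=1 with sort by _score desc keeps only the highest score.
        winner = max(scores, key=scores.get)
        wins[winner] += 1
    return {doc_id: count / trials for doc_id, count in wins.items()}


# Same three documents as in the example: one weight 2, two weight 1.
weights = {"5-0": 2, "8-0": 1, "8-1": 1}
shares = simulate_multiply(weights)
print(shares)
```

Under these assumptions the weight-2 document wins every time its random value exceeds 0.5 and still wins part of the time otherwise, so its share comes out near 2/3 rather than 1/2. That is in line with the 6730 out of 10000 observed above.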

I asked this question on Stack Overflow and got an answer saying that my assumption about the probability is wrong: https://stackoverflow.com/a/54133955/1848278
I understand this, but I can't find which query I need to get the desired results. Please help me.
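
One possible direction, not taken from the linked issue or answer, is the weighted-sampling key described by Efraimidis and Spirakis: give each document the key u^(1/weight), with u uniform in [0, 1), and pick the largest key. The sketch below only checks that idea in plain Python. It is not an Elasticsearch query, and how to express the same key in a `function_score` or script query is left open here. `simulate_power_key` is a hypothetical name.

```python
import random


def simulate_power_key(weights, trials=100_000, seed=0):
    """Estimate how often each document ranks first when
    key = uniform_random ** (1 / weight)."""
    rng = random.Random(seed)
    wins = {doc_id: 0 for doc_id in weights}
    for _ in range(trials):
        # Raising u to 1/weight pushes heavier documents toward 1.
        keys = {doc_id: rng.random() ** (1.0 / w) for doc_id, w in weights.items()}
        winner = max(keys, key=keys.get)
        wins[winner] += 1
    return {doc_id: count / trials for doc_id, count in wins.items()}


# Same three documents as in the example: one weight 2, two weight 1.
weights = {"5-0": 2, "8-0": 1, "8-1": 1}
shares = simulate_power_key(weights)
print(shares)
```

With that key, the chance that a document comes first equals its weight divided by the sum of all weights, so the simulated shares land near 50%, 25%, 25%.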

