Ignore relevance score completely by using post_filter only

Hi,

In my product, I don't want to use the relevance score in my search queries. I have my own sorting algorithm that I apply once I have the results.

While searching, I found that to avoid scoring you can use post_filter instead of a query clause.

I'd like to know the side effects of using post_filter directly.

Please discuss the problems with me.

Thanks :slight_smile:

Hi Kshitij,
Post-filter is about filtering, not scoring or sorting.
It's usually used so that aggregations get an unfiltered view of the results (e.g. to show all the available product colours) while the stream of top-matching results returned as hits is filtered by the aggregation choices (e.g. colour must = red). This way customers can see the counts for alternative colour choices in case they want to add them to the query.
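
As a rough sketch of that pattern, here is a request body for `GET <index>/_search`. The field names `product` and `colour` are made up for illustration (with `colour` assumed to be a keyword field) and do not come from your index:

```json
{
  "query": {
    "match": { "product": "shirt" }
  },
  "aggs": {
    "colours": {
      "terms": { "field": "colour" }
    }
  },
  "post_filter": {
    "term": { "colour": "red" }
  }
}
```

The `colours` aggregation counts every colour among the shirts matched by the query, while the hits contain only the red ones, because post_filter is applied to the hits after the aggregations have been computed.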

Check out the constant_score query as a wrapper for your search clause.

It's often best to try to do this stuff "in-line" rather than post-processing elasticsearch results. The function score query is designed to support custom scoring in-line.
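
A minimal sketch of in-line custom scoring, again as a request body for `GET <index>/_search`. It assumes hypothetical fields: a text field `title` and a numeric field `popularity` that stands in for whatever your own sorting algorithm is based on:

```json
{
  "query": {
    "function_score": {
      "query": {
        "match": { "title": "player" }
      },
      "field_value_factor": {
        "field": "popularity",
        "modifier": "log1p",
        "missing": 1
      },
      "boost_mode": "replace"
    }
  }
}
```

With `"boost_mode": "replace"` the relevance score of the inner query is discarded and the function's value becomes `_score`, so hits come back ordered by the custom value and not by text relevance.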

Hi Mark,

Thanks for the suggestion, but the constant_score query doesn't help me here: when I have multiple bool clauses it doesn't give me the right results, and combinations of should and must don't work because query is not supported inside constant_score.

Please suggest some other API or logic.

Bye and thanks :slight_smile:

I’m not sure I understand your response. I may need to see some JSON.

Hi Mark,

GET cs180esacs18_export_pdmarticle_1/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "Label": "player"
          }
        }
      ],
      "should": [
        {
          "match": {
            "ExternalKey": "CS-77"
          }
        }
      ]
    }
  },
  "_source": "false"
}

Gives the following result:
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 9,
"max_score": 8.017768,
"hits": [
{
"_index": "cs180esacs18_export_pdmarticle_1",
"_type": "item",
"_id": "10241",
"_score": 8.017768,
"_source": {}
},
{
"_index": "cs180esacs18_export_pdmarticle_1",
"_type": "item",
"_id": "77",
"_score": 7.8447104,
"_source": {}
},
{
"_index": "cs180esacs18_export_pdmarticle_1",
"_type": "item",
"_id": "10255",
"_score": 4.8923726,
"_source": {}
},
{
"_index": "cs180esacs18_export_pdmarticle_1",
"_type": "item",
"_id": "62",
"_score": 4.7425256,
"_source": {}
},
{
"_index": "cs180esacs18_export_pdmarticle_1",
"_type": "item",
"_id": "67",
"_score": 4.6269364,
"_source": {}
},
{
"_index": "cs180esacs18_export_pdmarticle_1",
"_type": "item",
"_id": "76",
"_score": 4.017816,
"_source": {}
},
{
"_index": "cs180esacs18_export_pdmarticle_1",
"_type": "item",
"_id": "79",
"_score": 3.5518925,
"_source": {}
},
{
"_index": "cs180esacs18_export_pdmarticle_1",
"_type": "item",
"_id": "78",
"_score": 3.4534197,
"_source": {}
},
{
"_index": "cs180esacs18_export_pdmarticle_1",
"_type": "item",
"_id": "81",
"_score": 3.382641,
"_source": {}
}
]
}
}

But the constant_score query:
GET cs180esacs18_export_pdmarticle_1/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {
              "match": {
                "Label": "player"
              }
            }
          ],
          "should": [
            {
              "match": {
                "ExternalKey": "CS-77"
              }
            }
          ]
        }
      },
      "boost": 1.2
    }
  },
  "_source": "false"
}

gives the following result:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1.2,
"hits": [
{
"_index": "cs180esacs18_export_pdmarticle_1",
"_type": "item",
"_id": "10241",
"_score": 1.2,
"_source": {}
},
{
"_index": "cs180esacs18_export_pdmarticle_1",
"_type": "item",
"_id": "77",
"_score": 1.2,
"_source": {}
}
]
}
}

By the way, the post_filter query:
GET cs180esacs18_export_pdmarticle_1/_search
{
  "post_filter": {
    "bool": {
      "must": [
        {
          "match": {
            "Label": "player"
          }
        }
      ],
      "should": [
        {
          "match": {
            "ExternalKey": "CS-77"
          }
        }
      ]
    }
  },
  "_source": "false"
}

Gives the following result:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 9,
"max_score": 1,
"hits": [
{
"_index": "cs180esacs18_export_pdmarticle_1",
"_type": "item",
"_id": "79",
"_score": 1,
"_source": {}
},
{
"_index": "cs180esacs18_export_pdmarticle_1",
"_type": "item",
"_id": "10255",
"_score": 1,
"_source": {}
},
{
"_index": "cs180esacs18_export_pdmarticle_1",
"_type": "item",
"_id": "10241",
"_score": 1,
"_source": {}
},
{
"_index": "cs180esacs18_export_pdmarticle_1",
"_type": "item",
"_id": "78",
"_score": 1,
"_source": {}
},
{
"_index": "cs180esacs18_export_pdmarticle_1",
"_type": "item",
"_id": "62",
"_score": 1,
"_source": {}
},
{
"_index": "cs180esacs18_export_pdmarticle_1",
"_type": "item",
"_id": "81",
"_score": 1,
"_source": {}
},
{
"_index": "cs180esacs18_export_pdmarticle_1",
"_type": "item",
"_id": "77",
"_score": 1,
"_source": {}
},
{
"_index": "cs180esacs18_export_pdmarticle_1",
"_type": "item",
"_id": "67",
"_score": 1,
"_source": {}
},
{
"_index": "cs180esacs18_export_pdmarticle_1",
"_type": "item",
"_id": "76",
"_score": 1,
"_source": {}
}
]
}
}

So that's why I am using post_filter.

Have a look and suggest something.

Bye and thanks

On the face of it I’d say it looks like a bug that the same query returns 8, 2 or 9 results depending on how it’s wrapped.
Can you supply:

  1. the version of elasticsearch you’re using
  2. the JSON for the docs that match this query (just the 2 searched fields and ids will do)

Hi Mark,

We have only 2 distinct result sets above: query and post_filter both return 9 results, and constant_score returns 2 results.
The reason constant_score returns 2 results is that constant_score doesn't support query, and whatever we provide in the filter clause must be strictly matched, so it doesn't matter whether you provide a must or a should clause.

Answers to your questions:

  1. The Elasticsearch version is 6.3
  2. Here are some of the JSON documents with the searched fields:

{
  "_index": "cs180esacs18_export_pdmarticle_1",
  "_type": "item",
  "_id": "77",
  "_version": 1,
  "found": true,
  "_source": {
    "Label": "Shuffle Player 1 GB Magenta",
    "ExternalKey": "CS-77"
  }
}

{
  "_index": "cs180esacs18_export_pdmarticle_1",
  "_type": "item",
  "_id": "10241",
  "_version": 3,
  "found": true,
  "_source": {
    "Label": "Shuffle Player 1 GB Magenta",
    "ExternalKey": "CS-77"
  }
}

{
  "_index": "cs180esacs18_export_pdmarticle_1",
  "_type": "item",
  "_id": "76",
  "_version": 1,
  "found": true,
  "_source": {
    "Label": "Shuffle Player 1 GB Blue",
    "ExternalKey": "CS-76"
  }
}

Mapping for Label:

{
"cs180esacs18_export_pdmarticle_1": {
"mappings": {
"item": {
"Label": {
"full_name": "Label",
"mapping": {
"Label": {
"type": "text",
"fields": {
"raw": {
"type": "keyword",
"ignore_above": 10922,
"normalizer": "ExportRawAnalyzer"
}
},
"analyzer": "ExportPrimaryAnalyzer"
}
}
}
}
}
}
}

The same mapping applies to ExternalKey, and here are the analysers:
"analysis": {
"normalizer": {
"ExportRawAnalyzer": {
"filter": "lowercase",
"type": "custom",
"char_filter": {
"char_filter": "html_strip"
}
}
},
"analyzer": {
"ExportPrimaryAnalyzer": {
"filter": "lowercase",
"char_filter": "html_strip",
"type": "custom",
"tokenizer": "whitespace"
}
}
}

Bye and thanks :slight_smile:

This looks to be a bug no longer present in 7.2.
I'm chasing down where the fix was (Lucene or elasticsearch) but the reproduction script is here

Details behind the fix are here

Hi Mark,

Thanks and got it.

But do you think simply using post_filter instead of query to avoid scoring would be sufficient? Will that cause any side effects? Does it have any limitations?

Anyway, it doesn't improve much, only a slight time saving. Is that because query is cached and post_filter is not?

Bye

To be honest I'm not sure how it might perform differently - as I mentioned before the use case is normally applying facet choices to a main query as opposed to your match_all. Caching may well be a concern - benchmarking will give you the truth.

You could use a regular query in a constant_score or bool filter if you scrubbed it of the should clauses (not sure why you'd want to send those if you're not interested in extra scoring points).
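
For the query posted earlier in this thread, a scrubbed version could look roughly like this (request body for `GET cs180esacs18_export_pdmarticle_1/_search`; only the must clause on `Label` is kept and the should clause on `ExternalKey` is dropped):

```json
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {
              "match": {
                "Label": "player"
              }
            }
          ]
        }
      }
    }
  },
  "_source": false
}
```

Every document whose `Label` matches "player" comes back with the same constant score, and with no should clause present there is nothing for the filter context to turn into a requirement.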

Hi Mark,

I didn't get this point. What benchmarking?

Only in the latest ES version, 7.2, not in the old 6.x.

Do you mean the should clauses? I want to boost performance, and one way I found is to ignore scoring because I don't need scores. That's why I need to reframe all my previous query clauses with post_filter without touching the inner logic of the bool clauses.

Your own testing on your own hardware + docs + queries

My gist showed a 6.6 constant score wrapping a query - the problem is it effectively elevates the should clauses to musts which is not what 7.2 does. When you have a must clause then should clauses are only optional-extra clauses to improve ranking. In the context of a filter or constant-score-query they serve no purpose because ranking is constant.
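
If the should clauses have to stay in the request on 6.x, one option that may be worth testing is to set `minimum_should_match` explicitly, so the filter context has no default to apply. This is an untested sketch and an assumption about 6.x behaviour, not something verified against 6.3, so compare the hit counts on your own cluster (request body for `GET cs180esacs18_export_pdmarticle_1/_search`):

```json
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {
              "match": {
                "Label": "player"
              }
            }
          ],
          "should": [
            {
              "match": {
                "ExternalKey": "CS-77"
              }
            }
          ],
          "minimum_should_match": 0
        }
      }
    }
  },
  "_source": false
}
```

If the assumption holds, this should return the same 9 hits as the plain bool query, each with a constant score, since the should clause can then neither exclude documents nor change the ranking.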