Elasticsearch 6.0-beta1 - Boosting

Hi All !
I found some differences in behaviour between ES 5 and ES 6 for the Boosting query, but I did not find anything in the documentation about this change. Can someone help me?

The RESULTS are the SAME but the _score values are different, and the order of the result array has changed because of the different _score values. ANY idea what happened?

The problem is this one :

Creating an index and trying to create a Boosting Query :

curl -XPUT 'elasticsearch:9200/fedetest?pretty' -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "test": { 
      "properties": { 
        "name":  {
          "type":   "text",
          "index": "true"
        },
        "price":  {
          "type":   "float"
        }
      }
    }
  }
}
'


curl -XPOST 'elasticsearch:9200/_bulk?pretty' -H 'Content-Type: application/json' -d'
{"index":{"_index":"fedetest","_type":"test","_id":1}}
{"name" : "Vital Lama", "price": "5.2"}
{"index":{"_index":"fedetest","_type":"test","_id":2}}
{"name" : "Vital Match", "price": "2.1"}
{"index":{"_index":"fedetest","_type":"test","_id":3}}
{"name" : "Mercury Vital", "price": "7.5"}
{"index":{"_index":"fedetest","_type":"test","_id":4}}
{"name" : "Fist Mercury", "price": "3.8"}
{"index":{"_index":"fedetest","_type":"test","_id":5}}
{"name" : "Zama vital 2nd", "price": "1.2"}
'



curl -XGET 'elasticsearch:9200/fedetest/_search?pretty' -H 'Content-Type: application/json' -d'
{
    "query": {
        "boosting" : {
            "positive" : {
                "term" : {
                    "name" : "vital"
                }
            },
            "negative" : {
                 "term" : {
                     "name" : "mercury"
                }
            },
            "negative_boost" : 0.2
        }
    }
}
' 

We have 2 different results between ES 5.x and ES 6.0-beta1

ES5.x

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 0.62191015,
    "hits" : [
      {
        "_index" : "fedetest",
        "_type" : "test",
        "_id" : "2",
        "_score" : 0.62191015,
        "_source" : {
          "name" : "Vital Match",
          "price" : "2.1"
        }
      },
      {
        "_index" : "fedetest",
        "_type" : "test",
        "_id" : "1",
        "_score" : 0.25811607,
        "_source" : {
          "name" : "Vital Lama",
          "price" : "5.2"
        }
      },
      {
        "_index" : "fedetest",
        "_type" : "test",
        "_id" : "5",
        "_score" : 0.25316024,
        "_source" : {
          "name" : "Lama Vital 2nd",
          "price" : "3.2"
        }
      },
      {
        "_index" : "fedetest",
        "_type" : "test",
        "_id" : "3",
        "_score" : 0.051623214,
        "_source" : {
          "name" : "Mercury Vital",
          "price" : "7.5"
        }
      }
    ]
  }
}

Elasticsearch 6.0-beta1

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "fedetest",
        "_type" : "test",
        "_id" : "2",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "Vital Match",
          "price" : "2.1"
        }
      },
      {
        "_index" : "fedetest",
        "_type" : "test",
        "_id" : "5",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "Lama Vital 2nd",
          "price" : "3.2"
        }
      },
      {
        "_index" : "fedetest",
        "_type" : "test",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "Vital Lama",
          "price" : "5.2"
        }
      },
      {
        "_index" : "fedetest",
        "_type" : "test",
        "_id" : "3",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "Mercury Vital",
          "price" : "7.5"
        }
      }
    ]
  }
}

I'm not aware of anything off the top of my head (although it's possible something changed, I'll take a look), but this could just be the result of scoring oddities that crop up when you have small indices spread across shards.

Basically, term and document frequencies are calculated on a local, per-shard basis to keep inter-node coordination to a minimum. When you only have a few documents, their placement on the shards can make a big impact on scoring since it drastically changes what terms are available on each shard.

More details here: https://www.elastic.co/guide/en/elasticsearch/guide/current/relevance-is-broken.html

Could you try your test again with a single shard? Or use dfs_query_then_fetch to precompute the term/doc frequencies?
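To make the per-shard effect concrete, here is a minimal sketch of the IDF term used by Lucene's BM25 similarity (the default similarity since ES 5.0). The shard layouts are hypothetical and chosen only for illustration; the thread does not say how the five documents were actually distributed.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """IDF term of BM25: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical layouts for the term "vital" (4 of the 5 documents contain it).
alone_on_shard = bm25_idf(doc_freq=1, doc_count=1)  # matching doc is the only doc on its shard
shared_shard = bm25_idf(doc_freq=1, doc_count=2)    # shard also holds one non-matching doc
global_stats = bm25_idf(doc_freq=4, doc_count=5)    # index-wide statistics, as with dfs_query_then_fetch

print(f"{alone_on_shard:.7f} {shared_shard:.7f} {global_stats:.7f}")
```

The first two values come out as 0.2876821 and 0.6931472, the same numbers that appear as _score in the 6.0-beta1 output above. That is at least consistent with each shard scoring from its own local statistics, although it does not prove the layout assumed here.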

@polyfractal I've run a lot of tests because I was seeing some weird behaviour, but now it looks a bit clearer.

I put 5 documents into the index I created; to keep it simple I used a positive integer as the _id of each item, so from 1 to 5.

I tested the Boosting Query with 3 releases of Elasticsearch, in a cluster of 2 masters:

  • ES 6.0-beta1
  • ES 5.5.1
  • ES 5.4.1

I tested everything with a direct curl (with and without dfs), and also with the failing unit tests (PHP Elastica).
These are the results:

ES6-beta1
  CLUSTER WITH PHPUNIT TESTS (PHP ELASTICA LIB)
    1,2,3,5 --> no_dfs 5 shards not specified
    1,2,3,5 --> no dfs 1 shard forced at config
    1,2,3,5 --> no_dfs 5 shards forced at config

    1,2,3,5 --> yes_dfs 5 shards not specified
    1,2,3,5 --> yes_dfs 1 shard forced at config
    1,3,2,5 --> yes_dfs 5 shards forced at config

  CLUSTER WITH CURLS
    1,2,3,5 --> no_dfs 5 shards not specified
    1,2,3,5 --> no dfs 1 shard forced at config
    1,2,3,5 --> no_dfs 5 shards forced at config

    1,3,2,5 --> yes_dfs 5 shards not specified
    1,2,3,5 --> yes_dfs 1 shard forced at config
    1,3,2,5 --> yes_dfs 5 shards forced at config

ES 5.5.1
  CLUSTER WITH PHPUNIT TESTS (PHP ELASTICA LIB)
    **1,2,5,3** --> no_dfs 5 shards not specified
    **1,2,5,3** --> no dfs 1 shard forced at config
    **1,2,5,3** --> no_dfs 5 shards forced at config

    **1,2,5,3** --> yes_dfs 5 shards not specified
    **1,2,5,3** --> yes_dfs 1 shard forced at config
    **1,2,5,3** --> yes_dfs 5 shards forced at config

  NO CLUSTER WITH CURLS
    1,2,3,5 --> no_dfs 5 shards not specified
    1,2,3,5 --> no dfs 1 shard forced at config
    1,2,3,5 --> no_dfs 5 shards forced at config

    1,3,2,5 --> yes_dfs 5 shards not specified
    1,2,3,5 --> yes_dfs 1 shard forced at config
    1,3,2,5 --> yes_dfs 5 shards forced at config

ES 5.4.1
  CLUSTER WITH PHPUNIT TESTS (PHP ELASTICA LIB)
    **1,2,5,3** --> no_dfs 5 shards not specified
    1,2,3,5 --> no dfs 1 shard forced at config
    1,2,3,5 --> no_dfs 5 shards forced at config

    **1,2,5,3** --> yes_dfs 5 shards not specified
    **1,2,5,3** --> yes_dfs 1 shard forced at config
    **1,2,5,3** --> yes_dfs 5 shards forced at config

  NO CLUSTER WITH CURLS
    1,2,3,5 --> no_dfs 5 shards not specified
    1,2,3,5 --> no dfs 1 shard forced at config
    1,2,3,5 --> no_dfs 5 shards forced at config

    1,3,2,5 --> yes_dfs 5 shards not specified (default config)
    1,2,3,5 --> yes_dfs 1 shard forced at config
    1,3,2,5 --> yes_dfs 5 shards forced at config

As you can see, I've marked with ** the results that look inconsistent.
It always happens when the queries are executed from the unit tests inside Elastica (which is a PHP library for Elasticsearch). Using curl returns the same results across multiple clusters and ES versions.

Huh... that is odd.

So it looks like there were some changes in 6.0 that would have effects on scoring: https://www.elastic.co/guide/en/elasticsearch/reference/6.0/breaking_60_search_changes.html#_scoring_changes

The change to normalization could explain the slightly different scores between version (but roughly same sort order).

I'm not sure why the PHP vs curl difference is happening though.... that's very strange. Perhaps it's a floating point rounding error when PHP is parsing the json and converting into a double?

Is Elastica potentially adding a default parameter to the query that isn't being added in your curl test?

Hi @polyfractal, thanks for your reply.
It's really odd :slight_smile: I've tried to inspect the different calls made by Elastica but found nothing :slight_smile:

I will do some more digging and then report the results.

tnx

@polyfractal a quick question: for testing Elastica I have a docker setup with a cluster of 2 masters against which we execute the tests. As you suggested, I would like to compare the query generated by Elastica with the curls I've created. Can I enable verbose logging with the Elasticsearch docker image?

I've already tried these 2 settings but got nothing... any hint?

curl -XPUT elasticsearch:9200/_cluster/settings -H "Content-Type: application/json" -d'
{
    "persistent" : {
        "logger.discovery" : "DEBUG"
    },
    "transient" : {
        "logger.discovery" : "DEBUG"
    }
}
'

curl -XPUT elasticsearch:9200/elastica_testfede/_settings -H "Content-Type: application/json" -d'
{
    "index.search.slowlog.threshold.query.warn" : "0s", 
    "index.search.slowlog.threshold.fetch.debug": "0s", 
    "index.indexing.slowlog.threshold.index.info": "0s" 
}
' 

Tnx!

Your second curl is essentially it, although I'd tweak it to:

curl -XPUT elasticsearch:9200/test/_settings -H "Content-Type: application/json" -d'
{
    "index.search.slowlog.threshold.query.warn" : "0s", 
    "index.search.slowlog.threshold.fetch.warn": "0s", 
    "index.indexing.slowlog.threshold.index.warn": "0s" 
}
' 

I just made all the logging levels "warn" instead of info/debug, that way you don't need to change the default logger level. The first API call you have changes the logging of Discovery module, which is basically how nodes find each other. So that won't be useful to you here.

I just tested it locally and see entries like this in the slowlog:

[2017-08-22T10:13:01,662][WARN ][index.search.slowlog.query] [EW1U4rh] [test][4] took[73.6micros], took_millis[0], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"query":{"match_all":{"boost":1.0}}}], 
[2017-08-22T10:13:01,664][WARN ][index.search.slowlog.fetch] [EW1U4rh] [test][2] took[178.8micros], took_millis[0], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"query":{"match_all":{"boost":1.0}}}],  
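To compare what a client library sends with a hand-written curl, the query can be pulled out of a slowlog entry like the ones above. This is a minimal sketch that assumes the `source[...]` layout shown in these log lines; `extract_query` is a hypothetical helper name, not part of Elasticsearch or Elastica.

```python
import json
import re

# Greedy match, so nested braces and brackets inside the JSON body stay intact.
SOURCE_RE = re.compile(r"source\[(\{.*\})\]")


def extract_query(slowlog_line: str) -> dict:
    """Return the request body logged in the source[...] section of a slowlog line."""
    match = SOURCE_RE.search(slowlog_line)
    if match is None:
        raise ValueError("no source[...] section found")
    return json.loads(match.group(1))


line = (
    "[2017-08-22T10:13:01,662][WARN ][index.search.slowlog.query] [EW1U4rh] [test][4] "
    "took[73.6micros], took_millis[0], types[], stats[], search_type[QUERY_THEN_FETCH], "
    'total_shards[5], source[{"query":{"match_all":{"boost":1.0}}}], '
)
query = extract_query(line)
print(json.dumps(query, sort_keys=True))
```

The parsed dict can then be compared directly with the body passed to curl.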

Just a note on the two masters: this could be really dangerous. We always need a quorum of masters for the cluster to be stable and to avoid split brain. So you should always use an odd number of masters. The recommendations are either 1, 3 or 5 masters.

How many data nodes do you have in this test cluster? It might simplify things to move everything to a single node, with a single index that has a single shard. Just to help eliminate any distributed effects. :slight_smile:
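For the single-shard variant, a sketch of the index-creation body could look like this. It reuses the mapping from the first post and adds the standard `number_of_shards` and `number_of_replicas` index settings; the helper name is hypothetical, and the output is meant to be passed as the body of the same PUT request used to create the index.

```python
import json


def single_shard_index_body() -> str:
    """Build a create-index body that pins the index to one primary shard."""
    body = {
        "settings": {"number_of_shards": 1, "number_of_replicas": 0},
        "mappings": {
            "test": {
                "properties": {
                    "name": {"type": "text"},
                    "price": {"type": "float"},
                }
            }
        },
    }
    return json.dumps(body, indent=2)


print(single_shard_index_body())
```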

Thanks @polyfractal I'll keep you up to date, I would like to understand if it's a problem of the library (PHP Elastica) or something else.

Tnx again.

@polyfractal just a note on the split-brain problem: we have a cluster of 2 masters, but we use it only for testing the environment and the _cluster API. Do you think this could be dangerous?

Hi @polyfractal I have the output log generated from Elastica :

[2017-08-23T08:00:03,961][WARN ][i.s.s.query              ] [fNe_n5a] [elastica_fedetest][0] took[301.6micros], took_millis[0], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[1], source[{"query":{"boosting":{"positive":{"term":{"name":{"value":"vital","boost":1.0}}},"negative":{"term":{"name":{"value":"mercury","boost":1.0}}},"negative_boost":0.2,"boost":1.0}}}],
[2017-08-23T08:00:03,963][WARN ][i.s.s.fetch              ] [fNe_n5a] [elastica_fedetest][0] took[4.5ms], took_millis[4], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[1], source[{"query":{"boosting":{"positive":{"term":{"name":{"value":"vital","boost":1.0}}},"negative":{"term":{"name":{"value":"mercury","boost":1.0}}},"negative_boost":0.2,"boost":1.0}}}],

If we have a look at the query, this is the JSON:

{
	"query": {
		"boosting": {
			"positive": {
				"term": {
					"name": {
						"value": "vital",
						"boost": 1.0
					}
				}
			},
			"negative": {
				"term": {
					"name": {
						"value": "mercury",
						"boost": 1.0
					}
				}
			},
			"negative_boost": 0.2,
			"boost": 1.0
		}
	}
}

As you can see there's a "boost": 1.0 in every term and also beside negative_boost... I'm a bit stuck. It looks like ES appends it to every Boosting query.
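For what it's worth, the shorthand term query `{"name": "vital"}` and the expanded `{"name": {"value": "vital", "boost": 1.0}}` describe the same query, since 1.0 is the default boost and the slowlog prints the parsed query with defaults filled in. A minimal sketch of that normalisation (the helper names are hypothetical, and it only handles the boosting-over-term shape used in this thread):

```python
def expand_term(term_body: dict) -> dict:
    """Rewrite {"field": "x"} as {"field": {"value": "x", "boost": 1.0}}."""
    (field, spec), = term_body.items()
    if not isinstance(spec, dict):
        spec = {"value": spec}
    # An explicit boost in the original query wins over the default.
    return {field: {"boost": 1.0, **spec}}


def expand_boosting(query: dict) -> dict:
    """Fill in the default boosts of a boosting query built from term queries."""
    boosting = query["query"]["boosting"]
    return {"query": {"boosting": {
        "positive": {"term": expand_term(boosting["positive"]["term"])},
        "negative": {"term": expand_term(boosting["negative"]["term"])},
        "negative_boost": boosting["negative_boost"],
        "boost": boosting.get("boost", 1.0),
    }}}


curl_query = {"query": {"boosting": {
    "positive": {"term": {"name": "vital"}},
    "negative": {"term": {"name": "mercury"}},
    "negative_boost": 0.2,
}}}
logged_query = {"query": {"boosting": {
    "positive": {"term": {"name": {"value": "vital", "boost": 1.0}}},
    "negative": {"term": {"name": {"value": "mercury", "boost": 1.0}}},
    "negative_boost": 0.2,
    "boost": 1.0,
}}}

print(expand_boosting(curl_query) == logged_query)
```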

If you look at BoostingQueryTest.java

you'll see that the query is built exactly like the log output I have from ES.
Forcing this query as a curl I get an error...

curl -XPUT 'elasticsearch:9200/fedetest?pretty' -H 'Content-Type: application/json' -d'
{
    "query": {
        "boosting": {
            "positive": {
                "term": {
                    "name": {
                        "value": "vital",
                        "boost": 1.0
                    }
                }
            },
            "negative": {
                "term": {
                    "name": {
                        "value": "mercury",
                        "boost": 1.0
                    }
                }
            },
            "negative_boost": 0.2
        }
    }
}
'

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "unknown setting [index.query.boosting.negative.term.name.boost] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "unknown setting [index.query.boosting.negative.term.name.boost] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"
  },
  "status" : 400
}

@polyfractal do you have any idea? I'm a bit lost :slight_smile:

just a note on the split-brain problem: we have a cluster of 2 masters, but we use it only for testing the environment and the _cluster API. Do you think this could be dangerous?

Probably not, but could cause some sporadic failures or strange behavior if the nodes happen to split during your test. In that case, you'll get different results depending on which side of the split you talk to. I'm assuming these are running locally though, right? The chance of a split happening locally are essentially zero, unless docker's networking gets really flaky or something.

I just wanted to call out the two masters since it's a big antipattern, although probably fine in this case :slight_smile:

Forcing this query as a curl I get an error …

Looks like you forgot to add the /_search part to the URL :slight_smile:

@polyfractal thank you, my fault. The correct curl is:

curl -XGET 'elasticsearch:9200/fedetest/_search?pretty&search_type=dfs_query_then_fetch' -H 'Content-Type: application/json' -d'
{
    "query": {
        "boosting": {
            "positive": {
                "term": {
                    "name": {
                        "value": "vital",
                        "boost": 1.0
                    }
                }
            },
            "negative": {
                "term": {
                    "name": {
                        "value": "mercury",
                        "boost": 1.0
                    }
                }
            },
            "negative_boost": 0.2
        }
    }
}
'

NOW the curl and the lib execute the same query, but ES 5 returns a different result than ES 6 (with both query_then_fetch and dfs_query_then_fetch):

ES 6.0beta1 -> 1,2,3,5
ES 5.4.1 -> 1,2,5,3
ES 5.5.1 -> 1,2,5,3

It's a bit weird... any idea?

I've found this one in the Lucene docs; could it be related to this behaviour?

I've also had a look at the tests you run in the ES source code: you are only testing that the query is well constructed, not the results generated by BoostingQueryBuilder.

Ok pheww, glad the library and curl are agreeing now. That had me worried :slight_smile:

I think the discrepancy between versions you're seeing is indeed LUCENE-7347 (see: https://www.elastic.co/guide/en/elasticsearch/reference/6.0/breaking_60_search_changes.html#_scoring_changes) which would subtly affect the scoring. I assumed it wouldn't affect sorting much, but perhaps the changes are sufficient to alter sorting a bit if the scores are similar. Not entirely sure tbh, a bit outside my area of expertise.

Re: the test, you're right, we're only checking well-formedness there. In general we rely on Lucene's test framework to work out issues that directly relate to lucene queries (e.g. we pass the boosting query directly to Lucene without any modification, so we rely on their tests).

Lucene occasionally adds things that change scoring, like 7347 or the earlier changes to TF/IDF vs BM25. As a downstream user of Lucene we just accept these changes and don't try to keep scoring backwards compatible, since that would be a losing proposition :frowning:

Wow, it's been hard to find this one :slight_smile:

Very interesting. I will try to look into it to understand whether sorting is affected by that change...
Anyway, back to the Elastica lib: my suggestion would be to follow Elasticsearch and test boosting and negative_boost the same way ES does, checking only the query and not the results (or, as in this case, whether the nth result is in position 3 or position 4). Otherwise, given the constant development of Lucene, this could lead to some flaky tests.

@polyfractal thank you very much!
You helped me a lot!

Happy to help, sorry it took so long to work out :slight_smile:

++ to not checking default scoring order. There are places in ES (and lang clients) where we validate the response to various degrees, but it usually involves something deterministic to make it work. E.g. you can check user-defined sorting on fields, or the values of aggregation results, etc. But things like implicit scoring can be tricky indeed, since it's liable to change slightly :slight_smile:
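As a rough illustration of the difference between a deterministic check and a fragile one, the sketch below uses the hit order from the two responses at the top of this thread (trimmed to the _id field). `hit_ids` is a hypothetical helper, not part of any client library.

```python
def hit_ids(response: dict) -> list:
    """Return the document ids of a search response in ranked order."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


# Hit order copied from the ES 5.x and 6.0-beta1 responses earlier in the thread.
es5 = {"hits": {"hits": [{"_id": "2"}, {"_id": "1"}, {"_id": "5"}, {"_id": "3"}]}}
es6 = {"hits": {"hits": [{"_id": "2"}, {"_id": "5"}, {"_id": "1"}, {"_id": "3"}]}}

# Deterministic: which documents matched does not depend on scoring details.
same_matches = sorted(hit_ids(es5)) == sorted(hit_ids(es6))

# Fragile: the exact ranked order depends on the similarity implementation.
same_order = hit_ids(es5) == hit_ids(es6)

print(same_matches, same_order)
```

A test that asserts only on the set of matched ids would pass on both versions, while one that asserts on position would not.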

Good luck!
