Wrong boosting in ES?


(Bernd Fehling-2) #1

While getting familiar with ES I wrote some plugins, but my tests show
differences compared to Solr.
So where is the bug, in ES or in Solr?

In ES my query is:
{
  "fields": [
    "score",
    "dctitle",
    "dcoa"
  ],
  "explain": true,
  "query": {
    "bool": {
      "must": {
        "match": {
          "text": {
            "query": "einzelhandel",
            "boost": 200
          }
        }
      },
      "should": {
        "match": {
          "oa": {
            "query": "1",
            "boost": 400
          }
        }
      }
    }
  }
}

And the explain result is:

"_explanation": {
  "value": 1.4639825,
  "description": "sum of:",
  "details": [
    {
      "value": 1.2748994,
      "description": "weight(text:einzelhandel^200.0 in 188102) [PerFieldSimilarity], result of:",
      "details": [
        {
          "value": 1.2748994,
          "description": "score(doc=188102,freq=1.0 = termFreq=1.0 ), product of:",
          "details": [
            {
              "value": 0.98196113,
              "description": "queryWeight, product of:",
              "details": [
                { "value": 200, "description": "boost" },
                { "value": 10.386557, "description": "idf(docFreq=50, maxDocs=608275)" },
                { "value": 0.00047270773, "description": "queryNorm" }
              ]
            },
            {
              "value": 1.2983196,
              "description": "fieldWeight in 188102, product of:",
              "details": [
                {
                  "value": 1,
                  "description": "tf(freq=1.0), with freq of:",
                  "details": [
                    { "value": 1, "description": "termFreq=1.0" }
                  ]
                },
                { "value": 10.386557, "description": "idf(docFreq=50, maxDocs=608275)" },
                { "value": 0.125, "description": "fieldNorm(doc=188102)" }
              ]
            }
          ]
        }
      ]
    },
    {
      "value": 0.1890831,
      "description": "ConstantScore(oa:[1 TO 1]^400.0)^400.0, product of:",
      "details": [
        { "value": 400, "description": "boost" },
        { "value": 0.00047270773, "description": "queryNorm" }
      ]
    }
  ]
}

The top part looks good to me, but why does ES turn "should(oa:1^400)" into
something like "ConstantScore(oa:[1 TO 1]^400.0)^400.0", which looks like a
range query with a double boost of 400?

Is that right?

Regards
Bernd

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/45f94ead-b37c-4c13-9bd2-8833cd8387da%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Jörg Prante) #2

Maybe a typo?

For this example

PUT /test/test/1
{
  "name" : "einzelhandel",
  "type" : "oa"
}
PUT /test/test/2
{
  "name" : "grosshandel",
  "type" : "oa"
}
PUT /test/test/3
{
  "name" : "grosshandel",
  "type" : "closed"
}

POST /test/test/_search
{
  "explain": true,
  "query": {
    "bool": {
      "must": {
        "match": {
          "name": {
            "query": "einzelhandel",
            "boost": 200
          }
        }
      },
      "should": {
        "match": {
          "type": {
            "query" : "oa",
            "boost": 400
          }
        }
      }
    }
  }
}

I get a reasonable explain:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.41168627,
    "hits": [
      {
        "_shard": 2,
        "_node": "RuosUkeaQKqKYq_WWi_brA",
        "_index": "test",
        "_type": "test",
        "_id": "1",
        "_score": 0.41168627,
        "_source": {
          "name": "einzelhandel",
          "type": "oa"
        },
        "_explanation": {
          "value": 0.41168627,
          "description": "sum of:",
          "details": [
            {
              "value": 0.13722876,
              "description": "weight(name:einzelhandel^200.0 in 0) [PerFieldSimilarity], result of:",
              "details": [
                {
                  "value": 0.13722876,
                  "description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
                  "details": [
                    {
                      "value": 0.4472136,
                      "description": "queryWeight, product of:",
                      "details": [
                        { "value": 200, "description": "boost" },
                        { "value": 0.30685282, "description": "idf(docFreq=1, maxDocs=1)" },
                        { "value": 0.0072871028, "description": "queryNorm" }
                      ]
                    },
                    {
                      "value": 0.30685282,
                      "description": "fieldWeight in 0, product of:",
                      "details": [
                        {
                          "value": 1,
                          "description": "tf(freq=1.0), with freq of:",
                          "details": [
                            { "value": 1, "description": "termFreq=1.0" }
                          ]
                        },
                        { "value": 0.30685282, "description": "idf(docFreq=1, maxDocs=1)" },
                        { "value": 1, "description": "fieldNorm(doc=0)" }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value": 0.2744575,
              "description": "weight(type:oa^400.0 in 0) [PerFieldSimilarity], result of:",
              "details": [
                {
                  "value": 0.2744575,
                  "description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
                  "details": [
                    {
                      "value": 0.8944272,
                      "description": "queryWeight, product of:",
                      "details": [
                        { "value": 400, "description": "boost" },
                        { "value": 0.30685282, "description": "idf(docFreq=1, maxDocs=1)" },
                        { "value": 0.0072871028, "description": "queryNorm" }
                      ]
                    },
                    {
                      "value": 0.30685282,
                      "description": "fieldWeight in 0, product of:",
                      "details": [
                        {
                          "value": 1,
                          "description": "tf(freq=1.0), with freq of:",
                          "details": [
                            { "value": 1, "description": "termFreq=1.0" }
                          ]
                        },
                        { "value": 0.30685282, "description": "idf(docFreq=1, maxDocs=1)" },
                        { "value": 1, "description": "fieldNorm(doc=0)" }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      }
    ]
  }
}
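
The numbers in this explain can be reproduced by hand. The following is only a sketch, assuming Lucene's classic TF/IDF similarity (idf = 1 + ln(maxDocs / (docFreq + 1)), queryNorm = 1 / sqrt(sum of squared clause weights)); the helper names are made up for illustration and are not part of any ES or Lucene API.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Classic Lucene TF/IDF inverse document frequency (assumed formula)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, idf: float, query_norm: float,
               tf: float = 1.0, field_norm: float = 1.0) -> float:
    """score = queryWeight * fieldWeight, as laid out in the explain tree."""
    query_weight = boost * idf * query_norm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight


# Per-shard statistics reported in the explain: idf(docFreq=1, maxDocs=1).
idf = classic_idf(doc_freq=1, max_docs=1)

# Each term clause contributes (boost * idf)^2 to the normalisation factor.
query_norm = 1.0 / math.sqrt((200 * idf) ** 2 + (400 * idf) ** 2)

name_score = term_score(200, idf, query_norm)   # name:einzelhandel^200
type_score = term_score(400, idf, query_norm)   # type:oa^400
total = name_score + type_score

print(f"idf={idf:.8f} queryNorm={query_norm:.10f}")
print(f"name={name_score:.8f} type={type_score:.8f} total={total:.8f}")
```

Under these assumptions the results agree with the reported values (0.30685282, 0.0072871028, 0.13722876, 0.2744575 and 0.41168627) up to float rounding, so each boost is applied exactly once per clause.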

Jörg

On Mon, Aug 4, 2014 at 1:28 PM, Bernd Fehling bernd.fehling@gmail.com
wrote:



(Bernd Fehling-2) #3

Actually, I can't follow you.

My query should be:
"must_match(fieldname "text", value "einzelhandel", boost "200") AND
should_match(fieldname "oa", value "1", boost "400")"

Where is my typo?

Ah, you mean that the value "1" for "oa" is a string, whereas the mapping is
integer?

On Monday, 4 August 2014 13:44:22 UTC+2, Jörg Prante wrote:



(Jörg Prante) #4

I think it makes more sense to use string fields for term boosting. The
"explain" display for a boosted numeric field does look weird indeed.
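
The weird-looking description can at least be checked against its own numbers. A minimal sketch, assuming Lucene's classic TF/IDF scoring and plugging in the values from the explain output in the first post (variable names are made up for illustration), suggests that the boost of 400 is applied only once, even though "^400.0" is printed twice:

```python
import math

# Values copied from the explain output in the first post.
IDF_TEXT = 10.386557      # idf(docFreq=50, maxDocs=608275)
FIELD_NORM = 0.125        # fieldNorm(doc=188102)
BOOST_TEXT = 200.0
BOOST_OA = 400.0

# Assumption: queryNorm = 1 / sqrt(sum of squared clause weights), where the
# term clause contributes (boost * idf)^2 and the constant-score clause
# contributes boost^2, i.e. the 400 enters exactly once.
query_norm = 1.0 / math.sqrt((BOOST_TEXT * IDF_TEXT) ** 2 + BOOST_OA ** 2)

# must clause: queryWeight * fieldWeight, with tf = 1 for a single occurrence.
text_score = (BOOST_TEXT * IDF_TEXT * query_norm) * (1.0 * IDF_TEXT * FIELD_NORM)

# should clause: constant score = boost * queryNorm.
oa_score = BOOST_OA * query_norm
total = text_score + oa_score

# What the should clause would contribute if 400 were really applied twice.
oa_score_if_doubled = BOOST_OA * BOOST_OA * query_norm

print(f"queryNorm={query_norm:.11f}")
print(f"text={text_score:.7f} oa={oa_score:.7f} total={total:.7f}")
print(f"oa if boosted twice={oa_score_if_doubled:.2f}")
```

With a single boost the sketch lands on the reported 0.00047270773, 1.2748994, 0.1890831 and 1.4639825 up to rounding, whereas a doubled boost would give a value in the tens for the second clause. So the duplicated "^400.0" looks like a display quirk of the wrapped numeric range query rather than a scoring error.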

For numeric boosting, a function score query may work better and be a more
convenient approach.
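
As a rough sketch of what that could look like for the query in this thread, the snippet below builds a function_score request body in Python and prints it as JSON. The field names "text" and "oa" come from the original query; the helper name is hypothetical, and the use of "weight" is an assumption about the ES version (older 1.x releases used "boost_factor" for the same purpose, if I remember correctly).

```python
import json


def build_function_score_body(text_term, flag_value, text_boost=200, flag_weight=400):
    """Hypothetical helper: boost documents whose numeric 'oa' field equals flag_value."""
    return {
        "explain": True,
        "query": {
            "function_score": {
                # The text match stays a normal scored query.
                "query": {
                    "match": {"text": {"query": text_term, "boost": text_boost}}
                },
                # The numeric field is only used as a filter; the weight is the bonus.
                "functions": [
                    {"filter": {"term": {"oa": flag_value}}, "weight": flag_weight}
                ],
                "score_mode": "sum",
                "boost_mode": "sum",
            }
        },
    }


body = build_function_score_body("einzelhandel", 1)
print(json.dumps(body, indent=2))
```

This will not reproduce the bool query's scores exactly, since the function value is combined with the raw query score instead of passing through queryNorm, so it is a starting point to compare against Solr rather than a drop-in replacement.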

Jörg

On Mon, Aug 4, 2014 at 2:37 PM, Bernd Fehling bernd.fehling@gmail.com
wrote:



(Bernd Fehling-2) #5

So because of problems with boosted numeric field values in Elasticsearch, I
should change my mapping and convert all my 60 million documents from integer
(or short) to string?
Wouldn't it be better to fix this in Elasticsearch?

At least it works with Solr, so Lucene is not the problem.

I will look into the function score query as you suggest, thanks.

On Monday, 4 August 2014 15:35:58 UTC+2, Jörg Prante wrote:



(Jörg Prante) #6

Because integer fields have no norms, it is quite uncommon to use them for
boosting. It is more common to feed integer values into a scoring
algorithm with function score.
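
As a rough illustration of the function score approach, here is a minimal sketch that builds such a search body in Python. It reuses the field names "text" and "oa" from post #1. The `boost_factor` key and the `filter` inside `functions` are assumptions based on the function_score syntax of the ES 1.x line; the helper name `boosted_query` is hypothetical. No `boost_mode` is set, so the server default decides how the factor is combined with the text score.

```python
import json


def boosted_query(text, oa_value=1, oa_factor=400):
    """Build a search body that applies oa_factor to hits where oa == oa_value.

    Assumption: ES 1.x function_score syntax with "boost_factor".
    """
    return {
        "explain": True,
        "query": {
            "function_score": {
                # The scoring query is the boosted text match from post #1.
                "query": {"match": {"text": {"query": text, "boost": 200}}},
                # The function only applies to documents passing the filter.
                "functions": [
                    {
                        "filter": {"term": {"oa": oa_value}},
                        "boost_factor": oa_factor,
                    }
                ],
            }
        },
    }


body = boosted_query("einzelhandel")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request. Whether the resulting ranking matches the Solr boost query would still need to be checked with "explain".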

Which Solr version is this? In previous versions, Solr did not follow the
Lucene default regarding integer boosting:
https://issues.apache.org/jira/browse/SOLR-3140

Jörg

On Mon, Aug 4, 2014 at 4:33 PM, Bernd Fehling bernd.fehling@gmail.com
wrote:

So because of problems with boosted numeric field values in Elasticsearch
I should change my mapping
and change all my 60 million documents from integer (or short) to string?
Wouldn't it be better to fix this in Elasticsearch?

At least it works with Solr, so Lucene is not the problem.

I will have a look into function score query as you suggest, thanks.

On Monday, 4 August 2014 at 15:35:58 UTC+2, Jörg Prante wrote:

I think it makes more sense to use string fields for term boosting. The
"explain" display for a boosted numeric looks weird indeed.

For numeric boosting, maybe function score query works better and is a
more convenient approach.

Jörg

On Mon, Aug 4, 2014 at 2:37 PM, Bernd Fehling bernd....@gmail.com
wrote:

Actually I can't follow you.

My query should be:
"must_match(fieldname "text", value "einzelhandel", boost "200") AND
should_match(fieldname "oa", value "1", boost "400")"

Where is my typo?

Ahh you mean the value of "oa" with "1" is a string whereas the mapping
is integer?

On Monday, 4 August 2014 at 13:44:22 UTC+2, Jörg Prante wrote:

Maybe a typo?

For this example

PUT /test/test/1
{
  "name": "einzelhandel",
  "type": "oa"
}
PUT /test/test/2
{
  "name": "grosshandel",
  "type": "oa"
}
PUT /test/test/3
{
  "name": "grosshandel",
  "type": "closed"
}

POST /test/test/_search
{
  "explain": true,
  "query": {
    "bool": {
      "must": {
        "match": {
          "name": {
            "query": "einzelhandel",
            "boost": 200
          }
        }
      },
      "should": {
        "match": {
          "type": {
            "query": "oa",
            "boost": 400
          }
        }
      }
    }
  }
}

I get a reasonable explain:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.41168627,
    "hits": [
      {
        "_shard": 2,
        "_node": "RuosUkeaQKqKYq_WWi_brA",
        "_index": "test",
        "_type": "test",
        "_id": "1",
        "_score": 0.41168627,
        "_source": {
          "name": "einzelhandel",
          "type": "oa"
        },
        "_explanation": {
          "value": 0.41168627,
          "description": "sum of:",
          "details": [
            {
              "value": 0.13722876,
              "description": "weight(name:einzelhandel^200.0 in 0) [PerFieldSimilarity], result of:",
              "details": [
                {
                  "value": 0.13722876,
                  "description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
                  "details": [
                    {
                      "value": 0.4472136,
                      "description": "queryWeight, product of:",
                      "details": [
                        {
                          "value": 200,
                          "description": "boost"
                        },
                        {
                          "value": 0.30685282,
                          "description": "idf(docFreq=1, maxDocs=1)"
                        },
                        {
                          "value": 0.0072871028,
                          "description": "queryNorm"
                        }
                      ]
                    },
                    {
                      "value": 0.30685282,
                      "description": "fieldWeight in 0, product of:",
                      "details": [
                        {
                          "value": 1,
                          "description": "tf(freq=1.0), with freq of:",
                          "details": [
                            {
                              "value": 1,
                              "description": "termFreq=1.0"
                            }
                          ]
                        },
                        {
                          "value": 0.30685282,
                          "description": "idf(docFreq=1, maxDocs=1)"
                        },
                        {
                          "value": 1,
                          "description": "fieldNorm(doc=0)"
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value": 0.2744575,
              "description": "weight(type:oa^400.0 in 0) [PerFieldSimilarity], result of:",
              "details": [
                {
                  "value": 0.2744575,
                  "description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
                  "details": [
                    {
                      "value": 0.8944272,
                      "description": "queryWeight, product of:",
                      "details": [
                        {
                          "value": 400,
                          "description": "boost"
                        },
                        {
                          "value": 0.30685282,
                          "description": "idf(docFreq=1, maxDocs=1)"
                        },
                        {
                          "value": 0.0072871028,
                          "description": "queryNorm"
                        }
                      ]
                    },
                    {
                      "value": 0.30685282,
                      "description": "fieldWeight in 0, product of:",
                      "details": [
                        {
                          "value": 1,
                          "description": "tf(freq=1.0), with freq of:",
                          "details": [
                            {
                              "value": 1,
                              "description": "termFreq=1.0"
                            }
                          ]
                        },
                        {
                          "value": 0.30685282,
                          "description": "idf(docFreq=1, maxDocs=1)"
                        },
                        {
                          "value": 1,
                          "description": "fieldNorm(doc=0)"
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      }
    ]
  }
}

Jörg
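
For readers who want to see where the numbers in this explain output come from, the following sketch recomputes them in Python. It assumes Lucene's classic TF-IDF formulas, which are not stated in the thread: idf = 1 + ln(maxDocs / (docFreq + 1)) and queryNorm = 1 / sqrt(sum of squared clause weights). The function names are hypothetical helpers, and small rounding differences against Lucene's float arithmetic are to be expected.

```python
import math


def idf(doc_freq, max_docs):
    # Assumed classic Lucene formula: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def query_norm(clause_weights):
    # Assumed classic Lucene formula: 1 / sqrt(sum of squared weights).
    return 1.0 / math.sqrt(sum(w * w for w in clause_weights))


def clause_score(boost, idf_value, norm, tf=1.0, field_norm=1.0):
    # queryWeight and fieldWeight exactly as itemised in the explain output.
    query_weight = boost * idf_value * norm
    field_weight = tf * idf_value * field_norm
    return query_weight * field_weight


# Both terms are reported with idf(docFreq=1, maxDocs=1).
idf_term = idf(doc_freq=1, max_docs=1)

# Each term clause contributes boost * idf to the normalisation.
norm = query_norm([200 * idf_term, 400 * idf_term])

name_score = clause_score(200, idf_term, norm)   # explain reports 0.13722876
type_score = clause_score(400, idf_term, norm)   # explain reports 0.2744575
total = name_score + type_score                  # explain reports 0.41168627

print(idf_term, norm, name_score, type_score, total)
```

Under these assumptions each boost enters the score exactly once, through queryWeight, and the "should" clause scores twice as much as the "must" clause because its boost is twice as large.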




(Bernd Fehling-2) #7

Currently I'm playing with Solr 4.6.1 and ES 1.0.2, which both use Lucene
4.6.1.
The "oa" field has very low cardinality: it only ever holds one of
the values 0, 1 or 2.
Also, in Solr I have omitNorms=true because I don't want any index-time
boost or anything else, and the precisionStep is zero.
Believe me, it has worked like a charm with Solr for years and is 100
percent compliant with Lucene; the problem is Elasticsearch.

I just wanted to implement a boost query in my ES interface, as I have had
in Solr for years. For example, the boost should apply if oa=1.
I don't see why I should deal with a huge function score query if I just
want an extra boost at query time (selectable by the user).

It seems that ES is not 100 percent Lucene-conformant because it is not using
omitNorms=true on numeric fields :(
The issue you mentioned is years old and has been fixed.

Nevertheless, the boosting problem in ES is somewhere in the QueryParsers,
which transform the result of the QueryBuilders
into a Lucene query.




(Bernd Fehling-2) #8

After using MatchQuery and also trying TermQuery and FunctionScoreQuery, I
can say that Elasticsearch
always produces a NumericRangeQuery for numeric fields with boosting (or
perhaps even for all numeric queries?).
Example: fieldName = oa, value = 1, boost = 400
E.g. "oa":"1" --> "oa":[1 to 1]

With MatchQuery and TermQuery the boosting is doubled.
E.g. "oa:1^400" --> ConstantScore(oa:[1 TO 1]^400.0)^400.0
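
Whether the boost is really applied twice can be checked against the values in the explain output from post #1. The sketch below is a rough check in Python, assuming Lucene's classic queryNorm formula (1 / sqrt(sum of squared clause weights)), which is not stated in the thread. It compares the reported queryNorm with the value expected for a boost of 400 and for a squared boost of 400 * 400. The variable names are hypothetical.

```python
import math

# Values copied from the explain output in post #1.
boost_text = 200.0
idf_text = 10.386557
boost_oa = 400.0
reported_query_norm = 0.00047270773
reported_oa_score = 0.1890831


def query_norm(clause_weights):
    # Assumed classic Lucene formula: 1 / sqrt(sum of squared weights).
    return 1.0 / math.sqrt(sum(w * w for w in clause_weights))


# Term clause weight is boost * idf; a constant-score clause has weight = boost.
norm_if_single = query_norm([boost_text * idf_text, boost_oa])
norm_if_double = query_norm([boost_text * idf_text, boost_oa * boost_oa])

# Score of the "oa" clause as itemised by explain: boost * queryNorm.
oa_score = boost_oa * reported_query_norm

print("queryNorm, boost applied once: ", norm_if_single)
print("queryNorm, boost applied twice:", norm_if_double)
print("oa clause score:               ", oa_score)
```

Under that assumption, the reported queryNorm and the clause value of 0.1890831 both agree with a single boost of 400, which suggests that the second "^400.0" is a quirk of how the explain description is printed rather than a doubled boost in the score. This is only an arithmetic check on one explain output, not a statement about the ES query parser code.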

The FunctionScoreQuery also produces a NumericRangeQuery but can handle the
boost.
E.g. "oa:1^400" --> ConstantScore(oa:[1 TO 1]) with "value" : 400.0,
"static boost factor"

By the way, this is how a working boosted numeric query should look (as in
Solr):
"oa:1^400" --> (MATCH) weight(oa:`\b\u0000\u0000\u0000\u0001^400.0 in ...)

Regards
Bernd




(system) #9