Odd scoring behavior


(Atul Sudhalkar) #1

I’m creating an index via the following:

{"mappings": {"Student": {"properties": {"skills": {"type":"text","analyzer":"english"}}}}}

Then I’m adding three documents to it:

{"index":{"_id":"unr_nedra_north"}}
{"id":"unr_nedra_north","fullName":"Nedra North","skills":"python;java;react;html;front-end development;databases","idAsString":"unr_nedra_north"}
{"index":{"_id":"unr_neha_gurusiddaiah"}}
{"id":"unr_neha_gurusiddaiah","fullName":"Neha Gurusiddaiah","skills":"python;java;react;html;front-end development;databases","idAsString":"unr_neha_gurusiddaiah"}
{"index":{"_id":"wsu_ward_wild"}}
{"id":"wsu_ward_wild","fullName":"Ward Wild","skills":"python;java;react;html;front-end development;databases","idAsString":"wsu_ward_wild"}

As you can see, all three documents have identical “skills” field values. And yet, when I search the index for keyword “java” with this query:

{
"explain": true,
"query": {"match":{"skills" : "java"}}
}

I get different scores across the hits! I can't paste the full results with explanation due to character limits, but the scores are 0.2876821, 0.18232156 and 0.18232156 respectively. Can anyone point me to what I'm doing wrong? I'm NOT using any custom scoring.


(Christian Dahlqvist) #2

How many shards does the index have? Are the two documents with the same score by any chance located in the same shard?


(Atul Sudhalkar) #3

Many thanks for the quick response!

Only one shard—the index has only 12 docs. We’re still in the pre-alpha phase of our product development. I can give you access to the index, if you want to take a look…

--atul

Atul Sudhalkar
Principal Architect


(Christian Dahlqvist) #4

Preparing a minimal reproduction script is generally recommended. I ran the following, and all 3 documents returned the same score on Elasticsearch 6.2.2:

PUT test
{
  "settings" : {
      "number_of_shards" : 1
  },
  "mappings": {
    "Student": {
      "properties": {
        "skills": {
          "type":"text",
          "analyzer":"english"
        }
      }
    }
  }
}

PUT test/Student/unr_nedra_north
{"id":"unr_nedra_north","fullName":"Nedra North","skills":"python;java;react;html;front-end development;databases","idAsString":"unr_nedra_north"}

PUT test/Student/unr_neha_gurusiddaiah
{"id":"unr_neha_gurusiddaiah","fullName":"Neha Gurusiddaiah","skills":"python;java;react;html;front-end development;databases","idAsString":"unr_neha_gurusiddaiah"}

PUT test/Student/wsu_ward_wild
{"id":"wsu_ward_wild","fullName":"Ward Wild","skills":"python;java;react;html;front-end development;databases","idAsString":"wsu_ward_wild"}

POST test/_refresh

GET test/_search
{
  "explain": true,
  "query": {"match":{"skills" : "java"}}
}

(Atul Sudhalkar) #5
  1. Point taken: in future, I'll prepare a script like you posted above.
  2. I re-created the index with one shard (as you suggested), and that worked!
  3. As you might imagine, I first re-created my problem with minimal fields so I could focus questions on this forum. When I tried the single-shard index create statement in my full app, it worked there as well. But if I use 2 or any higher number of shards, it gets weird. I take it that's a feature, not a bug? Any way to make this more robust across multiple shards?

(Atul Sudhalkar) #6

I forgot to add: thanks a ton for the prompt help!


(Christian Dahlqvist) #7

This part of the docs may help.
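
For anyone landing here later, the two scores from the first post are consistent with the BM25 IDF term being computed from per-shard statistics rather than index-wide ones. The Python sketch below is not Elasticsearch code; it just reproduces the arithmetic under a few assumptions: default BM25 parameters (k1 = 1.2, b = 0.75), every matching document having the same field length as its shard's average, and a hypothetical layout where one document sits alone on a shard and the other two share a shard. The function names and the field length of 8 are made up for illustration.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """BM25 IDF as used by Lucene: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq: float, doc_len: float, avg_doc_len: float,
                 k1: float = 1.2, b: float = 0.75) -> float:
    """Term-frequency part of BM25, including the (k1 + 1) factor."""
    return freq * (k1 + 1) / (freq + k1 * (1 - b + b * doc_len / avg_doc_len))


# "java" occurs once per document, and all documents have identical
# "skills" values, so doc_len == avg_doc_len and the tf part is 1.0.
# The length 8 is an arbitrary placeholder; only the ratio matters.
tf = bm25_tf_norm(freq=1, doc_len=8, avg_doc_len=8)

# Assumed layout: one matching doc alone on a shard, two on another shard.
score_alone = bm25_idf(doc_count=1, doc_freq=1) * tf   # about 0.2877
score_pair = bm25_idf(doc_count=2, doc_freq=2) * tf    # about 0.1823

# All three docs on a single shard, as in the one-shard test index.
score_single_shard = bm25_idf(doc_count=3, doc_freq=3) * tf

print(score_alone, score_pair, score_single_shard)
```

The first two values line up with the 0.2876821 and 0.18232156 reported above. With so few documents the per-shard statistics are very uneven; they tend to even out as the index grows. If consistent scores are needed on a small multi-shard index, running the search with `search_type=dfs_query_then_fetch` makes Elasticsearch gather term statistics from all shards before scoring, at the cost of an extra round trip.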

