Items that should have equal scores don't; could someone help me understand?


(Doug Livesey) #1

Hi -- I have a query that searches for items in a certain subcategory that should (not must) have been tagged with a certain tag. The items also have a listed_at field. If the query finds no items that have the desired tags, then it should just return the most recently listed items in the subcategory.
Oh, and I'm using the msearch method to do a multi-search via the Ruby elasticsearch-model API.
The query looks like this: https://gist.github.com/biot023/11f12b82fb7df3dee816434f35b8021f

I would expect this to return the most recently listed items (according to the listed_at field) first, because I would expect all items to have the same score: they should all have the correct subcategory ID, and none of them should have the tag in the query ("snow leopard").
However, the items seem to come back in a random order, each having been assigned a seemingly random score. An example of what actually gets returned is this: https://gist.github.com/biot023/9a959631de8527ec9314e2f7ce34e2c7

(Apologies for the massive code dump; I'm not sure what is relevant and what isn't.)
As you can see, the scoring seems more or less random, which means the results never end up ordered by listed_at as desired.
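To illustrate the expected behaviour, here is a minimal Ruby sketch, assuming the search sorts on _score first and listed_at (newest first) second; the actual query is only in the linked gist, and the hit hashes and values below are made up. When every score ties, listed_at decides the order; as soon as the scores differ, they override listed_at.

```ruby
require "time"

# Order hits the way a [_score desc, listed_at desc] sort would:
# higher score first, then the more recent listed_at as the tie-breaker.
def order_hits(hits)
  hits.sort_by { |hit| [-hit[:score], -Time.iso8601(hit[:listed_at]).to_f] }
end

tied = [
  { id: "Item 01", score: 0.5, listed_at: "2016-11-13T03:00:00Z" },
  { id: "Item 02", score: 0.5, listed_at: "2016-11-13T04:00:00Z" },
  { id: "Item 03", score: 0.5, listed_at: "2016-11-13T05:00:00Z" }
]
p order_hits(tied).map { |h| h[:id] }   # => ["Item 03", "Item 02", "Item 01"]

# Give the oldest item a higher score: it now jumps to the top regardless of listed_at.
uneven = tied.map(&:dup)
uneven[0][:score] = 0.9
p order_hits(uneven).map { |h| h[:id] } # => ["Item 01", "Item 03", "Item 02"]
```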

For completeness, the mappings of the items are as follows:

mappings do
  indexes( :title, type: "string", analyzer: "snowball" )
  indexes( :description, type: "string", analyzer: "snowball" )
  indexes( :tags, type: "string", index: "not_analyzed" )
  indexes( :section_id, type: "integer" )
  indexes( :category_id, type: "integer" )
  indexes( :subcategory_id, type: "integer" )
  indexes( :section, type: "string", index: "not_analyzed" )
  indexes( :category, type: "string", index: "not_analyzed" )
  indexes( :subcategory, type: "string", index: "not_analyzed" )
  indexes( :section_name, type: "string", analyzer: "snowball" )
  indexes( :category_name, type: "string", analyzer: "snowball" )
  indexes( :subcategory_name, type: "string", analyzer: "snowball" )
  indexes( :listed_at, type: "string", index: "not_analyzed" )
  indexes( :price, type: "float" )
  indexes( :price_range, type: "string", index: "not_analyzed" )
  indexes( :price_range_desc, type: "string", index: "not_analyzed" )
  indexes( :colour_ids, type: "integer" )
  indexes( :material_ids, type: "integer" )
  indexes( :created_at, type: "string", index: "not_analyzed" )
end

Could anyone explain how I'm ending up with a random scoring, and maybe explain how I can fix it so that the score stays consistent?
Thank you for any and all help offered,
Doug.


(Daniel Mitterdorfer) #2

Hi @DougFolksy,

One thing that jumps out is that you map listed_at as a string. Your documents all have the same score (which is OK), but Elasticsearch then sorts them lexicographically by date (i.e. it treats the date as a string), and I guess that's what you find so puzzling. So all you need to do is map listed_at as date and reindex the documents (the new mapping does not apply to documents that already exist in the index).
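A quick Ruby sketch of the string-versus-date point: a string field is compared character by character, which only matches chronological order when every value uses the same fixed-width, same-time-zone layout, such as ISO 8601 in UTC. The day-first format below is a hypothetical example, not necessarily what was indexed in this thread.

```ruby
require "time"

times = [
  Time.utc(2016, 11, 13, 23, 7, 3),
  Time.utc(2016, 11, 20, 8, 0, 0),
  Time.utc(2016, 12, 2, 12, 30, 0)
]

# Hypothetical day-first strings: a lexicographic sort compares the day digits first.
day_first = times.map { |t| t.strftime("%d/%m/%Y %H:%M:%S") }
p day_first.sort.first # => "02/12/2016 12:30:00" (December sorts before November)

# Comparing parsed instants, as a date-typed field would, restores chronological order.
p day_first.sort_by { |s| Time.strptime(s, "%d/%m/%Y %H:%M:%S") }.first # => "13/11/2016 23:07:03"

# ISO 8601 in UTC is fixed width with the most significant unit first,
# so here the lexicographic order already is the chronological order.
iso = times.map(&:iso8601)
p iso.sort == iso # => true
```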

Daniel


(Doug Livesey) #3

Thanks for that @danielmitterdorfer
Unfortunately, I still get the same issue after changing the mapping to date and re-running my tests.
For completeness, the mappings now look like this:

mappings do
  indexes( :title, type: "string", analyzer: "snowball" )
  indexes( :description, type: "string", analyzer: "snowball" )
  indexes( :tags, type: "string", index: "not_analyzed" )
  indexes( :section_id, type: "integer" )
  indexes( :category_id, type: "integer" )
  indexes( :subcategory_id, type: "integer" )
  indexes( :section, type: "string", index: "not_analyzed" )
  indexes( :category, type: "string", index: "not_analyzed" )
  indexes( :subcategory, type: "string", index: "not_analyzed" )
  indexes( :section_name, type: "string", analyzer: "snowball" )
  indexes( :category_name, type: "string", analyzer: "snowball" )
  indexes( :subcategory_name, type: "string", analyzer: "snowball" )
  indexes( :listed_at, type: "date", index: "not_analyzed" )
  indexes( :price, type: "float" )
  indexes( :price_range, type: "string", index: "not_analyzed" )
  indexes( :price_range_desc, type: "string", index: "not_analyzed" )
  indexes( :colour_ids, type: "integer" )
  indexes( :material_ids, type: "integer" )
  indexes( :created_at, type: "date", index: "not_analyzed" )
  indexes( :expires_at, type: "date", index: "not_analyzed" )
end

I tried it without the index: "not_analyzed" bit of the date mappings, too, but that didn't seem to make a difference.


(Daniel Mitterdorfer) #4

Hi @DougFolksy,

did you recreate the index from scratch? Did you check the mapping and the documents directly in Elasticsearch?

Daniel


(Doug Livesey) #5

Yes, @danielmitterdorfer, I deleted the index and re-imported.
When I check my mappings, now, I see this for the listed_at field:

"listed_at" : {
  "type" : "date",
  "format" : "strict_date_optional_time||epoch_millis"
},

(Doug Livesey) #6

Oh, and the results show items with listed_at dates like this:

"listed_at": "2016-11-13T23:07:03.000Z"

That's different from before, so I'm guessing it's now correct.


(Doug Livesey) #7

Ah!
I think I might know what the cause is!
The items that get lower scores than the others now seem to be ones with more tags.
Could it be that they get a lower score because they have two tags that don't match the tag in the query? Whereas the other items that only have one tag that doesn't match get a higher score?
In effect, the items are getting penalised for every tag they have which doesn't match the one in the query.
Does that seem likely?


(Doug Livesey) #8

Ah. No.
That's not it.
They're shifting around randomly, again.
Sorry -- got a bit over-excited, there! :frowning:


(Doug Livesey) #9

Just to confirm (sorry, I'm spamming, aren't I?), the different items are getting different scores each time I run my tests -- here's an example from two consecutive runs:

Item    -- Score 1    -- Score 2
-----------------------------------
Item 01 -- 0.09848769 -- 0.04500804
Item 02 -- 0.28986934 -- 0.09848769
Item 03 -- 0.09848769 -- 0.09848769
Item 04 -- 0.28986934 -- 0.09848769
Item 05 -- 0.04500804 -- 0.04500804
Item 06 -- 0.04500804 -- 0.09848769

For each test run, the things that will differ (because it's a different time, or they've been generated differently) are:

  • listed_at and other date fields ending in _at
  • _id and id, as part of new test data being generated for the test
  • user_id
  • shop_id
  • section_id
  • category_id
  • subcategory_id
  • primary_image (to some random UUID)
  • section, category, and subcategory, as these are fields composited from their IDs, slugs and names

But the only fields referenced in the query are listed_at and tags (an array of strings).
So I can't help thinking that it must still be something to do with the listed_at field, as that's the only relevant thing that changes from test to test.


(Daniel Mitterdorfer) #10

Hi @DougFolksy,

no worries. :slight_smile:

You can use the explain API to find out how Elasticsearch calculated the score of a document.

Can you please also provide the search results as a gist? I'd like to see whether the results are now sorted by _score and then by date (but this time interpreted as date, not as string).

Regarding the random shuffling, can you please also provide the output of GET /your_index_name_here/_stats? (where you replace "your_index_name_here" with the name of your index).

Daniel


(Daniel Mitterdorfer) #11

Hi @DougFolksy,

you write:

the things that will differ (because it's a different time, or they've been generated differently) are:

  • listed_at and other date fields ending in _at

[...]

the only fields referenced in the query are listed_at [...]

Isn't it expected that you get different scores each time if you have different values for listed_at each time?

Daniel


(Doug Livesey) #12

Sure thing, @danielmitterdorfer

The latest results are here: https://gist.github.com/biot023/a10894b54c825d8cde4fc633082813b6

And the output from the stats query is here: https://gist.github.com/biot023/3d3f4962afb4495de0180026d4126753

I'm having a look at the explain API, now, and will report back -- thanks again, man!


(Doug Livesey) #13

Isn't it expected that you get different scores each time if you have different values for listed_at each time?

No, sorry, I should've explained -- the values change each time, but they're all consistently relative to each other.
"Item 01" gets a listed_at time of 20 hours ago, "Item 02" gets one of 19 hours ago, and so on.
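A hypothetical sketch of the fixture pattern described above (the helper name and the six-item count are assumptions): the absolute listed_at values move with the clock on every run, but their relative order, and therefore any listed_at tie-break, stays the same.

```ruby
# Hypothetical fixture helper: Item 01 is listed 20 hours before `now`,
# Item 02 19 hours before, and so on.
def listed_at_fixtures(now, count: 6)
  (1..count).map { |n| [format("Item %02d", n), now - (21 - n) * 3600] }
end

newest_first = ->(items) { items.sort_by { |_, t| -t.to_f }.map(&:first) }

run1 = listed_at_fixtures(Time.utc(2016, 11, 14, 12, 0, 0))
run2 = listed_at_fixtures(Time.utc(2016, 11, 15, 9, 30, 0))

p run1.first[1] == run2.first[1]             # => false: absolute timestamps differ
p newest_first.(run1) == newest_first.(run2) # => true: relative order is identical
p newest_first.(run1).first                  # => "Item 06"
```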


(Daniel Mitterdorfer) #14

Hi @DougFolksy,

ok, I checked the results. You have a very small number of documents but multiple shards and this could cause (pseudo-)issues with scoring (see the section "Relevance is broken" in the Definitive Guide). The gist of this: can you please create the index for your tests with only one primary shard, i.e.:

PUT /your_index_name
{ 
    "settings": { 
        "number_of_shards": 1 
    }
}
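To make the "Relevance is broken" point concrete, here is a minimal Ruby simulation under stated assumptions: it keeps only the idf factor of Lucene's classic TF-IDF score (the formula assumed for Elasticsearch 2.x), and it stands in Zlib.crc32 for Elasticsearch's murmur3-based routing. Each shard computes term statistics from its own documents only, so identical documents on differently sized shards get different scores. Because the test data gets new _id values on every run, the documents plausibly land on different shards each time, which would fit the scores reshuffling between runs.

```ruby
require "zlib"

# Classic Lucene TF-IDF idf (assumed for Elasticsearch 2.x): 1 + ln(numDocs / (docFreq + 1)).
def idf(num_docs, doc_freq)
  1 + Math.log(num_docs.to_f / (doc_freq + 1))
end

# Illustrative routing: shard = hash(_id) % number_of_shards.
# Elasticsearch uses murmur3 for the hash; CRC32 is only a stand-in here.
def route(ids, num_shards)
  ids.map { |id| [id, Zlib.crc32(id) % num_shards] }.to_h
end

# Simplified score for a term every document contains: only the idf factor,
# computed from the statistics of the shard each document happens to live on.
def shard_local_scores(shard_of)
  docs_per_shard = Hash.new(0)
  shard_of.each_value { |shard| docs_per_shard[shard] += 1 }
  shard_of.transform_values do |shard|
    n = docs_per_shard[shard]
    idf(n, n) # every doc on the shard has the term, so docFreq == numDocs
  end
end

ids = (1..6).map { |n| "item-#{n}" }

# One shard: every document sees the same statistics, so every score is identical.
p shard_local_scores(route(ids, 1)).values.uniq.size # => 1

# Two documents on shard 0 and one on shard 1: "c" ends up with a different score.
p shard_local_scores({ "a" => 0, "b" => 0, "c" => 1 })
```

With a single primary shard, all documents share one set of statistics. The same Definitive Guide section also describes search_type=dfs_query_then_fetch, which gathers global term statistics before scoring, but advises against relying on it in production.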

Daniel


(Doug Livesey) #15

When I get around to writing the epic opera based upon this thread, your character's arias are going to be so heroic!
It took me a while to figure out how to do this with our test setup, but once I had, I haven't had a single inconsistent result yet!
Thank you so much, @danielmitterdorfer!
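For a Ruby test setup, one possible approach (an assumption, since the thread doesn't show the actual change) is elasticsearch-model's settings index: { number_of_shards: 1 } do ... end block wrapped around the existing mappings block. The sketch below instead shows the equivalent request body for PUT /your_index_name as a plain Ruby hash, trimmed to two fields; the type name "item" is hypothetical, and Elasticsearch 2.x mappings are keyed by type name.

```ruby
require "json"

# Hypothetical index-creation body for the test index: one primary shard
# plus the date mapping for listed_at. The "item" type name and the trimmed
# field list are assumptions for illustration.
TEST_INDEX_BODY = {
  settings: { number_of_shards: 1 },
  mappings: {
    item: {
      properties: {
        tags:      { type: "string", index: "not_analyzed" },
        listed_at: { type: "date" }
      }
    }
  }
}.freeze

puts JSON.pretty_generate(TEST_INDEX_BODY)
```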


(Daniel Mitterdorfer) #16

Haha, glad I could help you @DougFolksy and I'm looking forward to your opera. In the meantime, enjoy your now consistent search results. :slight_smile:

Daniel
