Recency boost, and boosts in general giving irrelevant responses

Kevin_Ferm · April 27, 2020, 1:20pm

Hi!

I have a dataset of about 300k articles, each one of them contains a unix timestamp (unixtimestamp) and a date string (published - "2020-04-14T03:00:00+00:00").

Now, unixtimestamp is in App Search defined as a Number, while published is defined as a Date. Nothing out of the ordinary here.

The crazy begins when I start searching and tuning the relevancy. Since every article has a date attached to it, I want that to play a large role in what article comes out on top.
So, the first thing I do is go into the "Reference UI" and sort by my property "published". Here's where things become really unbelievable: now every single article in my dataset is picked up as being relevant. The search query has no effect at all; published is the only deciding factor.

The same story repeats itself when I try to use boost parameters, for example:

  "boosts": {
    "unixtimestamp": [
      {
        "type": "functional",
        "function": "linear",
        "operation": "add",
        "factor": 1e-7
      }
    ]
  },

This works in the sense that I get relevant responses on top - but then every single article in my dataset is still picked up, so I'll have a million pages of irrelevant articles after 40 relevant ones.

I even tried a recency boost, but that simply didn't work at all. No matter what settings, function, or factor I used on the "published" field (which is definitely a Date field), nothing made any difference whatsoever to the total score of any article.
Example:

  "boosts": {
    "published": {
      "function": "exponential",
      "center": "now",
      "factor": 8,
      "type": "proximity"
    }
  },

But even with this recency boost, I don't have high hopes. There's nothing to say that it won't have exactly the same effect as sorting by published.

So... how do I fix this? I feel like I'd much rather get a response of only 5 relevant articles than have every single article in my dataset following them. Everything is on default settings.

JasonStoltz · April 28, 2020, 12:55pm

@Kevin_Ferm

You touch on two issues here:

1. Our search query tends to favor recall over precision, so you can end up with a large set of search results. This is a known issue; we plan to eventually add features to let you further tune this relevance. I have no timeline for that as of now.
2. Recency boosts having no effect. This is a known issue that is in our backlog to fix. We used a fixed scale and decay for proximity / recency. I suspect it will work better for numbers that are close together (1 vs 2 vs 3) than for numbers that are far apart (1000, 2000, 3000). You could try parsing out a year field, for instance, and doing a proximity boost on just the year.
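The year-field idea above could be sketched roughly as follows. This is only an illustration under assumptions: the field name `published_year` and both helper functions are hypothetical, the field would have to be added to the schema as a Number, and the boost uses the same `type` / `function` / `center` / `factor` keys shown earlier in this thread.

```python
from datetime import datetime


def add_published_year(article: dict) -> dict:
    """Return a copy of the article with a numeric `published_year` field.

    `published_year` is a hypothetical field name, not something App Search
    provides; it would need to be defined as a Number in the schema.
    """
    published = datetime.fromisoformat(article["published"])
    return {**article, "published_year": published.year}


def year_proximity_boost(center_year: int, factor: float = 8) -> dict:
    """Build a `boosts` fragment that favors documents near `center_year`."""
    return {
        "published_year": [
            {
                "type": "proximity",
                "function": "linear",
                "center": center_year,
                "factor": factor,
            }
        ]
    }


# Derive the year before indexing, then boost on it at query time.
article = {"id": "a1", "published": "2020-04-14T03:00:00+00:00"}
indexed = add_published_year(article)
search_body = {"query": "example query", "boosts": year_proximity_boost(2020)}
```

Because years are small numbers that sit close together, this may sidestep the fixed scale mentioned above, at the cost of only distinguishing articles by year.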

Kevin_Ferm · April 28, 2020, 1:55pm

But why does it ignore my search string if I sort by a date field? That seems irrational in a search function.
@JasonStoltz

JasonStoltz · April 28, 2020, 3:33pm

Hey Kevin, sorry, it should not be ignoring your search string. Are you certain that is the case?

TomHome · May 15, 2020, 4:24am

Hi @JasonStoltz

Can you confirm that the latest App Search 7.7 has no options or workarounds to discard unrelated or low-scoring docs?

We discussed it in December 2019 and I have loosely followed the release notes, but your answer to @Kevin_Ferm seems to highlight the fact that no official progress has been made on this topic.

Ref: Filter search by score

Cheers,
Thomas
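
For readers looking for a stopgap: nothing in this thread confirms a built-in score cutoff, so the only option sketched here is trimming results on the client side. This is an assumption-laden illustration, not an official workaround. It assumes each result carries its score under `_meta.score`; the function name and the `min_ratio` threshold are hypothetical.

```python
def keep_high_scoring(results: list, min_ratio: float = 0.5) -> list:
    """Keep results whose score is at least `min_ratio` of the top score.

    Assumes every result looks like {"_meta": {"score": <number>}, ...}.
    `min_ratio` is an arbitrary, hypothetical cutoff that needs tuning.
    """
    if not results:
        return []
    top_score = max(r["_meta"]["score"] for r in results)
    if top_score <= 0:
        # No meaningful ranking signal, so leave the results untouched.
        return list(results)
    threshold = top_score * min_ratio
    return [r for r in results if r["_meta"]["score"] >= threshold]


sample_results = [
    {"id": "a", "_meta": {"score": 12.0}},
    {"id": "b", "_meta": {"score": 9.0}},
    {"id": "c", "_meta": {"score": 1.5}},
]
trimmed = keep_high_scoring(sample_results)
```

This only trims the page that was fetched; the total result count and page count reported by the API would not change. A large additive boost also raises every score, which makes a relative cutoff less discriminating.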

system · June 12, 2020, 4:24am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
