Recency boost, and boosts in general giving irrelevant responses

Hi!

I have a dataset of about 300k articles, each one of them contains a unix timestamp (unixtimestamp) and a date string (published - "2020-04-14T03:00:00+00:00").

Now, unixtimestamp is in App Search defined as a Number, while published is defined as a Date. Nothing out of the ordinary here.

The crazy begins when I start searching and tuning the relevancy. Since every article has a date attached to it, I want that to play a large role in what article comes out on top.
So, the first thing I do is go into the "Reference UI" and sort by my property "published". Here's where things become really unbelievable: now every single article in my dataset is picked up as relevant. The search query has no effect at all, and published is the only deciding factor.

The same story repeats itself when I try to use boost parameters, for example:

  "boosts": {
    "unixtimestamp": [
      {
        "type": "functional",
        "function": "linear",
        "operation": "add",
        "factor": 1e-7
      }
    ]
  },

This works in the sense that I get relevant responses on top - but then every single article in my dataset is still picked up, so I'll have a million pages of irrelevant articles after 40 relevant ones.
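To get a feel for how much a boost like this shifts the score, here is a rough arithmetic sketch. It assumes that a "linear" functional boost with operation "add" contributes approximately factor * field value to the score; that formula is an assumption for illustration, not something confirmed in this thread.

```python
from datetime import datetime, timezone

# Assumption (not confirmed here): a "linear" functional boost with
# operation "add" adds roughly factor * field_value to the base score.
FACTOR = 1e-7
SECONDS_PER_DAY = 86_400


def additive_boost(unix_timestamp: int, factor: float = FACTOR) -> float:
    """Approximate score contribution of the boost under the assumption above."""
    return factor * unix_timestamp


# Timestamp for the example date from the first post.
ts_new = int(datetime(2020, 4, 14, 3, tzinfo=timezone.utc).timestamp())
ts_day_older = ts_new - SECONDS_PER_DAY
ts_year_older = ts_new - 365 * SECONDS_PER_DAY

# How much more score the newer article gets than an older one.
day_gap = additive_boost(ts_new) - additive_boost(ts_day_older)
year_gap = additive_boost(ts_new) - additive_boost(ts_year_older)

print(f"boost for newest article: {additive_boost(ts_new):.5f}")
print(f"advantage over an article one day older: {day_gap:.5f}")
print(f"advantage over an article one year older: {year_gap:.5f}")
```

Under that assumption, every article receives a large, nearly identical offset, and only the small differences between offsets affect the ordering. The boost changes scores, not which documents are returned.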

I even tried a recency boost, but that simply didn't work at all. No matter what settings, function, or factor I used on the "published" field (which is definitely a Date field), it made no difference whatsoever to the total score of any article.
Example:

  "boosts": {
    "published": {
      "function": "exponential",
      "center": "now",
      "factor": 8,
      "type": "proximity"
    }
  },

But even with this recency boost I don't have high hopes. There's nothing to say that it won't have exactly the same effect as sorting by published.

So, how do I fix this? I'd much rather get a response of only 5 relevant articles than one with every single article in my dataset following them. Everything is at default settings.

@Kevin_Ferm

You touch on two issues here:

  1. Our search query tends to favor recall over precision, so you can end up with a large set of search results. This is a known issue, and we plan to eventually add features to let you tune relevance further. I have no timeline for that as of now.

  2. Recency boosts having no effect: this is a known issue that is in our backlog to fix. We use a fixed scale and decay for proximity / recency boosts. I suspect it will work better for numbers that are close together (1 vs 2 vs 3) than for numbers that are far apart (1000, 2000, 3000). You could try parsing out a year field, for instance, and doing a proximity boost on just the year.
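The year-field suggestion could look roughly like the sketch below. The field name `published_year` and the helper `with_published_year` are hypothetical, and the boost fragment simply mirrors the shape of the boosts posted earlier in the thread; treat it as an illustration, not a confirmed fix.

```python
from datetime import datetime


def with_published_year(article: dict) -> dict:
    """Return a copy of the article with a numeric published_year field.

    "published_year" is a hypothetical field name chosen for this example.
    """
    published = datetime.fromisoformat(article["published"])
    return {**article, "published_year": published.year}


# Example document using the date format from the first post.
article = {"id": "a1", "published": "2020-04-14T03:00:00+00:00"}
indexed = with_published_year(article)

# Proximity boost on the small-range number field instead of the Date field.
# The center is the year you want to favor (here, the current year at the
# time of the thread).
boosts = {
    "published_year": [
        {
            "type": "proximity",
            "function": "exponential",
            "center": 2020,
            "factor": 8,
        }
    ]
}

print(indexed["published_year"])
```

The documents would need to be re-indexed with the new field, and the field would have to be typed as a Number in the schema before the boost can apply.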

But why does it ignore my search string if I sort by a date field? That seems irrational in a search function.
@JasonStoltz

Hey Kevin, sorry, it should not be ignoring your search string. Are you certain that is the case?

Hi @JasonStoltz

Can you confirm that the latest App Search 7.7 has no options or workarounds for discarding unrelated or low-scoring docs?

We discussed it in Dec19 and I have loosely followed the release notes, but your answer to @Kevin_Ferm seems to indicate that no official progress has been made on this topic.

Ref: Filter search by score

Cheers,
Thomas
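For readers looking for a stopgap, one client-side option is to drop low-scoring results after the response arrives. The sketch below assumes each result carries its score under `_meta.score`; the helper name `keep_relevant` and the 0.5 ratio are hypothetical. Since scores are not on a fixed scale, the cutoff is relative to the top score instead of an absolute number.

```python
from typing import Dict, List


def keep_relevant(results: List[Dict], ratio: float = 0.5) -> List[Dict]:
    """Keep results scoring at least `ratio` times the best score.

    Assumes every result has a numeric results[i]["_meta"]["score"].
    """
    if not results:
        return []
    top_score = max(r["_meta"]["score"] for r in results)
    return [r for r in results if r["_meta"]["score"] >= ratio * top_score]


# Made-up response data, for illustration only.
sample_results = [
    {"id": {"raw": "a1"}, "_meta": {"score": 10.0}},
    {"id": {"raw": "a2"}, "_meta": {"score": 6.0}},
    {"id": {"raw": "a3"}, "_meta": {"score": 1.0}},
]

filtered = keep_relevant(sample_results)
print([r["id"]["raw"] for r in filtered])
```

This only trims the page that was fetched, so pagination totals reported by the server will still count every matched document.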
