Hi!
I have a dataset of about 300k articles, each of which contains a Unix timestamp (unixtimestamp) and a date string (published, e.g. "2020-04-14T03:00:00+00:00").
In App Search, unixtimestamp is defined as a Number, while published is defined as a Date. Nothing out of the ordinary here.
The craziness begins when I start searching and tuning the relevance. Since every article has a date attached to it, I want that date to play a large role in which article comes out on top.
So, the first thing I do is go into the "Reference UI" and sort by my property "published". Here's where things become really unbelievable: now every single article in my dataset is picked up as being relevant. The search query has no effect at all, and published is the only deciding factor.
The same story repeats itself when I try to use boost parameters, for example:
"boosts": { "unixtimestamp": [ { "type": "functional", "function": "linear", "operation": "add", "factor": 1e-7 } ] },
This works in the sense that I get relevant responses on top, but every single article in my dataset is still picked up, so I end up with page after page of irrelevant articles after the 40 relevant ones.
I even tried a recency boost, but that simply didn't work at all. No matter which settings, function, or factor I used on the "published" field (which is definitely a Date field), nothing made any difference whatsoever to the total score of any article.
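To illustrate what I think is happening, here is a rough sketch of the arithmetic in Python. It assumes that a "linear" functional boost with the "add" operation simply adds factor × field value to the base score; I have not confirmed that this is exactly what App Search does internally. The base scores and the boosted_score helper are made up for illustration.

```python
from datetime import datetime, timezone

FACTOR = 1e-7  # the factor from the boost above


def boosted_score(base_score: float, unixtimestamp: int, factor: float = FACTOR) -> float:
    """Assumed behaviour (not confirmed): linear + add => base + factor * value."""
    return base_score + factor * unixtimestamp


# The example article date from my dataset, converted to a Unix timestamp.
published = datetime(2020, 4, 14, 3, 0, 0, tzinfo=timezone.utc)
ts = int(published.timestamp())

# Hypothetical base scores: one article that matches the query well,
# and one that barely matches at all.
relevant = boosted_score(12.0, ts)
irrelevant = boosted_score(0.0, ts)

# How much one day of difference in age changes the boost.
one_day = FACTOR * 86400

print(ts, relevant, irrelevant, one_day)
```

Under that assumption, the boost adds roughly 158.68 to every article from that date, and one day of difference in age changes it by only about 0.00864. So it can reorder the hits, but it does nothing to drop the irrelevant ones from the result set.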
Example:
"boosts": { "published": { "function": "exponential", "center": "now", "factor": 8, "type": "proximity" } },
But even if this recency boost worked, I wouldn't have high hopes: there's nothing to say it wouldn't have exactly the same effect as sorting by published.
So, how do I fix this? I'd much rather get a response with only the 5 relevant articles than have every single article in my dataset following them. Everything is on default settings.