Are there case studies about using intermediate queries in a search?

I want to allow my users to filter their search results by various tags.

Imagine a WordPress-like website where each blog post has a set of tags.

When my users go to the advanced search page, they can select a number of tags to filter the results by. Pages that are not assigned those tags will not be returned in the results.

Right now I have two tables and two indexes:

Table A — the page content (blog post in plain text)

Table B — the tag name and a reference back to Table A (i.e. many to many)

When I do a search (I have preliminary code that already works with Elassandra), I first search Table B for the tags. This is really fast since I can use a term { ... } search. I get a set of references to pages in Table A (say, one UUID per page).

Next, I do a second search against Table A. This time I use two types of searches:

  1. A query string with the text the user entered.
  2. One term { ... } per result I found in the first search.
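For readers following along, here is a minimal sketch of how the second search body could be assembled from the first one's results. The field names `page_id` and `content` are hypothetical, and joining the term clauses as `should` with `minimum_should_match` is an assumption about how the clauses are combined, not something stated in the post.

```python
from typing import Any, Dict, List


def build_page_query(user_text: str, page_ids: List[str]) -> Dict[str, Any]:
    """Build the Table A search body from the page UUIDs found in Table B."""
    if not page_ids:
        # No page carries the selected tags, so there is nothing to search.
        raise ValueError("no page matched the selected tags")

    # One term clause per page UUID returned by the tag search.
    id_clauses = [{"term": {"page_id": {"value": pid}}} for pid in page_ids]

    return {
        "query": {
            "bool": {
                # The text the user typed, matched against the page content.
                "must": [
                    {"query_string": {"query": user_text, "default_field": "content"}}
                ],
                # A page must match at least one of the UUID clauses.
                "should": id_clauses,
                "minimum_should_match": 1,
            }
        }
    }
```

The returned dictionary is only the request body; how it is sent (official client, plain HTTP) is left open.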

I think that all of that works as expected as it is. I could not see how to do a join otherwise (there are some problems with Elassandra, maybe it will become possible one day with my schema...)

Now, when I search Table B, I get results and each result has a { _score: ... }, which is ignored when I do the next search. Reading the documentation, I saw that we could use the { boost: ... } parameter to tweak scores "manually".

What I'm wondering is whether there is a standard way to handle this scenario. I would be interested in existing research documents in that realm if you know of any. Or maybe just a good ol' blog post about filtering in a way similar to mine.

My current idea would be to change the term by adding the boost parameter in there:

"term": { "<field-name>": { "value": "<some-UUID>", "boost": "<Table B Search _score>" } }

At the same time, I know about the filter clause of the bool query, and if I were to put my term queries in there, the score would be ignored. There may be a reason why it was done that way, i.e. to help with cases like mine?
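To make the filter variant concrete, here is a small sketch of a bool query where the page UUIDs sit in filter context, so they only include or exclude pages and contribute nothing to the score. The field name `page_id` is hypothetical, and using a single `terms` clause instead of one `term` per UUID is my own simplification.

```python
from typing import Any, Dict, List


def build_filtered_query(user_text: str, page_ids: List[str]) -> Dict[str, Any]:
    """Score on the user's text only; restrict results to the given pages."""
    return {
        "query": {
            "bool": {
                # Scoring part: the text the user entered.
                "must": [{"query_string": {"query": user_text}}],
                # Non-scoring part: keep only pages whose UUID is in the list.
                "filter": [{"terms": {"page_id": list(page_ids)}}],
            }
        }
    }
```

With this shape, the ranking comes entirely from the text match, which is the behavior described above for filter.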

Thank you.

The only way you can do this is by tracking the score in your client and then passing that to the second query. Elasticsearch cannot do this for you.
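A rough sketch of what tracking the score in the client could look like: collect the `_score` of each hit from the tag search, then emit one boosted term clause per page. It assumes each hit's `_source` carries a hypothetical `page_id` field, and keeping the highest score when a page matches several tags is an arbitrary choice (summing would be just as defensible).

```python
from typing import Any, Dict, List


def boosted_terms(tag_hits: List[Dict[str, Any]], field: str = "page_id") -> List[Dict[str, Any]]:
    """Turn hits from the tag search into term clauses boosted by their score."""
    best: Dict[str, float] = {}
    for hit in tag_hits:
        pid = hit["_source"][field]
        score = float(hit["_score"])
        # A page can match several tags; keep its best score (an assumption).
        best[pid] = max(score, best.get(pid, 0.0))

    return [
        {"term": {field: {"value": pid, "boost": score}}}
        for pid, score in best.items()
    ]


# Example with hand-written hits shaped like the "hits.hits" array of a response.
hits = [
    {"_score": 1.5, "_source": {"page_id": "uuid-1"}},
    {"_score": 0.7, "_source": {"page_id": "uuid-2"}},
    {"_score": 2.0, "_source": {"page_id": "uuid-1"}},
]
clauses = boosted_terms(hits)
```

The resulting clauses can then be placed in the second query in place of the plain term clauses.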

Okay, that's what I was thinking. Thank you for the fast answer!

Is the _score result a one to one equivalent to the boost parameters?

Well they are both numbers, so you can just use that as the boost, sure. That's where it ends.

Thing is though, unless your index never changes, your scores will not be static.

Do you know of any document/discussion about how the _score gets calculated?

Thank you.

We use the BM25 algorithm for scoring (at least, if you run 6.X). There's tonnes out there on that, but I am not sure what you're expecting so it's hard to suggest something.
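For a feel of what goes into the number, here is the textbook BM25 formula for a single term as a small sketch. The defaults k1 = 1.2 and b = 0.75 are the usual ones; the actual Lucene implementation differs in details (constant factors, how document length is encoded), so treat this as an illustration rather than a reproduction of `_score`.

```python
import math


def bm25_term_score(
    tf: float,
    doc_len: float,
    avg_doc_len: float,
    doc_count: int,
    doc_freq: int,
    k1: float = 1.2,
    b: float = 0.75,
) -> float:
    """Textbook BM25 score of one term in one document."""
    # Rarer terms (low doc_freq relative to doc_count) get a higher idf.
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Term frequency saturates, and long documents are penalized through b.
    norm = tf + k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / norm


# The same term in the same document scores differently once the index grows.
before = bm25_term_score(tf=2, doc_len=120, avg_doc_len=100, doc_count=1000, doc_freq=50)
after = bm25_term_score(tf=2, doc_len=120, avg_doc_len=100, doc_count=2000, doc_freq=50)
```

Because `doc_count`, `doc_freq`, and `avg_doc_len` are index-wide statistics, adding or removing documents changes the score of a document that itself did not change.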

That looks great. I'll be checking into this further to see that I use the data correctly. For those interested there is the Wikipedia article:

I also found a page about boosting by popularity, which may be of interest. It shows how one can use a vote score to boost the scores of results within reason. A similar solution could be used to adjust the scores of the second query using the scores of the first query, as opposed to using the boost parameter, which is quite linear when used on its own and may not be appropriate in your situation.
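As a sketch of that idea, the first query's scores can be dampened with a logarithm and applied as per-page weights in a function_score query. The field name `page_id`, the `1 + log1p(score)` damping, and the choice of `score_mode` and `boost_mode` are all assumptions for illustration, not something taken from the page mentioned above.

```python
import math
from typing import Any, Dict


def build_function_score_query(user_text: str, page_scores: Dict[str, float]) -> Dict[str, Any]:
    """Multiply the text score by a dampened weight derived from the tag score."""
    functions = [
        {
            # The weight applies only to the page with this UUID.
            "filter": {"term": {"page_id": pid}},
            # log1p flattens large scores; the +1 keeps a zero score neutral.
            "weight": 1.0 + math.log1p(score),
        }
        for pid, score in page_scores.items()
    ]
    return {
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "must": [{"query_string": {"query": user_text}}],
                        # Restrict results to the pages found by the tag search.
                        "filter": [{"terms": {"page_id": list(page_scores)}}],
                    }
                },
                "functions": functions,
                "score_mode": "max",
                "boost_mode": "multiply",
            }
        }
    }
```

Here `page_scores` maps each page UUID to the score kept from the first search; every returned page matches exactly one function, so `score_mode` mostly matters if the same UUID were listed twice.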
