How to design a schema for boosting/ranking logic?


Let us say, I am indexing a bunch of podcasts in my Elasticsearch cluster:

      - _id (keyword)
      - email (keyword)
      - webLink (keyword)
      - rssLink (keyword)
      - shortDescription (text)
      - longDescription (text)
      - artistIds (array of integers)
      - imageLink (keyword)
      - numEpisodes (integer)

I want to run text queries to select podcasts and optionally boost the score based on the presence of certain fields. For example, I'd like to boost the score if a podcast has a web link, a short description, or an image.

For faster execution, should I add dedicated hasWebLink, hasShortDescription and hasImageLink fields, or use an exists clause on the original fields?

I am wondering whether having separate fields, with index set to true for them, would result in faster execution.


An exists clause in the should part of a bool query sounds like a good way to do this. There is no need for further tuning.
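As a rough sketch of that suggestion, the request body could be built like this in Python. The helper name `build_podcast_query`, the boost values, and the choice of `multi_match` over the two description fields are assumptions for illustration; only the field names come from the question.

```python
import json

# Hypothetical boost weights; tune them against your own relevance tests.
DEFAULT_BOOSTS = {"webLink": 2.0, "shortDescription": 1.5, "imageLink": 1.2}


def build_podcast_query(text, boosts=None):
    """Build a bool query where the text match is required and each
    exists clause in `should` only adds to the score when the field is present."""
    boosts = DEFAULT_BOOSTS if boosts is None else boosts
    return {
        "query": {
            "bool": {
                # The text query decides which podcasts match at all.
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": ["shortDescription", "longDescription"],
                        }
                    }
                ],
                # Optional clauses: podcasts lacking the field still match,
                # they just do not receive the extra score.
                "should": [
                    {"exists": {"field": field, "boost": boost}}
                    for field, boost in boosts.items()
                ],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_podcast_query("true crime"), indent=2))
```

The resulting dictionary can be sent as the body of a `_search` request. Because a `must` clause is present, none of the `should` clauses is required to match.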

Thanks @spinscale for the reply.

As you mentioned, I am currently including the exists conditions with score boosting in a should clause.

According to the documentation here, a filter bitset is cached for potential reuse. If I do not explicitly define hasXYZ fields and index them, do my queries still benefit from such filter bitset caching?


Those bitsets are only used in filter context. In your case, however, the exists query is part of the scoring.
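To make the distinction concrete, here is a small sketch contrasting the two placements of the same exists clause. The function names are hypothetical, and the `match` on `shortDescription` is just a stand-in for the real text query.

```python
def exists_in_filter_context(text, field):
    """exists under bool.filter: it restricts the hits and contributes
    nothing to the score, so this is the context the bitset caching applies to."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"shortDescription": text}}],
                "filter": [{"exists": {"field": field}}],
            }
        }
    }


def exists_in_query_context(text, field, boost):
    """exists under bool.should: it does not restrict the hits, but it
    takes part in scoring, which is the situation described in this thread."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"shortDescription": text}}],
                "should": [{"exists": {"field": field, "boost": boost}}],
            }
        }
    }
```

Note that the filter variant changes the result set: podcasts without the field are excluded rather than ranked lower, so it is not a drop-in replacement for the boosting behaviour asked about.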
