How to design a schema for boosting/ranking logic?


Let us say, I am indexing a bunch of podcasts in my Elasticsearch cluster:

      - _id (keyword)
      - email (keyword)
      - webLink (keyword)
      - rssLink (keyword)
      - shortDescription (text)
      - longDescription (text)
      - artistIds (array of integers)
      - imageLink (keyword)
      - numEpisodes (integer)

I want to submit queries that select podcasts matching a text query and optionally boost the score based on the presence of certain fields. For example, I'd like to boost the score if a podcast has a web link, a short description, or an image.

For faster execution, should I add hasWebLink, hasShortDescription and hasImageLink fields, or use an exists clause on the original fields?

I am wondering whether having separate fields, with index set to true in the mapping, would result in faster execution.


An exists clause in the should part of a bool query sounds like a good way to do this. No need for further tuning.
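
For reference, here is a minimal sketch of what such a request body could look like, built as a Python dict. The helper name build_podcast_query, the choice of multi_match over the two description fields, and the boost values are assumptions for illustration, not something stated in the question.

```python
import json


def build_podcast_query(text, boosts):
    """Build a bool query: a required text match plus optional exists boosts.

    `boosts` maps a field name to the boost applied when that field is present.
    The helper name and the boost values used below are hypothetical.
    """
    should = [
        {"exists": {"field": field, "boost": boost}}
        for field, boost in boosts.items()
    ]
    return {
        "query": {
            "bool": {
                # The text match is required for a podcast to be returned.
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": ["shortDescription", "longDescription"],
                        }
                    }
                ],
                # With a must clause present, should clauses are optional:
                # they only add to the score of documents that match them.
                "should": should,
            }
        }
    }


body = build_podcast_query(
    "true crime",
    {"webLink": 2.0, "shortDescription": 1.5, "imageLink": 1.2},
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a search request. Podcasts that lack all three fields still match on the text alone; they just score lower.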

Thanks @spinscale for the reply.

As you mentioned, I am currently including the exists conditions with score boosting in a should clause.

According to the documentation here, a filter bitset is cached for potential reuse. If I do not explicitly define hasXYZ fields and an index on them, do my queries benefit from such filter bitset caching?


Those bitsets are only used in a filter context. In your case, however, the exists query is part of the scoring.
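
To make the distinction concrete, here is a small sketch contrasting the two placements of the same exists clause. The function names and the example field are assumptions chosen for illustration.

```python
def exists(field):
    """Return an exists clause for the given field."""
    return {"exists": {"field": field}}


def text_match(text):
    """Return a simple match clause on longDescription (illustrative choice)."""
    return {"match": {"longDescription": text}}


def filtered_query(text, field):
    # Filter context: only podcasts that have `field` are returned,
    # and the exists clause does not contribute to the score.
    return {"query": {"bool": {"must": [text_match(text)], "filter": [exists(field)]}}}


def boosted_query(text, field):
    # Query (scoring) context: every text match is returned, and podcasts
    # that have `field` receive a higher score.
    return {"query": {"bool": {"must": [text_match(text)], "should": [exists(field)]}}}


filter_body = filtered_query("history", "imageLink")
boost_body = boosted_query("history", "imageLink")
```

Only the first form runs the exists clause in a filter context. The second form is the one that implements the boosting described in the question.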