How can we rank the most relevant documents higher using the TF-IDF algorithm?

Hi everyone

I am doing a POC in which the best-matching documents should rank highest.
Basically, we are using the TF-IDF algorithm to rank the documents.

We are using multi_match queries to find documents.

Here is the query:

{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            {
              "multi_match": {
                "query": "Leadership Development",
                "fields": [
                  "title^35",
                  "description^15",
                  "tags^55"
                ],
                "type": "phrase"
              }
            },
            {
              "multi_match": {
                "query": "Leadership Development",
                "fields": [
                  "title^25",
                  "description^5",
                  "tags^45"
                ]
              }
            }
          ]
        }
      },
      "functions": [
      ]
    }
  },
  "from": 0,
  "size": 100
}

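Index settings and mappings (the lcase_keyword normalizer is an analysis setting, so it is defined under settings rather than next to the fields):

{
  "settings": {
    "analysis": {
      "normalizer": {
        "lcase_keyword": {
          "filter": [
            "lowercase"
          ],
          "type": "custom",
          "char_filter": []
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "tags": {
        "analyzer": "standard",
        "type": "text",
        "fields": {
          "keyword": {
            "normalizer": "lcase_keyword",
            "type": "keyword"
          }
        }
      },
      "title": {
        "analyzer": "standard",
        "store": true,
        "type": "text"
      },
      "description": {
        "analyzer": "standard",
        "store": true,
        "type": "text"
      }
    }
  }
}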

We have a tagging feature where users can add any number of tags to a document.

We support partial matches as well as phrase matches.

The TF part of the score is normalized by the length of the field, and in our use case a document can have any number of tags. Because a document can have many tags, its tags field is long, which gives it a lower TF, and because of the lower TF the overall document score is also low.

As a result, this relevant document is shown at the end of the results.
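To make the effect concrete, here is a small Python sketch of how field-length normalization shrinks a single term's score. It assumes the fields use Elasticsearch's default BM25 similarity (a TF-IDF variant; the mappings in this post set no similarity) with its default parameters k1 = 1.2 and b = 0.75. The index statistics (average tags length, document counts) are made up for illustration, and query boosts are ignored.

```python
import math

# Assumptions (not from the original post): default BM25 similarity with
# k1 = 1.2 and b = 0.75, plus made-up index statistics for the "tags" field.
K1 = 1.2
B = 0.75


def bm25_term_score(freq, field_len, avg_field_len, doc_count, docs_with_term,
                    k1=K1, b=B):
    """Score of a single query term in a single field (boosts ignored)."""
    idf = math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Field-length normalization: fields longer than average shrink the TF part.
    tf = freq / (freq + k1 * (1 - b + b * field_len / avg_field_len))
    return idf * tf


# Hypothetical statistics for the "tags" field.
stats = dict(avg_field_len=8.0, doc_count=1000, docs_with_term=50)

# A doc with one tag ("Leadership Development" -> 2 tokens) vs. a doc with
# many tags (~40 tokens) that also contains "leadership" exactly once.
few_tags = bm25_term_score(freq=1, field_len=2, **stats)
many_tags = bm25_term_score(freq=1, field_len=40, **stats)

# With b = 0 the field length no longer influences the score.
few_tags_b0 = bm25_term_score(freq=1, field_len=2, b=0.0, **stats)
many_tags_b0 = bm25_term_score(freq=1, field_len=40, b=0.0, **stats)

print(f"b=0.75: 2 tokens -> {few_tags:.3f}, 40 tokens -> {many_tags:.3f}")
print(f"b=0.0 : 2 tokens -> {few_tags_b0:.3f}, 40 tokens -> {many_tags_b0:.3f}")
```

With these numbers the 2-token tags field scores several times higher than the 40-token one for the same single match, while with b = 0 both score the same, which shows that the gap comes from the length-normalization term alone.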

Is there any way we can avoid this?
