A question about getting relevant content using the TF-IDF algorithm

Lahu_Gosavi · October 12, 2021, 7:03am

Hi everyone,

I am doing a POC in which the best-matching document should rank highest.
We are using the TF-IDF algorithm to rank the documents.

We are using a multi_match query to find documents.

Here is the query:

{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            {
              "multi_match": {
                "query": "Leadership Development",
                "fields": [
                  "title^35",
                  "description^15",
                  "tags^55"
                ],
                "type": "phrase"
              }
            },
            {
              "multi_match": {
                "query": "Leadership Development",
                "fields": [
                  "title^25",
                  "description^5",
                  "tags^45"
                ]
              }
            }
          ]
        }
      },
      "functions": [
      ]
    }
  },
  "from": 0,
  "size": 100
}

Mappings:

{
  "tags": {
    "analyzer": "standard",
    "type": "text",
    "fields": {
      "keyword": {
        "normalizer": "lcase_keyword",
        "type": "keyword"
      }
    }
  },
  "title": {
    "analyzer": "standard",
    "store": true,
    "type": "text"
  },
  "description": {
    "analyzer": "standard",
    "store": true,
    "type": "text"
  },
  "normalizer": {
    "lcase_keyword": {
      "filter": [
        "lowercase"
      ],
      "type": "custom",
      "char_filter": []
    }
  }
}

We have a tagging concept where a user can add any number of tags to a document.

We support partial as well as phrase matches.

The TF component is normalized by the length of the field, and in our use case a document can have any number of tags. When a document has a large number of tags, its TF is lower, and because of the lower TF the overall document score is also low.

As a result, the relevant document is shown at the end of the results.
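
To make the effect concrete, here is a minimal Python sketch of the term-frequency part of BM25, the default similarity in recent Elasticsearch versions, which penalizes long fields in much the same way as the length norm in classic TF-IDF. The function name `bm25_tf` and the sample tag counts are made up for illustration; `k1 = 1.2` and `b = 0.75` are the usual defaults. This is a simplified model, not the exact Lucene implementation.

```python
def bm25_tf(freq, field_len, avg_field_len, k1=1.2, b=0.75):
    """Term-frequency part of BM25 with field-length normalization.

    freq          -- how often the term occurs in the field
    field_len     -- number of terms in this document's field
    avg_field_len -- average number of terms in that field across the index
    b             -- strength of length normalization (0 disables it)
    """
    length_norm = k1 * (1.0 - b + b * field_len / avg_field_len)
    return freq / (freq + length_norm)


# Hypothetical numbers: the query term occurs once in the tags field.
few_tags = bm25_tf(freq=1, field_len=4, avg_field_len=20)
many_tags = bm25_tf(freq=1, field_len=80, avg_field_len=20)

# Same heavily tagged document with length normalization switched off.
no_length_norm = bm25_tf(freq=1, field_len=80, avg_field_len=20, b=0.0)

print(f"few tags:       {few_tags:.3f}")
print(f"many tags:      {many_tags:.3f}")
print(f"many tags, b=0: {no_length_norm:.3f}")
```

In this sketch the heavily tagged document scores lower for the same single match, and setting `b` to 0 removes the dependence on field length. Elasticsearch exposes `b` as a setting of the BM25 similarity, so that may be one direction worth testing for the tags field.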

Is there any way we can avoid this?

system · November 9, 2021, 7:04am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
