Take repeated words (duplicate words) into account

grefon · May 23, 2019, 5:07pm

Tell me, how can I take into account the number of occurrences of a term in the search?

Index

{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  }
}

Docs:

ID 1: {"text": "foo"}
ID 2: {"text": "test foo repeat foo"}

Request:

{
  "size": 5,
  "query": {
    "multi_match": {
      "fields": [
        "text"
      ],
      "query": "foo foo",
      "analyzer": "whitespace",
      "minimum_should_match": "100%",
      "operator": "and"
    }
  },
  "explain": true
}

Result:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 0.48326197,
    "hits": [
      {
        "_index": "test",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.48326197,
        "_source": {
          "text": "foo"
        }
      },
      {
        "_index": "test",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.4289919,
        "_source": {
          "text": "test foo repeat foo"
        }
      }
    ]
  }
}

The word "foo" is in both documents.
It occurs twice in the query, so document ID 2 should get a higher _score, since the word is also repeated twice in that document.

The above example is simplified. In practice I use similarity = boolean, an ngram analyzer, fuzzy matching and more complex queries. But the situation is the same: Elasticsearch does not account for repeated (duplicate) words. If a term is in the document, it counts as a match for every occurrence of that word in the query.

Ideally, a term that has already produced a match would be disabled, so that the following search words are not matched against the same term ))
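
To make the desired behaviour concrete, here is a minimal sketch in plain Python. It is not Elasticsearch code and does not reproduce its scoring. The function names (`analyze`, `presence_score`, `consuming_score`) are hypothetical, and `analyze` only roughly approximates the whitespace tokenizer plus lowercase filter (asciifolding is left out). It contrasts presence-based matching, where one "foo" in the document satisfies every "foo" in the query, with matching where each document occurrence can be used only once.

```python
from collections import Counter


def analyze(text):
    # Rough stand-in for whitespace tokenizer + lowercase filter (assumption).
    return text.lower().split()


def presence_score(query, doc):
    # One point per query word whose term exists in the document at all,
    # so a single "foo" in the document satisfies every "foo" in the query.
    doc_terms = set(analyze(doc))
    return sum(1 for word in analyze(query) if word in doc_terms)


def consuming_score(query, doc):
    # Each document occurrence can satisfy only one query word:
    # a matched occurrence is "used up" and removed from the pool.
    remaining = Counter(analyze(doc))
    score = 0
    for word in analyze(query):
        if remaining[word] > 0:
            remaining[word] -= 1
            score += 1
    return score


docs = {"1": "foo", "2": "test foo repeat foo"}
query = "foo foo"
for doc_id, text in docs.items():
    print(doc_id, presence_score(query, text), consuming_score(query, text))
```

With presence-based matching both documents get the same value for the query "foo foo". With the consuming variant, document ID 2 gets a higher value than document ID 1, which is the ordering asked for here.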

The problem could be solved with "Scripted similarity", but that is not allowed!

{
  "settings": {
    "similarity": {
      "search_similarity" : {
        "type": "scripted",
        "script": {
          "source": "return query.boost / doc.freq;"
        }
      }
    }
..................................

system · June 20, 2019, 5:07pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
