Exact term count search help needed

I need to be able to find exact matches on the analyzed text, not on the raw string, so that I can still take advantage of fuzzy matching. For example, consider these documents:

doc1.name = "cheese pizza"
doc2.name = "kid's cheese pizza"
doc3.name = "pizza"
doc4.name = "cheese pizza deluxe"
doc5.name = "cheese only pizza"

A standard query for "pizza" returns all of those documents, each with a score, rather than only the single document I need (doc3 in this case). Likewise, if I search for "cheese pizza", I need only doc1 to be returned.

I found this link to be helpful in that it discusses the idea of storing a second field (perhaps "nameCount") with the number of terms in the searched field.

Is there a way to have ES compute the number of terms (after stop-word removal) when a document is indexed, and also compute the number of terms in the query, accounting for the same stop words? Perhaps using a script?

Can someone recommend a better approach or share links to solutions? Thanks!
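One possible sketch of the term-count idea, assuming Elasticsearch 7.x or later (typeless mappings) and its built-in `token_count` field type, which stores the number of tokens an analyzer produces for a field at index time. The index name `my_index` and the sub-field name `name.length` are hypothetical. Setting `enable_position_increments` to `false` keeps tokens removed by filters such as stop-word removal out of the count. On the query side, the client can send the query text to the `_analyze` API with the same analyzer and count the returned tokens, which avoids a script at the cost of an extra request. The bodies are plain Python dicts so they can be printed with `json.dumps` and pasted into the console.

```python
# Hypothetical index mapping for my_index (Elasticsearch 7.x+ typeless syntax).
MAPPING = {
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "english",
                "fields": {
                    # name.length: how many tokens the english analyzer emits for
                    # "name", computed by Elasticsearch when the document is indexed.
                    "length": {
                        "type": "token_count",
                        "analyzer": "english",
                        # Do not count positions left behind by removed stop words.
                        "enable_position_increments": False,
                    }
                },
            }
        }
    }
}


def analyze_request(query_text: str) -> dict:
    """Body for POST my_index/_analyze: run the query text through the same analyzer."""
    return {"analyzer": "english", "text": query_text}


def count_tokens(analyze_response: dict) -> int:
    """Query term count from an _analyze response, which lists tokens under "tokens"."""
    return len(analyze_response["tokens"])
```

The number returned by `count_tokens` is what a query would compare against `name.length`, for example with a `term` query on that sub-field.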

Elasticsearch supports text analysis on string fields. There are two string field types: text and keyword. If your use case requires searching on exact matches, I would recommend changing the field to keyword in your index templates.

You can find more details in the following article:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html

@mjunaidmuzammil Thanks. The problem is that I cannot rely on exact, character-for-character matches. I should have clarified: we need exact matches on the analyzed terms in the inverted index of a full-text field. For example, if someone queries for "pizzas", "Pissa" or "piza", the query should still match only doc3, allowing for fuzzy matching. Is this impossible?
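To check which of those spellings per-term fuzzy matching would accept, here is a small offline sketch of `"fuzziness": "AUTO"`, which Elasticsearch documents as 0 edits for terms shorter than 3 characters, 1 edit for 3 to 5 characters, and 2 edits for longer terms, with adjacent transpositions counting as one edit by default (`fuzzy_transpositions`). The only analysis it applies is lowercasing, a simplifying assumption; a real analyzer such as `english` would also stem "pizzas" to "pizza" before the fuzzy step. Under these assumptions "pizzas" and "piza" match "pizza", but "Pissa" is two edits away while a 5-character term only gets one, so it would need `"fuzziness": 2`.

```python
def auto_fuzziness(term: str) -> int:
    """Max edits that "fuzziness": "AUTO" (AUTO:3,6) allows for a term of this length."""
    if len(term) < 3:
        return 0
    if len(term) < 6:
        return 1
    return 2


def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, substitutions and
    adjacent transpositions (optimal string alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def fuzzy_term_matches(query_term: str, indexed_term: str) -> bool:
    """Does one query term fuzzily match one indexed term?
    Lowercasing is the only 'analysis' applied here (an assumption)."""
    q = query_term.lower()
    return osa_distance(q, indexed_term) <= auto_fuzziness(q)


for candidate in ["pizzas", "Pissa", "piza"]:
    print(candidate, fuzzy_term_matches(candidate, "pizza"))
# pizzas True
# Pissa False
# piza True
```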

This has to do with the analysers. Perhaps you should explore the different built-in analysers and see whether one matches your requirements. If none does, you can also create a custom analyser (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-custom-analyzer.html).

I'm sorry, I still wasn't as clear as I could have been. My use of "exact" is imprecise. I'm looking for perfect matches, meaning that the analyzed terms in my query must be the only terms in the full-text field, with none missing and none extra. Another way to put it would be "equal" matches. And I do not want to store name fields as keyword types, only as text.

Analyzing the example documents above would yield these inverted index entries:

doc1.name = ["cheese", "pizza"]
doc2.name = ["kid", "cheese", "pizza"]
doc3.name = ["pizza"]
doc4.name = ["cheese", "pizza", "deluxe"]
doc5.name = ["cheese", "only", "pizza"]

If the query is "pizza", then the only match can be doc3: ["pizza"].

If the query is "cheese pizza", the only match can be doc1: ["cheese", "pizza"].
I hope this makes sense.
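The "perfect match" rule described here amounts to comparing the set of analyzed query terms with the set of analyzed field terms. The following Python sketch simulates that rule outside Elasticsearch on the example documents. `toy_analyze` is a hypothetical, deliberately crude analyzer (lowercase, drop a possessive "'s", split on anything that is not a letter or digit); it does not stem or remove stop words the way `english` would.

```python
import re


def toy_analyze(text: str) -> set[str]:
    """Hypothetical stand-in for an analyzer: lowercase, drop a possessive "'s",
    split on non-alphanumerics. No stemming and no stop-word removal."""
    text = re.sub(r"'s\b", "", text.lower())
    return {t for t in re.split(r"[^a-z0-9]+", text) if t}


def perfect_matches(query: str, docs: dict[str, str]) -> list[str]:
    """Ids of docs whose analyzed term set equals the query's analyzed term set."""
    wanted = toy_analyze(query)
    return [doc_id for doc_id, name in docs.items() if toy_analyze(name) == wanted]


# The example documents from the question.
DOCS = {
    "doc1": "cheese pizza",
    "doc2": "kid's cheese pizza",
    "doc3": "pizza",
    "doc4": "cheese pizza deluxe",
    "doc5": "cheese only pizza",
}

print(perfect_matches("pizza", DOCS))         # ['doc3']
print(perfect_matches("cheese pizza", DOCS))  # ['doc1']
```

A plain `match` query with `"operator": "and"` only enforces half of this rule (every query term is present); the "no extra terms" half is what the term-count field from the linked approach is meant to supply.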

I then want to boost these perfect matches (to 1000, say) to indicate that they are identical term-set matches. I've scoured the ES documentation and don't see any way to direct ES to perform this type of matching.
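A sketch of how such a boost could be expressed, assuming the index has a hypothetical `name.length` sub-field of Elasticsearch's `token_count` type counting the tokens the `english` analyzer produces for `name`, and that the client has already counted the query's tokens (for example with the `_analyze` API). The outer `bool` keeps ordinary fuzzy matches in the results; the inner `bool` requires every query term (`"operator": "and"`) and filters on the token count, and its `boost` of 1000 lifts only those perfect matches. The helper name `perfect_match_query` is hypothetical; it returns the search body as a Python dict that can be serialized with `json.dumps`.

```python
def perfect_match_query(text: str, query_token_count: int, boost: float = 1000) -> dict:
    """Search body: fuzzy matches are returned as usual, and documents whose name
    has exactly the query's terms also match a heavily boosted clause.
    Assumes a hypothetical token_count sub-field named "name.length"."""
    fuzzy_all_terms = {
        "match": {"name": {"query": text, "fuzziness": "AUTO", "operator": "and"}}
    }
    return {
        "query": {
            "bool": {
                "should": [
                    # Ordinary scored fuzzy match: partial matches still show up.
                    {"match": {"name": {"query": text, "fuzziness": "AUTO"}}},
                    # Perfect match: every query term present AND the field holds
                    # the same number of tokens as the query.
                    {
                        "bool": {
                            "must": [fuzzy_all_terms],
                            "filter": [{"term": {"name.length": query_token_count}}],
                            "boost": boost,
                        }
                    },
                ]
            }
        }
    }


body = perfect_match_query("cheese pizza", query_token_count=2)
```

Equal counts plus "all terms present" implies equal term sets only when each query term matches a different field term; if two query terms (for example "pizza piza") both match the same field term, a field with one extra term can still pass.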

@Teddis You can use the multi-fields feature, which lets a single field support both text analysis and exact keyword search. For more details, take a look at https://www.elastic.co/guide/en/elasticsearch/reference/master/multi-fields.html. Hope this helps.

I guess I'm looking for a match query that doesn't exist. I'm already using multi-fields in various capacities, but I think you may be right.

I seem to have solved it this way, but it doesn't work on stems:

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword",
              "doc_values": true,
              "normalizer": "lowercase_normalizer"
            }
          },
          "analyzer": "english"
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  }
}
PUT my_index/_doc/1
{
  "name": "cheese pizza"
}
PUT my_index/_doc/2
{
  "name": "kid's cheese pizza"
}
PUT my_index/_doc/3
{
  "name": "pizza"
}
PUT my_index/_doc/4
{
  "name": "cheese only pizza"
}
GET my_index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name.raw": {
              "query": "cheese only pizza!",
              "fuzziness": "AUTO",
              "prefix_length": 0,
              "boost": 10000
            }
          }
        }
      ]
    }
  }
}
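Why this works for punctuation but not for stems can be seen by simulating it. With the `lowercase_normalizer`, `name.raw` holds the whole name as a single lowercased term, and the `match` query normalizes the query text the same way, so `"fuzziness": "AUTO"` grants at most 2 edits for the entire string rather than per word. The Python sketch below approximates that behaviour (lowercasing plus an edit-distance check that counts adjacent transpositions as one edit); it is an offline illustration, not Elasticsearch's implementation, and `raw_fuzzy_match` is a hypothetical name.

```python
def auto_max_edits(term: str) -> int:
    """Max edits allowed by "fuzziness": "AUTO" (AUTO:3,6) for a term of this length."""
    return 0 if len(term) < 3 else 1 if len(term) < 6 else 2


def osa_distance(a: str, b: str) -> int:
    """Insertions, deletions, substitutions and adjacent transpositions."""
    d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)] for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def raw_fuzzy_match(query: str, stored_name: str) -> bool:
    """Approximates the fuzzy match against name.raw: the normalizer only lowercases,
    so each whole string is ONE term and the edit budget covers the entire name."""
    q, s = query.lower(), stored_name.lower()
    return osa_distance(q, s) <= auto_max_edits(q)


print(raw_fuzzy_match("cheese only pizza!", "cheese only pizza"))        # True: 1 edit
print(raw_fuzzy_match("cheese only pizza!", "cheese pizza"))             # False: 6 edits
print(raw_fuzzy_match("cheeses pizzas deluxes", "cheese pizza deluxe"))  # False: 3 edits
```

So "cheese only pizza!" is one edit from doc 4's name and matches, but a query in which three words each carry an extra plural "s" exceeds the two-edit budget, even though a stemming analyzer on the text field would likely reduce both to the same terms.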
