Query string query: ngrams with wildcards, fuzziness, or proximity searches

Hello,

Given the following analysis settings:

"analysis": {
      "tokenizer": {
        "ngram_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3,
          "token_chars": []
        }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "ngram_tokenizer",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }

And fields which are based on this dynamic template:

"dynamic_templates": [
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "analyzer": "ngram_analyzer",
              "search_analyzer": "ngram_analyzer",
              "norms": false,
              "fields": {
                "keyword": {
                  "store": false,
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      ],

When searching for ease, the search analyzer produces 3-character tokens (eas, ase) and returns any document that contains them. However, when using "ease" or " ease ", it retrieves relevant results: either words that contain ease (please, lease, ...) or phrases that contain the word ease surrounded by whitespace.
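
For reference, you can check what the analyzer produces with the _analyze API (the index name here is just an example):

POST /my-index/_analyze
{
  "analyzer": "ngram_analyzer",
  "text": "ease"
}

With min_gram and max_gram both set to 3, this returns the tokens eas and ase.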

My question is: is it in any way possible to combine a single-character wildcard, fuzziness, or a proximity search with the quoted forms "ease" or " ease " (i.e. "ea?e" or "eaes"~ to find ease)?

Thanks!

You can run a match query with "fuzziness": "AUTO" and that should work. Did you try?
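
Something along these lines, where the index and field names are just placeholders:

GET /my-index/_search
{
  "query": {
    "match": {
      "my_field": {
        "query": "eaes",
        "fuzziness": "AUTO"
      }
    }
  }
}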

Not yet. However, apart from fuzziness, is it possible to replicate the other features available in the query_string query with the match query?

Thanks!

You have much more control IMO when using match queries or bool or whatever than with the query_string.

I've stumbled upon this in the documentation:

Comparison to query_string / field
The match family of queries does not go through a "query parsing" process. It does not support field name prefixes, wildcard characters, or other "advanced" features. For this reason, chances of it failing are very small / non existent, and it provides an excellent behavior when it comes to just analyze and run that text as a query behavior (which is usually what a text search box does). Also, the phrase_prefix type can provide a great "as you type" behavior to automatically load search results.

It seems like we'd have to build our own parser on top of the match queries to be able to expose the number of features the query_string queries offer to an end user.

For running plain match queries it probably does a better job, but for offering users various ways of querying the data it doesn't seem to help much: to cover the use cases query_string provides, we'd essentially have to rewrite a query parser on top of the match queries.

Or maybe I didn't fully understand your suggestion?

Thanks!

I understand.

IMO "normal" users should never have to think about this.

As an example I'm never ever writing a query like field:value foo~ b?r in Google or Qwant when I'm searching for something. It's counterintuitive.

What I'm expecting is that the search engine does its best to find the most relevant information for me even though I'm making mistakes.

So I'm often combining multiple queries within a should array of the bool query.

For example, this gist shows something like this:
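
Roughly along these lines (the field name and the exact clauses here are placeholders, not the actual gist):

GET /my-index/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "my_field": "ease" } },
        { "match_phrase": { "my_field": "ease" } },
        { "match": { "my_field": { "query": "ease", "fuzziness": "AUTO" } } }
      ]
    }
  }
}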

But maybe you really want to give that power to your users. In that case, I'd prefer the simple_query_string query instead (if possible).
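
For instance, something like this (again with placeholder names):

GET /my-index/_search
{
  "query": {
    "simple_query_string": {
      "query": "\"ease\" | eas*",
      "fields": ["my_field"],
      "default_operator": "and"
    }
  }
}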

Now, to answer your original question: I don't know if it's doable. I did not test something like field:f?o. Did you?

Thank you for the gist!

While I agree with you that a normal user might be more comfortable with simple searches, in specific scenarios where you're looking for more complex content that you can't account for with, say, synonym filters, it's very handy to have these features.

Even Google provides such features.

As for field:f?o - yes, using a whitespace analyzer gives the ability to use most (if not all) of the features that the query_string query provides.
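
For example, against a whitespace-analyzed subfield, a query along these lines is possible (field names are just illustrative):

GET /my-index/_search
{
  "query": {
    "query_string": {
      "query": "ea?e",
      "fields": ["my_field.exact"],
      "analyze_wildcard": true
    }
  }
}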

One possible way of implementing this seems to be using one analyzer on the original field and another analyzer, together with a search_quote_analyzer, on a subfield, which seems to work reliably on ES 6.0. The only concern is whether this is the optimal implementation, or whether it could have been done with an ngram tokenizer alone (although our tests showed that the latter did not work as expected with the query_string syntax).
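
A sketch of that mapping could look something like this (the field names are placeholders, and ngram_analyzer is the custom analyzer defined above):

"mappings": {
  "properties": {
    "my_field": {
      "type": "text",
      "analyzer": "ngram_analyzer",
      "search_analyzer": "ngram_analyzer",
      "fields": {
        "exact": {
          "type": "text",
          "analyzer": "whitespace",
          "search_analyzer": "whitespace",
          "search_quote_analyzer": "whitespace"
        }
      }
    }
  }
}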
