Query string query ngrams and wildcards or fuzziness or proximity searches

adyjayex · November 22, 2017, 6:42am

Hello,

Given the following analysis settings:

"analysis": {
  "tokenizer": {
    "ngram_tokenizer": {
      "type": "ngram",
      "min_gram": 3,
      "max_gram": 3,
      "token_chars": []
    }
  },
  "analyzer": {
    "ngram_analyzer": {
      "type": "custom",
      "tokenizer": "ngram_tokenizer",
      "filter": [
        "lowercase",
        "asciifolding"
      ]
    }
  }
}

And fields which are based on this dynamic template:

"dynamic_templates": [
  {
    "strings": {
      "match_mapping_type": "string",
      "mapping": {
        "type": "text",
        "analyzer": "ngram_analyzer",
        "search_analyzer": "ngram_analyzer",
        "norms": false,
        "fields": {
          "keyword": {
            "store": false,
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
]

When searching for ease, the search analyzer creates 3-character tokens (eas, ase) and returns any document which contains them. However, when searching for "ease" or " ease " (with the quotes), it retrieves relevant results: words which contain ease (please, lease...) or phrases in which the word ease is surrounded by whitespace.
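To make the behaviour above concrete, here is a small Python sketch that imitates the analyzer outside of Elasticsearch. It is only an approximation under stated assumptions: `fold` is a rough stand-in for the lowercase and asciifolding filters, and `matches_any` / `matches_phrase` are hypothetical helpers that mimic an unquoted search (any shared token) and a quoted search (tokens at consecutive positions), not Elasticsearch's actual scoring.

```python
import unicodedata

def fold(text):
    """Lowercase and strip diacritics (rough stand-in for lowercase + asciifolding)."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def ngrams(text, n=3):
    """Emit every n-character window; whitespace is kept because token_chars is empty."""
    folded = fold(text)
    return [folded[i:i + n] for i in range(len(folded) - n + 1)]

def matches_any(document, query):
    """Unquoted search: the document shares at least one n-gram with the query."""
    return bool(set(ngrams(document)) & set(ngrams(query)))

def matches_phrase(document, query):
    """Quoted search: the query's n-grams occur consecutively in the document."""
    doc, qry = ngrams(document), ngrams(query)
    return any(doc[i:i + len(qry)] == qry for i in range(len(doc) - len(qry) + 1))

print(ngrams("ease"))    # ['eas', 'ase']
print(ngrams(" ease "))  # [' ea', 'eas', 'ase', 'se ']
```

Under these assumptions, `matches_any("east coast", "ease")` is true because of the shared token eas, while `matches_phrase("please wait", "ease")` is true and `matches_phrase("please wait", " ease ")` is false, which mirrors the difference described above.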

My question is: is it in any way possible to combine single-character wildcards, fuzziness, or proximity search with either "ease" or " ease " (e.g. "ea?e" or "eaes"~ to find ease)?

Thanks!

dadoonet · November 22, 2017, 7:05am

You can run a match query with "fuzziness": "AUTO" and that should work. Did you try?
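As a minimal sketch of that suggestion, the request body could be built like this in Python. The field name `name` and the helper `fuzzy_match_body` are assumptions made for illustration; only the `match` query and its `fuzziness` parameter come from the reply above.

```python
import json

def fuzzy_match_body(field, text):
    """Build a match query body that tolerates small typos via fuzziness AUTO."""
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "fuzziness": "AUTO",  # edit distance chosen from each term's length
                }
            }
        }
    }

body = fuzzy_match_body("name", "eaes")  # "name" is a hypothetical field
print(json.dumps(body, indent=2))
```

One thing to keep in mind: the fuzziness is applied to each term produced by the search analyzer, so with the 3-gram analyzer above it would apply to every 3-character token rather than to the whole word, which may match more loosely than intended.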

adyjayex · November 22, 2017, 7:18am

Not yet. However, apart from fuzziness, is it possible to replicate the other features available in the query string query with a match query?

Thanks!

dadoonet · November 22, 2017, 8:28am

IMO you have much more control when using match queries, bool queries, or whatever else than with query_string.

adyjayex · November 22, 2017, 2:02pm

I've stumbled upon this in the documentation:

Comparison to query_string / field
The match family of queries does not go through a "query parsing" process. It does not support field name prefixes, wildcard characters, or other "advanced" features. For this reason, chances of it failing are very small / non existent, and it provides an excellent behavior when it comes to just analyze and run that text as a query behavior (which is usually what a text search box does). Also, the phrase_prefix type can provide a great "as you type" behavior to automatically load search results.

It seems like we'd have to build our own parser on top of the match queries to be able to expose the number of features the query_string queries offer to an end user.

For running only match queries it probably does a better job. But for offering the various ways of querying data that query_string provides, without essentially rewriting a query parser on top of the match queries, it doesn't seem to offer much.

Or maybe I didn't fully understand your suggestion?

Thanks!

dadoonet · November 24, 2017, 5:08pm

I understand.

IMO "normal" users should never have to think about this.

As an example, I never write a query like field:value foo~ b?r in Google or Qwant when I'm searching for something. It's counterintuitive.

What I expect is that the search engine does its best to find the most relevant information for me even though I'm making mistakes.

So I'm often combining multiple queries within a should array of the bool query.

For example, this gist shows something like this:

https://gist.github.com/dadoonet/5179ee72ecbf08f12f53d4bda1b76bab

search_kibana_console.txt

### REINIT
DELETE user
PUT user
{
  "settings": {
    "number_of_shards": 1
  }, 
  "mappings": {
    "doc": {
      "properties": {

This file has been truncated.
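Since the gist preview is cut off, here is an independent sketch of the general idea of combining several queries in the should array of a bool query. It is not the content of the gist: the field name `name`, the boost values, and the helper `forgiving_search_body` are all assumptions chosen for illustration.

```python
import json

def forgiving_search_body(field, text):
    """Combine strict and lenient clauses; documents matching more clauses score higher."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Strictest clause: terms in order, boosted the most.
                    {"match_phrase": {field: {"query": text, "boost": 3}}},
                    # All terms present, in any order.
                    {"match": {field: {"query": text, "operator": "and", "boost": 2}}},
                    # Most lenient clause: tolerate typos.
                    {"match": {field: {"query": text, "fuzziness": "AUTO"}}},
                ]
            }
        }
    }

body = forgiving_search_body("name", "eaes")  # "name" is a hypothetical field
print(json.dumps(body, indent=2))
```

The point of the design is that the user types plain text and the query itself decides how forgiving to be, instead of the user having to write operators.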

But maybe you really want to give that power to your users. In that case, I'd prefer the simple_query_string query instead (if possible).
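A minimal sketch of what a simple_query_string request could look like, assuming a hypothetical field `name` and a hypothetical helper `simple_query_string_body`. The syntax is smaller than that of query_string: it offers `~N` after a word for fuzziness, `~N` after a phrase for slop, and `*` at the end of a term for prefix matching, but as far as I know it has no single-character `?` wildcard.

```python
import json

def simple_query_string_body(user_input, fields):
    """Pass user input through simple_query_string, which ignores invalid syntax instead of failing."""
    return {
        "query": {
            "simple_query_string": {
                "query": user_input,
                "fields": list(fields),
                "default_operator": "and",
            }
        }
    }

# Fuzzy term (edit distance 1) combined with a prefix term.
body = simple_query_string_body("eaes~1 plea*", ["name"])
print(json.dumps(body, indent=2))
```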

Now, to answer your original question: I don't know if it's doable. I did not test something like field:f?o. Did you?

adyjayex · November 25, 2017, 4:51am

Thank you for the gist!

While I agree with you that a normal user might be more comfortable with easy searches, in specific scenarios where you're looking for more complex content which you can't account for using synonym filters for example, it's very handy to have them.

Even Google provides such features.

As for field:f?o: yes, using a whitespace analyzer makes it possible to use most (if not all) of the features that the query string query provides.

One possible way of implementing this seems to be using one analyzer on the original field and another analyzer, together with a search_quote_analyzer, on a subfield, which seems to work reliably on ES 6.0. My only concern is whether this is the optimal implementation, or whether it could have been done solely with an ngram tokenizer (although our tests showed that it did not work as expected with the query string query syntax).
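The exact mapping is not given above, so the following is only a guess at its general shape: the n-gram analyzer stays on the main field, and a subfield analyzed with the built-in whitespace analyzer receives the query_string syntax. The field name `name`, the subfield name `words`, and the helper `query_string_body` are hypothetical, and the search_quote_analyzer mentioned above is left out because the post does not say which analyzer it used.

```python
import json

# Hypothetical properties fragment: n-grams on the main field, whole words on the subfield.
properties = {
    "name": {
        "type": "text",
        "analyzer": "ngram_analyzer",
        "fields": {
            "words": {
                "type": "text",
                "analyzer": "whitespace",  # keeps whole words so ?, ~ and "..."~N act on words
            }
        },
    }
}

def query_string_body(expression, default_field="name.words"):
    """Send raw query_string syntax to the whitespace-analyzed subfield."""
    return {"query": {"query_string": {"default_field": default_field, "query": expression}}}

wildcard = query_string_body("ea?e")          # single-character wildcard
fuzzy = query_string_body("eaes~1")           # fuzzy term, edit distance 1
proximity = query_string_body('"at ease"~2')  # phrase with slop 2
print(json.dumps(wildcard, indent=2))
```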

system · December 23, 2017, 4:52am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
