Terms with fuzzy operator don't use mapping analyzer, but always standard analyzer


(Fabien Baligand) #1

Hi,

The fuzzy operator is a really powerful feature for full-text search.

But I have a problem:
When I do a fuzzy search (e.g. accenTTué~1), I see that the analyzer used is the standard analyzer (which has only a lowercase filter).

I mean: if I define a custom analyzer with an asciifolding filter in my mapping, I can clearly see that it is not used for my search terms.
However, I see that the lowercase filter is applied.
That's why it seems that, by default, Elasticsearch always uses the standard analyzer to analyze search terms with the fuzzy operator.

So my question is: is there a way to tell Elasticsearch to use the mapping analyzer when the fuzzy operator is used?
It could be a parameter in the query or a settings/mapping configuration in the index.

Thanks in advance for your help,
Fabien


(Lee Hinman) #2

Can you share the entire query you're using? Are you using query_string or
simple_query_string?

I mean: if I define a custom analyzer with an asciifolding filter in my mapping, I can clearly see that it is not used for my search terms. However, I see that the lowercase filter is applied. That's why it seems that, by default, Elasticsearch always uses the standard analyzer to analyze search terms with the fuzzy operator.

If you are doing a query_string query with no field specified, then by default
you're searching the _all field, which may not have the analyzer you expect.

So my question is: is there a way to tell Elasticsearch to use the mapping analyzer when the fuzzy operator is used? It could be a parameter in the query or a settings/mapping configuration in the index.

A better way to do this would be to use a match query with "fuzziness":
https://www.elastic.co/guide/en/elasticsearch/reference/5.1/query-dsl-match-query.html#query-dsl-match-query-fuzziness
That should use the analyzer specific to whatever field you're searching.
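
As a minimal sketch of what such a match query could look like (the index name my_index and the field name my_field are placeholders, not taken from your setup):

```
GET my_index/_search
{
  "query": {
    "match": {
      "my_field": {
        "query": "accenTTué",
        "fuzziness": 1
      }
    }
  }
}
```

Here the query text goes through the field's search analyzer first, and the fuzziness is then applied to the resulting terms.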

If you can share more information about your documents and full query, I can try
to reproduce.


(Fabien Baligand) #3

Hi @dakrone,

First, thanks for your help!

To answer your questions:

  • I am using query_string, but I can also reproduce the problem with simple_query_string.
  • I am not searching in _all but in a specific field that has a custom analyzer.
  • I tried the match query and it does work fine. Thanks for the advice.
  • However, my final query is complex: multiple AND/OR conditions with parentheses, searches across different fields with different fuzziness levels (0, 1, 2) and different boosts. That's why I need the query_string query.

So that you can reproduce the problem, here are all the details.

Here's my index:

PUT search_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "search_type": {
      "properties": {
        "text": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}

Here's the document that should be found:

PUT search_index/search_type/1
{
  "text": "accentue"
}

And here's my query with fuzziness:

GET search_index/_search
{
  "query": {
    "query_string": {
      "query": "text:accenTTué~1"
    }
  }
}
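
To check what each analyzer does to the search term, the _analyze API can be run against the index. This is only a sketch for comparison: it shows how the term is tokenized by each analyzer, not which analyzer query_string actually picks for fuzzy terms.

```
GET search_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "accenTTué"
}

GET search_index/_analyze
{
  "analyzer": "standard",
  "text": "accenTTué"
}
```

With my_analyzer the token is accenttue, which is one edit away from the indexed term accentue. With the standard analyzer the token is accenttué, which is two edits away, so ~1 would not match.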

Thanks in advance for your help!


(Fabien Baligand) #4

Hi @dakrone,
Have you tested this configuration?
Should I open an issue on the Elasticsearch GitHub about it?


(Lee Hinman) #5

Hi @fbaligand, sorry this has taken some time; I have been researching this. It looks like this is a known issue: see https://github.com/elastic/elasticsearch/issues/15760, where Option 2 specifically mentions analyzing fuzzy terms.

I believe that in your case the fuzzy term is not being analyzed at all, which is why it doesn't match.


(Fabien Baligand) #6

Hi @dakrone,

It is not as simple as the term not being analyzed.
I mean: if I search text:accenTTue~1 (without the accented character), my document is found.
So it means that, at least, the lowercase filter is applied.

Anyway, I'm very surprised this does not work, since for wildcards you can enable analysis of wildcard terms using "analyze_wildcard": true, and it works.
And to me, analyzing wildcard terms is really more complicated/special than analyzing fuzzy terms.
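
For comparison, here is a sketch of the wildcard case against the same index, using the analyze_wildcard parameter of query_string:

```
GET search_index/_search
{
  "query": {
    "query_string": {
      "query": "text:accentué*",
      "analyze_wildcard": true
    }
  }
}
```

With analyze_wildcard enabled, the prefix accentué is run through the field's analyzer before the wildcard is expanded, so it should match the document containing accentue.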

So, your final answer is that, at present, this doesn't work and there is no option to enable analysis of fuzzy terms (like "analyze_fuzzy": true)?

