Fuzzy searches with asciifolding


(José Victor Da Silva) #1

Good afternoon,
I'm trying to use a custom analyzer called test_fuzzy in a fuzzy search, but the search does not work when I set "preserve_original" to true in the "asciifolding" filter.

When I create the custom analyzer with "preserve_original" set to false, the search returns results correctly.

I saw in the Elasticsearch documentation that fuzziness is applied to each term after analysis. Does anyone know why Elasticsearch cannot find my documents when "preserve_original" is true, even though the analyzer then produces more tokens (more options to match)?

The following is the analyzer output when preserve_original is enabled (test_fuzzy):

{
  "tokens": [
    {
      "token": "produto",
      "start_offset": 0,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "varzacao",
      "start_offset": 8,
      "end_offset": 16,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "varzação",
      "start_offset": 8,
      "end_offset": 16,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

The following is the analyzer output when preserve_original is disabled (test_fuzzy):

{
  "tokens": [
    {
      "token": "produto",
      "start_offset": 0,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "varzacao",
      "start_offset": 8,
      "end_offset": 16,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
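
Token listings like the two above can be reproduced with the `_analyze` API. The following is only a minimal sketch: it assumes the official elasticsearch-php client (the 7.x namespace), a node on localhost:9200, and an index named `products`, which is a hypothetical name not taken from this post.

```php
<?php
// Sketch only. Assumptions: elasticsearch-php 7.x installed via Composer,
// a node on localhost:9200, and a hypothetical index "products" whose
// settings define the test_fuzzy analyzer.
require 'vendor/autoload.php';

use Elasticsearch\ClientBuilder;

$client = ClientBuilder::create()->setHosts(['localhost:9200'])->build();

// Run the same text used in the query through the field's analyzer.
$response = $client->indices()->analyze([
    'index' => 'products',
    'body'  => [
        'analyzer' => 'test_fuzzy',
        'text'     => 'produto varzação',
    ],
]);

// Print one line per token: its position followed by the token text.
foreach ($response['tokens'] as $token) {
    echo $token['position'], ' ', $token['token'], PHP_EOL;
}
```

With preserve_original enabled, two tokens share position 1, which is the difference between the two listings.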

Here is the query executed:

[
    'match' => [
        'name.fuzzy' => [
            'query' => 'produto varzação',
            'operator' => 'and',
            'boost' => 2,
            'zero_terms_query' => 'all',
            'fuzziness' => 'auto',
        ],
    ],
]
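
One way to see how this query is actually rewritten, and whether fuzziness is applied to both tokens at position 1, may be the validate API with `explain` and `rewrite` enabled. This is a diagnostic sketch, not a fix; it assumes the elasticsearch-php 7.x client, a node on localhost:9200, and a hypothetical index named `products`.

```php
<?php
// Sketch only. Assumptions: elasticsearch-php 7.x installed via Composer,
// a node on localhost:9200, and a hypothetical index "products".
require 'vendor/autoload.php';

use Elasticsearch\ClientBuilder;

$client = ClientBuilder::create()->setHosts(['localhost:9200'])->build();

// Ask Elasticsearch to explain the rewritten (low-level) form of the query.
$response = $client->indices()->validateQuery([
    'index'   => 'products',
    'explain' => true,
    'rewrite' => true,
    'body'    => [
        'query' => [
            'match' => [
                'name.fuzzy' => [
                    'query' => 'produto varzação',
                    'operator' => 'and',
                    'fuzziness' => 'auto',
                ],
            ],
        ],
    ],
]);

// Each entry describes the rewritten query, or the error if it is invalid.
foreach ($response['explanations'] as $entry) {
    echo $entry['explanation'] ?? $entry['error'] ?? '', PHP_EOL;
}
```

Comparing the explanation for the two analyzer variants should show whether the terms at position 1 are expanded as fuzzy terms or matched exactly.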

Here is the mapping:

'name' => [
    'type' => 'text',
    'analyzer' => 'standard',
    'fields' => [
        'norm' => [
            'type' => 'keyword',
            'normalizer' => 'keyword_text',
        ],
        'stemmed' => [
            'type' => 'text',
            'analyzer' => 'stemmed',
        ],
        'fuzzy' => [
            'type' => 'text',
            'analyzer' => 'test_fuzzy',
        ],
    ],
],

Here are the analyzer and the filter:

'test_fuzzy' => [
    'tokenizer' => 'standard',
    'filter' => [
        'lowercase',
        'custom_asciifolding',
    ],
],

'filter' => [
    'custom_asciifolding' => [
        'type' => 'asciifolding',
        'preserve_original' => true,
    ],
],
