multi_match prefix query behaving strangely

Hi everyone,

I'm not sure I'm posting this in the right section; sorry if I'm not.

Here is my case.

I use Elasticsearch to improve the search engine of a WordPress site, with the ElasticPress plugin as the bridge.
I work with a local Elasticsearch installation (version 6.2).

For those who haven't fled yet, here is my problem.

I'm working with a French database. These are my analyzer settings:

array(
    'analyzer' => array(
        'default' => array(
            'tokenizer' => 'icu_tokenizer',
            'filter' => array(
                'french_elision',
                'lowercase',
                'french_stop',
                'french_stemmer',
                'icu_folding'
            ),
            'char_filter' => array( 'html_strip' ),
            'language' => apply_filters( 'ep_analyzer_language', 'french', 'analyzer_default' ),
        )
    ),
    'filter' => array(
        'french_elision' => array(
            'type' => 'elision',
            'articles_case' => true,
            'articles' => [
                'l', 'm', 't', 'qu', 'n', 's',
                'j', 'd', 'c', 'jusqu', 'quoiqu',
                'lorsqu', 'puisqu'
            ]
        ),
        'french_stop' => array(
            'type' => 'stop',
            'stopwords' => '_french_'
        ),
        'french_stemmer' => array(
            'type' => 'stemmer',
            'language' => 'light_french'
        )
    ),
    'normalizer' => array(
        'lowerasciinormalizer' => array(
            'type'   => 'custom',
            'filter' => array( 'lowercase', 'icu_folding' ),
        )
    )
)

And this is the latest query I tried:

"query": {
    "function_score": {
        "query": {
            "bool": {
                "should": [
                    {
                        "multi_match": {
                            "query": "managemen",
                            "type": "phrase",
                            "fields": [
                                "post_title",
                                "post_content",
                                "post_excerpt"
                            ],
                            "boost": 4
                        }
                    },
                    {
                        "multi_match": {
                            "query": "managemen",
                            "fields": [
                                "post_title",
                                "post_content",
                                "post_excerpt"
                            ],
                            "boost": 2,
                            "operator": "and"
                        }
                    },
                    {
                        "multi_match": {
                            "query": "managemen",
                            "fields": [
                                "post_title",
                                "post_content",
                                "post_excerpt"
                            ],
                            "boost": 2,
                            "operator": "and",
                            "type": "phrase_prefix"
                        }
                    },
                    {
                        "multi_match": {
                            "query": "managemen",
                            "fields": [
                                "post_title",
                                "post_content",
                                "post_excerpt"
                            ],
                            "boost": 2,
                            "type": "phrase_prefix"
                        }
                    },
                    {
                        "prefix": {
                            "post_title": "managemen"
                        }
                    },
                    {
                        "prefix": {
                            "post_content": "managemen"
                        }
                    },
                    {
                        "prefix": {
                            "post_excerpt": "managemen"
                        }
                    }
                ]
            }
        },
        "score_mode": "sum",
        "boost_mode": "sum"
    }
}

I know there is a lot going on in this function_score bool query. It started with a few simple multi_match queries to match a phrase across multiple fields, and while trying to solve my problem I ended up trying every possibility (multi_match with type phrase_prefix, multiple prefix term queries, ...).

So, my problem is this: if I search for the term "manage", I get a lot of results.
Those results are documents whose fields contain the terms "manage", "manager" or "management".
If I search for the term "management", I also get a lot of results.
Those results are documents whose fields contain the term "management".
If I search for the term "manageme", I get no results at all...

I don't understand why I get no results from the multi_match queries of type phrase_prefix or from the multiple prefix term queries...

If you have any clue, please feel free to enlighten me...

You should use the _analyze API to understand how your text is transformed at index time and search time.
That might give you some clues.
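
As a rough illustration of that suggestion, here is a minimal Python sketch that builds and sends an _analyze request using only the standard library. The index name "my-index", the host http://localhost:9200 and the helper names are assumptions made for the example; replace them with the index ElasticPress actually created. The analyzer name "default" is the one from the settings above.

```python
import json
import urllib.request


def build_analyze_request(text, index="my-index", analyzer="default",
                          host="http://localhost:9200"):
    """Return (url, body) for an _analyze call. Nothing is sent here.

    The index name and host are placeholders, not values from the thread.
    """
    url = f"{host}/{index}/_analyze"
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return url, body


def analyze(text, **kwargs):
    """Send the request and return the token strings Elasticsearch produced."""
    url, body = build_analyze_request(text, **kwargs)
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Each entry in "tokens" describes one term as it would be indexed.
    return [entry["token"] for entry in payload["tokens"]]
```

Calling analyze("manage"), analyze("manageme") and analyze("management") against a running node and comparing the returned tokens should show which terms really end up in the index, and therefore which prefixes can match them.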

Hi dadoonet,

Thanks for your advice!

I now think the answer to my problem lies in how the documents are indexed, not in how I query them.
I will investigate further on that side.
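
To make that suspicion concrete, here is a toy Python sketch of how a stemmed index can defeat a prefix lookup. It is only a model: toy_stem is a made-up suffix stripper, not the light_french stemmer, and the three one-word documents are invented for the example. The one Elasticsearch fact it relies on is that a prefix query is a term-level query, so its text is compared with the indexed terms without going through the analyzer.

```python
def toy_stem(token):
    """Hypothetical stemmer: strips a few suffixes. NOT light_french."""
    for suffix in ("ement", "er", "e"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def build_index(docs):
    """Map each stemmed term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(toy_stem(word), set()).add(doc_id)
    return index


def match_query(index, query):
    """Analyzed lookup: the query is stemmed like the documents were."""
    return sorted(index.get(toy_stem(query.lower()), set()))


def prefix_query(index, query):
    """Term-level lookup: the raw query text is compared with indexed terms."""
    hits = set()
    for term, doc_ids in index.items():
        if term.startswith(query):
            hits |= doc_ids
    return sorted(hits)


# Invented sample documents; all three words collapse to the term "manag".
DOCS = {1: "manage", 2: "manager", 3: "management"}
INDEX = build_index(DOCS)

print(match_query(INDEX, "manage"))     # stem "manag" is in the index
print(prefix_query(INDEX, "manageme"))  # no indexed term starts with "manageme"
```

In this model the index only holds the term "manag", which is shorter than "manageme", so no prefix lookup for "manageme" can succeed. Whether the real analyzer behaves this way is exactly what the _analyze output should confirm or rule out.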

If anybody has another idea, I will gladly look at it ;)

Thanks !
