How do I identify and remove words from a search phrase that don't exist in my Elasticsearch index using suggesters?

Hello, I'm trying to get support on the following:

In summary, please assume the following for my situation:

  • I am trying to create a 'did you mean' feature using the ES suggesters.
  • My goal is to turn this "greem dress banana" into this "green dress", where:
    • "greem" is a badly spelt word that should be corrected to the word "green" as "green" exists in my index.
    • "dress" is a correct word and should be left alone and present in the suggestion as it exists in the index.
    • "banana" is a word/term that does not exist anywhere in my index and so should be removed completely.

What I CAN do:

  • Correcting "greem" to "green" works fine.

What I cannot do:

  • Identify that "banana" does not exist in the index at all and so decide to remove it.

OK, so when I call the suggester service with the phrase "greem dress banana" like so:

{"suggest":{"text":"greem+dress+banana","correction-1":{"term":{"field":"_docs.product_name","suggest_mode":"always"}}}}

Note: I have various fields to check for suggestions as you can see above.

This returns something like this (note that my code converts this to an array, but this is just the same as the JSON returned from ES):

Aurora_API_Client_Service_Data_Json Object
(
    [json:Aurora_API_Client_Service_Data_Json:private] => Array
        (
            [took] => 8
            [timed_out] => 
            [_shards] => Array
                (
                    [total] => 1
                    [successful] => 1
                    [skipped] => 0
                    [failed] => 0
                )

            [hits] => Array
                (
                    [total] => Array
                        (
                            [value] => 0
                            [relation] => eq
                        )

                    [max_score] => 
                    [hits] => Array
                        (
                        )

                )

            [suggest] => Array
                (
                    [correction-1] => Array
                        (
                            [0] => Array
                                (
                                    [text] => greem
                                    [offset] => 0
                                    [length] => 5
                                    [options] => Array
                                        (
                                            [0] => Array
                                                (
                                                    [text] => green
                                                    [score] => 0.8
                                                    [freq] => 3
                                                )

                                            [1] => Array
                                                (
                                                    [text] => grei
                                                    [score] => 0.5
                                                    [freq] => 1
                                                )

                                        )

                                )

                            [1] => Array
                                (
                                    [text] => dress
                                    [offset] => 6
                                    [length] => 5
                                    [options] => Array
                                        (
                                        )

                                )

                            [2] => Array
                                (
                                    [text] => banana
                                    [offset] => 12
                                    [length] => 6
                                    [options] => Array
                                        (
                                        )

                                )

                        )

                )

        )

)

The key issue I have with this return data is this:

In the following two entries, I cannot tell the difference between a word that is correct and exists in the index (and so has no suggestions) and a word that does not exist AT ALL (and so also has no suggestions).

                            [1] => Array
                                (
                                    [text] => dress
                                    [offset] => 6
                                    [length] => 5
                                    [options] => Array
                                        (
                                        )

                                )

                            [2] => Array
                                (
                                    [text] => banana
                                    [offset] => 12
                                    [length] => 6
                                    [options] => Array
                                        (
                                        )

                                )
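To make the ambiguity concrete, here is a minimal Python sketch that mirrors the two entries above as plain dicts. The names `dress`, `banana`, and `signal_fields` are illustrative only and are not part of any ES client. Once the fields that merely echo the input token are stripped, nothing is left to tell the entries apart:

```python
# The two entries from the term suggester response above, as plain dicts.
dress = {"text": "dress", "offset": 6, "length": 5, "options": []}
banana = {"text": "banana", "offset": 12, "length": 6, "options": []}


def signal_fields(entry):
    """Drop the fields that only echo the input token (text, offset, length)."""
    return {k: v for k, v in entry.items() if k not in ("text", "offset", "length")}


# Both entries reduce to {'options': []}; no field says "found in the index".
indistinguishable = signal_fields(dress) == signal_fields(banana)
print(indistinguishable)
```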

I really feel like this is a missing feature. ES must be able to tell that "dress" has no suggestions because it is correct/found, so why can it not simply denote this in its response? That would save me from having to add complex logic of my own (collate, additional searches to check each individual word for existence, etc.).
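For illustration, the extra logic described above might look roughly like the following Python sketch. It is not a confirmed solution: `rebuild_phrase`, `term_exists`, and `indexed_terms` are hypothetical names, and the set lookup is only a stand-in for a real per-word existence check against the index, which is exactly the overhead in question.

```python
from typing import Callable, Dict, List


def rebuild_phrase(entries: List[Dict], term_exists: Callable[[str], bool]) -> str:
    """Build a 'did you mean' phrase from term suggester entries.

    term_exists is a hypothetical callback that should return True when the
    token occurs in the index (for example, via one extra query per token).
    """
    words = []
    for entry in entries:
        options = entry.get("options") or []
        if options:
            # Misspelt token: take the highest-scoring suggestion.
            best = max(options, key=lambda o: o["score"])
            words.append(best["text"])
        elif term_exists(entry["text"]):
            # No suggestions and the token is indexed: keep it as typed.
            words.append(entry["text"])
        # No suggestions and not indexed: drop the token.
    return " ".join(words)


# Entries copied from the response above.
entries = [
    {"text": "greem", "offset": 0, "length": 5, "options": [
        {"text": "green", "score": 0.8, "freq": 3},
        {"text": "grei", "score": 0.5, "freq": 1},
    ]},
    {"text": "dress", "offset": 6, "length": 5, "options": []},
    {"text": "banana", "offset": 12, "length": 6, "options": []},
]

# Stand-in for a real index lookup, for demonstration only.
indexed_terms = {"green", "dress"}
print(rebuild_phrase(entries, lambda t: t in indexed_terms))  # green dress
```

The existence check is passed in as a callback so that the phrase-building logic stays separate from however the index is actually queried.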

Have I missed something? Is there a way to tell the term suggester to let me know the difference between terms without suggestions due to not existing vs. simply being correct?
