Dealing with highly scored sequences of words

I have an Elasticsearch synonym filter which, for the sake of argument, looks like this:

"french_company_synonyms": {
    "expand": "true",
    "type": "synonym_graph",
    "synonyms": [
        "llp, limited liability partnership",
        "llc, limited liability company",
        "plc, public limited company",
        "sarl, societe a responsabilite limitee",
        "sa, societe anonyme"
    ]
}

As a result, when someone types in "sarl", it expands to "societe a responsabilite limitee". So far so good.
Now, my index has records that use both "societe a responsabilite limitee" and "sarl". Let's say I'm looking for "sarl dream", and a record is stored exactly like that: "sarl dream". Even though there is an exact match, the search first returns LOTS of "Societe a responsabilite limitee *" companies. I suspect this is because the query gets expanded, those records then match 4 words, and they score higher than the exact match "sarl dream" (only 2 words match).

If I'm understanding the problem correctly, I'd formalise it something like this. I have many records in the index with the same combination of words ("societe a responsabilite limitee"). Usually ES is good at penalising words that appear often in the index through TF/IDF. However, in this case that seems to be offset by the fact that it's a phrase with multiple words, so even though the words are common, the matches still score high. How do you deal with cases where it's not a single word that's very common in the index but a phrase/word sequence?
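
To make the hypothesis concrete, here is a toy calculation in Python. It is only a sketch under assumptions: it uses the BM25-style IDF formula and assumes that a phrase's weight is roughly the sum of the IDF of its words, which is how I understand Lucene's default similarity to treat phrases. The document counts are made up, and real scores also depend on term frequency and field length, so this illustrates the suspected effect rather than proving it.

```python
import math

def bm25_idf(total_docs, docs_with_term):
    """BM25-style inverse document frequency for a single term."""
    return math.log(1 + (total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))

# Hypothetical index statistics, invented purely for illustration.
TOTAL_DOCS = 100_000
DOC_FREQ = {
    "sarl": 5_000,
    "societe": 20_000,
    "a": 30_000,
    "responsabilite": 8_000,
    "limitee": 8_000,
}

def summed_idf(terms):
    """Add up the per-term IDF, as assumed for a phrase match."""
    return sum(bm25_idf(TOTAL_DOCS, DOC_FREQ[term]) for term in terms)

abbreviation_weight = summed_idf(["sarl"])
phrase_weight = summed_idf(["societe", "a", "responsabilite", "limitee"])

print(f"'sarl' weight:      {abbreviation_weight:.2f}")
print(f"long phrase weight: {phrase_weight:.2f}")
```

With these made-up numbers, each word of the long form is more common than "sarl" and so has a lower IDF on its own, yet the four values added together come out higher than the single "sarl" value.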

Now I see 2 potential solutions. One is some kind of downscoring, where I downscore matches that contain the phrase "Societe a responsabilite limitee" (I think I need to use a boosting query for that). This is relatively easy but seems a bit dirty, and I think the boost value would need to change as the index grows. The other is to ensure that my index doesn't contain any "societe a responsabilite limitee" at all, with every occurrence normalised to the word "sarl".
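
Both options can be sketched roughly as follows, in Python. The field name `name`, the `negative_boost` value of 0.3 and both helper functions are assumptions made up for illustration, not tested settings. The first helper only builds a request body using the `boosting` query's `positive`, `negative` and `negative_boost` keys; the `"analyzer": "standard"` on the negative clause is a guess at how to stop the synonym filter from expanding that clause as well. The second helper normalises names before indexing, using the long forms from the synonym list above.

```python
import re

def build_boosting_query(user_text, field="name", negative_boost=0.3):
    """Option 1: build a search body that demotes long-form matches."""
    return {
        "query": {
            "boosting": {
                # Normal relevance matching on the user's text.
                "positive": {"match": {field: {"query": user_text}}},
                # Documents matching this clause keep matching, but their
                # score is multiplied by negative_boost.
                "negative": {
                    "match_phrase": {
                        field: {
                            "query": "societe a responsabilite limitee",
                            "analyzer": "standard",  # assumption: skip synonym expansion here
                        }
                    }
                },
                "negative_boost": negative_boost,
            }
        }
    }

# Long form -> abbreviation, taken from the synonym list in this post.
LONG_TO_SHORT = {
    "limited liability partnership": "llp",
    "limited liability company": "llc",
    "public limited company": "plc",
    "societe a responsabilite limitee": "sarl",
    "societe anonyme": "sa",
}

def normalise_company_name(name):
    """Option 2: lowercase a name and collapse long forms before indexing."""
    result = name.lower()
    for long_form, short_form in LONG_TO_SHORT.items():
        pattern = r"\b" + re.escape(long_form) + r"\b"
        result = re.sub(pattern, short_form, result)
    return result

body = build_boosting_query("sarl dream")
normalised = normalise_company_name("Societe a responsabilite limitee Dream")
```

The first option leaves the index untouched but adds a hand-tuned number to maintain. The second moves the work to index time, and it only handles unaccented spellings unless the mapping is extended.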

Before I go down the rabbit hole of trying things out, can someone tell me if they encountered a similar problem, and also whether my problem definition is even correct?

Thank you.
