Metaphone analyzer being too ambiguous

Hi,
I am currently using the Metaphone analyzer and it is producing matches that are too ambiguous. For example, here is the result for "Murder" from the _analyze API.

{
    "tokens": [
        {
            "token": "MRTR",
            "start_offset": 0,
            "end_offset": 8,
            "type": "<ALPHANUM>",
            "position": 0
        }
    ]
}
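
For reference, the output above comes from an _analyze request like the following (the index name my_index is just a placeholder for whatever the index is called):

GET my_index/_analyze
{
    "analyzer": "my_analyzer",
    "text": "Murder"
}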

Now, if I analyze "Mehtrotra", the result is the same token, although the phonetics (pronunciations) of the two words are radically different. How do I deal with this?

Here are the settings I used when setting up the index:

{
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "my_analyzer": {
                        "tokenizer": "standard",
                        "filter": [
                            "lowercase",
                            "my_metaphone"
                        ]
                    }
                },
                "filter": {
                    "my_metaphone": {
                        "type": "phonetic",
                        "encoder": "metaphone",
                        "replace": true
                    }
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "author": {
                "type": "text",
                "analyzer": "my_analyzer"
            },
            "bench": {
                "type": "text",
                "analyzer": "my_analyzer"
            },
            "citation": {
                "type": "text"
            },
            "court": {
                "type": "text"
            },
            "date": {
                "type": "text"
            },
            "id_": {
                "type": "text"
            },
            "verdict": {
                "type": "text"
            },
            "title": {
                "type": "text",
                "analyzer": "my_analyzer",
                "fields": {
                    "standard": {
                        "type": "text"
                    }
                }
            },
            "content": {
                "type": "text",
                "analyzer": "my_analyzer",
                "fields": {
                    "standard": {
                        "type": "text"
                    }
                }
            }
        }
    }
}

Thanks,

Can someone please answer this? Thanks.

Read this and specifically the "Also be patient" part.

It's fine to answer on your own thread after 2 or 3 days (not including weekends) if you don't have an answer.

I will remember that.

False positives are an inevitable result of using phonetic indexing.
Phonetic algorithms aim to improve recall while suffering a loss in precision.
It's always a trade-off.

If you want ranking to prefer exact matches over sounds-like matches, then index your content as both normal text tokens and phonetic tokens using multi-fields. Then, in your searches, use a bool query with two should clauses: one matching the normal text field (using exact tokens) and one matching the fuzzier phonetic field. Documents that match both clauses will appear first in the results.
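
As a sketch: with the mapping above, title is analyzed phonetically and title.standard keeps the plain text tokens, so a query along these lines would rank exact matches first (the search term is just an example):

GET my_index/_search
{
    "query": {
        "bool": {
            "should": [
                {
                    "match": {
                        "title.standard": "Murder"
                    }
                },
                {
                    "match": {
                        "title": "Murder"
                    }
                }
            ]
        }
    }
}

A document containing the exact word matches both clauses and accumulates both scores, while a merely sounds-alike document matches only the phonetic clause and ranks lower.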

