How to use a phonetic analyzer with AWS Elasticsearch?

I am currently using AWS Elasticsearch and run the following query to search my data:

{
    "query": {
        "multi_match" : {
            "query" : "murder",
            "fields" : ["content", "title^10"],
            "fuzziness" : "AUTO"
        }
    },
     "size": "10",

     "_source": [ "title", "bench", "court" ],
     "highlight": {
        "fields" : {
            "title" : {},
            "content":{}
        }
    }

}

How can I incorporate phonetic analysis into this? If possible, I would also like to incorporate synonym search into the above query.

You'd need the phonetic plugin. See https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-phonetic.html

Have a look at https://www.elastic.co/guide/en/elasticsearch/reference/7.5/analysis-synonym-graph-tokenfilter.html
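
To make the synonym suggestion a bit more concrete, here is a minimal sketch of index settings that define a synonym_graph filter. The names my_synonyms and my_synonym_analyzer and the synonym list are made up for illustration; this body would be sent when creating the index, and it has not been tested against the AWS service.

```json
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_synonyms": {
            "type": "synonym_graph",
            "synonyms": ["murder, homicide, killing"]
          }
        },
        "analyzer": {
          "my_synonym_analyzer": {
            "tokenizer": "standard",
            "filter": ["lowercase", "my_synonyms"]
          }
        }
      }
    }
  }
}
```

The synonym_graph filter is meant to be used at search time only, so an analyzer like this would typically be set as the search_analyzer of a field, or passed as the "analyzer" of the query, rather than used for indexing.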

BTW did you look at https://www.elastic.co/cloud and https://aws.amazon.com/marketplace/pp/B01N6YCISK ?

Cloud by Elastic is one way to have access to all features, all managed by us. Think about what is already there, like Security, Monitoring, Reporting, SQL, Canvas, APM, Logs UI, Infra UI, SIEM and Maps UI, and what is coming next...

I am using the phonetic plugin that is built into AWS: https://aws.amazon.com/about-aws/whats-new/2016/12/amazon-elasticsearch-service-now-supports-phonetic-analysis/

Before indexing, I used this code to set up the phonetic analyzer.

PUT endpoint/courts_2
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_metaphone"
            ]
          }
        },
        "filter": {
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": true
          }
        }
      }
    }
  }
}

Note: I did not install the plugin myself, as AWS ships it pre-built (see the link above).
I am now using this request body to query the endpoint:

{
    "query": {
        "multi_match" : {
            "query" : "Abhijith",
            "fields" : ["content", "title^10"],
            "analyzer": "my_analyzer"
        }
    },
     "size": "1",
     "_source": [ "title", "bench", "court" ],
     "highlight": {
        "fields" : {
            "title" : {},
            "content":{}
        }
    }

}

But I get zero results back. This is the output:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    }
}

I can confirm that I do get hits when I leave the analyzer out of the query.

Running the same text through the _analyze API returns normal output, though:

GET courts_2/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Abhijith"
}

Response

{
    "tokens": [
        {
            "token": "ABHJ",
            "start_offset": 0,
            "end_offset": 8,
            "type": "<ALPHANUM>",
            "position": 0
        }
    ]
}
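
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the PUT request above only defines my_analyzer in the settings and does not attach it to any field, so content and title were presumably indexed with the default standard analyzer. The query term is then encoded to ABHJ at search time and compared against plain lowercased tokens in the index. Sending the body below to GET courts_2/_analyze shows what the standard analyzer produces for the same text:

```json
{
  "analyzer": "standard",
  "text": "Abhijith"
}
```

The standard analyzer emits the lowercased token abhijith, which cannot match ABHJ. That would account for zero hits with the analyzer set on the query and normal hits without it.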

Could you provide a full recreation script as described in About the Elasticsearch category? It will help us better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.

Hi,
I have edited the above post with all the details. It would be great if you could take a look, @dadoonet.

I'm still missing a way to reproduce the problem.
Could you share a full script as per the example provided in the link I gave you?

If I run your query I won't get any hit as nothing has been indexed.

Hi @dadoonet, I have solved the above problem already. I have a new problem, and if you could look into it that would be great.

It would be great if you added an answer to this thread with your solution so it can be shared with the community.

Here is how I did it:
I used the metaphone encoder and the following code to set up the analyzer and attach it to the fields in the mapping. Note: on AWS, I had to reindex the data so that the existing documents were analyzed with the new analyzer.

PUT 
{
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "my_analyzer": {
                        "tokenizer": "standard",
                        "filter": [
                            "lowercase",
                            "my_metaphone"
                        ]
                    }
                },
                "filter": {
                    "my_metaphone": {
                        "type": "phonetic",
                        "encoder": "metaphone",
                        "replace": true
                    }
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "author": {
                "type": "text",
                "analyzer": "my_analyzer"
            },
            "bench": {
                "type": "text",
                "analyzer": "my_analyzer"
            },
            "citation": {
                "type": "text"
            },
            "court": {
                "type": "text"
            },
            "date": {
                "type": "text"
            },
            "id_": {
                "type": "text"
            },
            "verdict": {
                "type": "text"
            },
            "title": {
                "type": "text",
                "analyzer": "my_analyzer",
                "fields": {
                    "standard": {
                        "type": "text"
                    }
                }
            },
            "content": {
                "type": "text",
                "analyzer": "my_analyzer",
                "fields": {
                    "standard": {
                        "type": "text"
                    }
                }
            }
        }
    }
}

Now you can query your index with the phonetic analyzer. Note: the phonetic analyzer is now the default for the fields mapped with my_analyzer. To search with the standard analyzer instead, query the standard sub-field, for example title.standard.
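
As an illustration, here is a sketch of a search body that queries both the phonetic fields and the standard sub-fields defined in the mapping above, so that exact and sounds-alike matches can both contribute. Combining the two in one multi_match, and reusing the ^10 boost from the original query, are my assumptions and not something tested in this thread. The body would be sent to the index's _search endpoint.

```json
{
  "query": {
    "multi_match": {
      "query": "Abhijith",
      "fields": [
        "content",
        "title^10",
        "content.standard",
        "title.standard^10"
      ]
    }
  },
  "size": 10,
  "_source": ["title", "bench", "court"],
  "highlight": {
    "fields": {
      "title": {},
      "content": {}
    }
  }
}
```

No "analyzer" is set on the query here, so each field is searched with the analyzer from its own mapping: my_analyzer for content and title, and the standard analyzer for the two sub-fields.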

