How to use phonetic analyzer using AWS Elasticsearch?

abhijith_chandran · February 5, 2020, 2:30pm

I am currently using AWS Elasticsearch and making the following query to search the data:

{
    "query": {
        "multi_match": {
            "query": "murder",
            "fields": ["content", "title^10"],
            "fuzziness": "AUTO"
        }
    },
    "size": 10,
    "_source": ["title", "bench", "court"],
    "highlight": {
        "fields": {
            "title": {},
            "content": {}
        }
    }
}

How can I incorporate phonetic analysis into this? If possible, I would also like to incorporate synonym search into the above query.

dadoonet · February 5, 2020, 3:36pm

You'd need the phonetic plugin. See the Phonetic analysis plugin page in the Elasticsearch Plugins and Integrations docs.

For synonyms, have a look at the Synonym graph token filter page in the Elasticsearch Guide (7.5).
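As a minimal sketch of what the synonym graph docs describe, a search-time synonym analyzer could be defined when creating an index. The index name courts_synonyms, the names my_synonyms and my_synonym_search, and the synonym list are all hypothetical placeholders; send this body with PUT courts_synonyms:

```json
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_synonyms": {
            "type": "synonym_graph",
            "synonyms": [ "murder, homicide, killing" ]
          }
        },
        "analyzer": {
          "my_synonym_search": {
            "tokenizer": "standard",
            "filter": [ "lowercase", "my_synonyms" ]
          }
        }
      }
    }
  }
}
```

It would then be referenced at query time by adding "analyzer": "my_synonym_search" to the multi_match clause. The synonym_graph filter is meant for search analyzers only, and this approach assumes content and title keep their default standard analysis, so the expanded lowercase terms can match the indexed tokens.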

BTW, did you look at Elastic Cloud (hosted Elasticsearch) and https://aws.amazon.com/marketplace/pp/B01N6YCISK?

Cloud by Elastic is one way to get access to all features, all managed by us. Think about what is already there, like Security, Monitoring, Reporting, SQL, Canvas, APM, Logs UI, Infra UI, SIEM and Maps UI, and what is coming next...

abhijith_chandran · February 5, 2020, 4:44pm

I am using the built-in plugin in AWS: https://aws.amazon.com/about-aws/whats-new/2016/12/amazon-elasticsearch-service-now-supports-phonetic-analysis/

Before indexing, I used this code to set up the phonetic analyzer.

PUT endpoint/courts_2
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_metaphone"
            ]
          }
        },
        "filter": {
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": true
          }
        }
      }
    }
  }
}

Note: I have not installed the plugin separately, as AWS ships it pre-installed (see the link above).

Now I am using this query against the endpoint:

{
    "query": {
        "multi_match": {
            "query": "Abhijith",
            "fields": ["content", "title^10"],
            "analyzer": "my_analyzer"
        }
    },
    "size": 1,
    "_source": ["title", "bench", "court"],
    "highlight": {
        "fields": {
            "title": {},
            "content": {}
        }
    }
}

But I get zero results back. This is the output:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    }
}

I can confirm that I get hits when I don't specify the analyzer.

The analyzer itself seems to work, though: this request returns the expected output.

GET courts_2/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Abhijith"
}

Response

{
    "tokens": [
        {
            "token": "ABHJ",
            "start_offset": 0,
            "end_offset": 8,
            "type": "<ALPHANUM>",
            "position": 0
        }
    ]
}
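A likely explanation, inferred from the fix posted later in this thread rather than confirmed at this point: courts_2 was created without mappings, so content and title were indexed with the default standard analyzer and hold tokens like abhijith, while my_analyzer turns the query into ABHJ, which matches nothing in the index. Passing field instead of analyzer to the same GET courts_2/_analyze endpoint shows how a field is actually analyzed at index time:

```json
{
  "field": "content",
  "text": "Abhijith"
}
```

If content was dynamically mapped as a plain text field, this should return the lowercase token abhijith rather than ABHJ, which would confirm the index-time and query-time analyzers disagree.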

dadoonet · February 5, 2020, 4:57pm

Could you provide a full recreation script, as described in the About the Elasticsearch category post? It will help to better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.

abhijith_chandran · February 5, 2020, 5:04pm

Hi @dadoonet,
I have edited the post above with all the details. It would be great if you could take a look :)

dadoonet · February 27, 2020, 9:33am

I'm still missing a way to reproduce the problem.
Could you share a full script as per the example provided in the link I gave you?

If I run your query, I won't get any hits, as nothing has been indexed.

abhijith_chandran · February 27, 2020, 1:07pm

Hi @dadoonet, I have solved the above problem already. I have a new problem, and if you could look into it that would be great.

dadoonet · February 27, 2020, 1:19pm

It would be great if you could add an answer to this thread with your solution so it can be shared with the community.

abhijith_chandran · February 29, 2020, 10:46am

Here is how I did it:
I used the metaphone encoder and, this time, also assigned the analyzer to the fields in the mappings, using the following request. Note: on AWS, I had to reindex the data. :(

PUT 
{
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "my_analyzer": {
                        "tokenizer": "standard",
                        "filter": [
                            "lowercase",
                            "my_metaphone"
                        ]
                    }
                },
                "filter": {
                    "my_metaphone": {
                        "type": "phonetic",
                        "encoder": "metaphone",
                        "replace": true
                    }
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "author": {
                "type": "text",
                "analyzer": "my_analyzer"
            },
            "bench": {
                "type": "text",
                "analyzer": "my_analyzer"
            },
            "citation": {
                "type": "text"
            },
            "court": {
                "type": "text"
            },
            "date": {
                "type": "text"
            },
            "id_": {
                "type": "text"
            },
            "verdict": {
                "type": "text"
            },
            "title": {
                "type": "text",
                "analyzer": "my_analyzer",
                "fields": {
                    "standard": {
                        "type": "text"
                    }
                }
            },
            "content": {
                "type": "text",
                "analyzer": "my_analyzer",
                "fields": {
                    "standard": {
                        "type": "text"
                    }
                }
            }
        }
    }
}
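The analyzer of an existing field cannot be changed in place (in general, not only on AWS), which is why the data has to be copied into the newly created index. A minimal sketch using the _reindex API, sent as POST _reindex; courts_3 is a hypothetical name for the new index, since the request above does not give one:

```json
{
  "source": { "index": "courts_2" },
  "dest": { "index": "courts_3" }
}
```

Each document is re-analyzed on the way in, so title, content, author and bench get phonetic tokens from my_analyzer, and the title.standard and content.standard subfields get standard tokens.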

Now you can query your index with the phonetic analyzer. Note: the phonetic analyzer is now the default for these fields; to search with the standard analyzer instead, query the .standard subfield (for example, title.standard).
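As a sketch of querying both variants at once, this search body (sent as GET courts_3/_search, where courts_3 is a hypothetical name for the reindexed index) targets the phonetic parent fields and their standard subfields together:

```json
{
  "query": {
    "multi_match": {
      "query": "Abhijith",
      "fields": [ "title^10", "content", "title.standard^10", "content.standard" ]
    }
  },
  "size": 10,
  "_source": [ "title", "bench", "court" ]
}
```

No "analyzer" override is needed here: multi_match analyzes the query text with each field's own mapped analyzer, so sound-alike matches come from title and content while exact spellings also match title.standard and content.standard.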

