Does using a phonetic analyzer defined in a .json file kill the effect of fuzzy search?

satadru_biswas · April 22, 2024, 2:03pm

I created an entity class:

@Data
@Document(indexName = "abc") // Elasticsearch index names must be lowercase
@Setting(settingPath = "analyzer/search-analyzer.json")
@JsonIgnoreProperties(ignoreUnknown = true)
@Slf4j
@EqualsAndHashCode
public class XYZ {
    @Field(type = FieldType.Text, analyzer = "my_analyzer")
    private String qwr;
}

When querying with

QueryBuilders.matchQuery("qwr", partySearch.getQwr()).fuzziness(Fuzziness.AUTO)

fuzziness does not work, since I configured a phonetic analyzer at startup.

dadoonet · April 22, 2024, 2:28pm

Welcome!

What are the mapping and index settings being generated by your code?

Basically what is the output of the following command (meant to be run from the Kibana Dev Console):

GET /abc

satadru_biswas · April 22, 2024, 5:41pm

I am using a phonetic analyzer defined in this .json file:

search-analyzer.json

{
  "analysis": {
    "analyzer": {
      "my_analyzer": {
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "my_metaphone"
        ]
      }
    },
    "filter": {
      "my_metaphone": {
        "type": "phonetic",
        "encoder": "double_metaphone",
        "replace": false
      }
    }
  }
}

And I am using the entity class below with that analyzer:

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.EqualsAndHashCode;
import lombok.NoArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.Setting;

@AllArgsConstructor
@NoArgsConstructor
@Data
@Document(indexName = "abc") // Elasticsearch index names must be lowercase
@Setting(settingPath = "analyzer/search-analyzer.json") // defines my_analyzer
@JsonIgnoreProperties(ignoreUnknown = true)
@Slf4j
@EqualsAndHashCode
public class XYZ {

    // Spring Data's @Id; the JPA @GeneratedValue is not used by Spring Data Elasticsearch
    @Id
    private Integer id;

    // Analyzed with the phonetic my_analyzer
    @Field(type = FieldType.Text, analyzer = "my_analyzer")
    private String firstName;

    @Field(type = FieldType.Keyword)
    private String middleName;

    @Field(type = FieldType.Text, analyzer = "my_analyzer")
    private String name;
}


In the BoolQueryBuilder, I again use a combination like the one below:


if (IsNotEmptyOrNull(Search.getFirstName())) {
    // At least one clause must hit: a fuzzy match on the analyzed field, or a prefix wildcard
    BoolQueryBuilder boolQueryCombined = QueryBuilders.boolQuery();
    boolQueryCombined.should(QueryBuilders.matchQuery(FIRSTNAME, Search.getFirstName()).fuzziness(Fuzziness.AUTO));
    // Wildcard patterns are not analyzed, so the input is lowercased manually
    boolQueryCombined.should(QueryBuilders.wildcardQuery(FIRSTNAME, Search.getFirstName().toLowerCase().concat("*")));
    boolQueryCombined.minimumShouldMatch(1);
    boolQueryBuilder.must(boolQueryCombined);
}

where Search is an object containing all the search properties, such as firstName, name, etc.

With this, Fuzziness.AUTO does not work; only phonetic matching happens. Is the phonetic analyzer overriding the fuzziness?

I will replace the wildcard query with an edge n-gram analyzer later.
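For context on that plan: an edge n-gram filter indexes the leading prefixes of each token, so an ordinary match on a prefix can take over the job of the trailing `*` wildcard. The stdlib-only sketch below merely imitates that output for a single token; `minGram`/`maxGram` mirror the `min_gram`/`max_gram` settings of Elasticsearch's `edge_ngram` token filter, and the values 1 and 5 are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Imitates the prefixes an edge n-gram filter would emit for one token (illustration only). */
class EdgeNgrams {

    // Returns the prefixes of `token` whose lengths lie in [minGram, maxGram].
    static List<String> of(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, token.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Assumed settings: min_gram = 1, max_gram = 5
        System.out.println(of("manju", 1, 5)); // [m, ma, man, manj, manju]
    }
}
```

With these prefixes in the index, a search for `man` would hit `manju` without any wildcard pattern.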

dadoonet · April 22, 2024, 8:02pm

Could you share the mapping as well?

satadru_biswas · April 23, 2024, 3:27am

Basically, the query below returns only phonetically matching values; fuzziness(Fuzziness.AUTO) does not work when I use the phonetic analyzer JSON file mentioned above.

boolQueryCombined.should(QueryBuilders.matchQuery(FIRSTNAME, partySearch.getFirstName()).fuzziness(Fuzziness.AUTO));
boolQueryCombined.should(QueryBuilders.wildcardQuery(FIRSTNAME, partySearch.getFirstName().toLowerCase().concat("*")));


Data inserted:
anju
manju

Phonetic result (double_metaphone), searching for manju:
manju (fuzziness does not work when the phonetic analyzer is used)

Fuzzy result (without the phonetic analyzer), searching for manju:
anju
manju
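The fuzzy result above lines up with the arithmetic behind `Fuzziness.AUTO`: a query term of length 0 to 2 must match exactly, length 3 to 5 allows one edit, and longer terms allow two. So `manju` (5 characters) may match `anju`, which is one deletion away. The stdlib-only sketch below reproduces just that arithmetic with a plain Levenshtein distance. It is an illustration, not Elasticsearch's implementation: Lucene's fuzzy queries also count an adjacent transposition as a single edit when `fuzzy_transpositions` is true, and they compare against the analyzed terms in the index.

```java
/** Illustrates the arithmetic behind Fuzziness.AUTO (default thresholds 3 and 6). */
class AutoFuzziness {

    // Maximum number of edits AUTO allows for a query term of this length.
    static int maxEdits(String term) {
        int len = term.length();
        if (len < 3) return 0; // 0..2 characters: exact match only
        if (len < 6) return 1; // 3..5 characters: one edit
        return 2;              // more than 5 characters: two edits
    }

    // Plain Levenshtein distance (insert, delete, substitute), two-row dynamic programming.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // True if indexedTerm is within the AUTO edit budget of queryTerm.
    static boolean fuzzyMatches(String queryTerm, String indexedTerm) {
        return levenshtein(queryTerm, indexedTerm) <= maxEdits(queryTerm);
    }

    public static void main(String[] args) {
        System.out.println(maxEdits("manju"));             // 1
        System.out.println(levenshtein("manju", "anju"));  // 1
        System.out.println(fuzzyMatches("manju", "anju")); // true
    }
}
```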

dadoonet · April 23, 2024, 5:12am

What is the output of the _analyze API with your texts?

That's the way to check how the analyzer behaves at index time and search time.

If you need more help, it would help a lot to create a minimal reproduction script we can easily copy/paste into the dev console and iterate from there.

This page describes it: Elastic Stack and Solutions Help · Forums and Slack | Elastic
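The `_analyze` check mentioned above can also be scripted. This is a minimal sketch using only `java.net.http` (Java 11+). Passing `field` instead of `tokenizer`/`filter` makes `_analyze` use the analyzer mapped on that field, so the request reflects what the index actually does with `firstName`. The URL `http://localhost:9200`, the index `xyz` and the field `firstName` come from this thread; an unsecured local cluster is assumed, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Builds (and optionally sends) _analyze requests that use a field's mapped analyzer. */
class AnalyzeCheck {

    // JSON body asking Elasticsearch to use the analyzer mapped on `field`.
    // Assumes field and text need no JSON escaping.
    static String body(String field, String text) {
        return "{\"field\": \"" + field + "\", \"text\": \"" + text + "\"}";
    }

    static HttpRequest request(String baseUrl, String index, String field, String text) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(field, text)))
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        // Compare the tokens produced for both inserted values.
        for (String text : new String[] {"anju", "manju"}) {
            HttpRequest req = request("http://localhost:9200", "xyz", "firstName", text);
            try {
                HttpResponse<String> resp = client.send(req, HttpResponse.BodyHandlers.ofString());
                System.out.println(text + " -> " + resp.body());
            } catch (IOException e) {
                System.out.println("Could not reach Elasticsearch: " + e);
                return;
            }
        }
    }
}
```

Running the same requests against the index without the phonetic analyzer gives the comparison between the two cases.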

satadru_biswas · April 23, 2024, 6:41am

http://localhost:9200/xyz/_analyze

Input:
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    "my_metaphone"
  ],
  "text": "manju"
}

Output:

{
    "tokens": [
        {
            "token": "manju",
            "start_offset": 0,
            "end_offset": 5,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "MNJ",
            "start_offset": 0,
            "end_offset": 5,
            "type": "<ALPHANUM>",
            "position": 0
        }
    ]
}
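The output shows what `"replace": false` does: the phonetic filter keeps the original token `manju` and adds the encoded `MNJ` at the same position (0). The stdlib-only sketch below mimics only that shape of the token stream, under loud assumptions: the tokenizer is a crude stand-in for the standard tokenizer (it splits on anything that is not a letter or digit), and the encoder is a hypothetical lookup table seeded with the single value reported above. It does not implement double_metaphone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Mimics the token stream shape of: tokenizer -> lowercase -> phonetic with replace=false. */
class PhoneticStreamSketch {

    // Crude stand-in for the standard tokenizer: split on runs of non-letter, non-digit characters.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Returns "term@position" entries; the encoded term shares the original token's position.
    static List<String> analyze(String text, Function<String, String> encoder) {
        List<String> stream = new ArrayList<>();
        List<String> tokens = tokenize(text);
        for (int pos = 0; pos < tokens.size(); pos++) {
            String lower = tokens.get(pos).toLowerCase(Locale.ROOT); // "lowercase" filter
            stream.add(lower + "@" + pos);                           // original kept (replace=false)
            String encoded = encoder.apply(lower);                   // stand-in for "my_metaphone"
            if (encoded != null) stream.add(encoded + "@" + pos);    // injected at the same position
        }
        return stream;
    }

    public static void main(String[] args) {
        // Hypothetical encoder: only knows the value reported by _analyze in this thread.
        Map<String, String> known = Map.of("manju", "MNJ");
        System.out.println(analyze("Manju", known::get)); // [manju@0, MNJ@0]
    }
}
```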


The analyzer works fine with encoders such as double_metaphone and soundex.

The question is: when the same analyzer is combined with fuzziness in the bool query, why does the fuzziness not work?

I expect both fuzzy and phonetic matches together: phonetic behavior through the encoder, and fuzzy behavior through the bool query described in the comments above.

dadoonet · April 23, 2024, 12:17pm

Could you share the output of the analyzer in both cases?

satadru_biswas · April 23, 2024, 1:02pm

I am not using an analyzer in the second table; it is a plain bool query using matchQuery(FIRSTNAME, Search.getFirstName()).fuzziness(Fuzziness.AUTO), and there fuzziness works fine. In the first table I need both phonetic (analyzer) and fuzzy matching, and there fuzziness does not work.

1st table (normal bool query on the field with the analyzer):

{
"query": {
  "bool" : {
    "must" : [
      {
        "bool" : {
          "should" : [
            {
              "match" : {
                "firstName" : {
                  "query" : "manju",
                  "operator" : "OR",
                  "fuzziness" : "AUTO",
                  "prefix_length" : 0,
                  "max_expansions" : 50,
                  "fuzzy_transpositions" : true,
                  "lenient" : false,
                  "zero_terms_query" : "NONE",
                  "auto_generate_synonyms_phrase_query" : true,
                  "boost" : 1.0
                }
              }
            },
            {
              "wildcard" : {
                "firstName" : {
                  "wildcard" : "manju*",
                  "boost" : 1.0
                }
              }
            }
          ],
          "adjust_pure_negative" : true,
          "minimum_should_match" : "1",
          "boost" : 1.0
        }
      }
    ],
    "must_not" : [
      {
        "terms" : {
          "CNumber" : [
            "146"
          ],
          "boost" : 1.0
        }
      }
    ],
    "adjust_pure_negative" : true,
    "boost" : 1.0
  }
  }
}




Output with fuzziness for the second table:

"firstName": "anju",
"firstName": "manju"


Normal fuzzy search on a table works without custom analyzers, right?

dadoonet · April 23, 2024, 1:33pm

It would help to reproduce with a minimal script so we can really understand what the problem is and what you are doing.

> I am not using analyzer in the second table its plain bool query with getFirstName()).fuzziness(Fuzziness.AUTO) where fuzziness is working fine but in the 1st table I need both phonetic(analyzer) and fuzzy, where fuzziness is not working.

Well... as you are querying firstName, I think the analyzer of this field is applied to your query text as well.
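Along the lines of the minimal reproduction requested above, here is a sketch of a small Java program that prints a Kibana Dev Console script for the phonetic case, assembled from the settings, mapping and query shared in this thread. It is a sketch under assumptions: the index name `xyz`, the `refresh=true` indexing calls and the flattened query (the outer `must` wrapper and the `must_not` clause on `CNumber` are left out) are choices made for brevity. Running the printed script requires the analysis-phonetic plugin, and the text blocks need Java 15+.

```java
/** Prints a Dev Console script reproducing the phonetic + fuzzy case (hypothetical helper). */
class ReproScript {

    // Settings copied from search-analyzer.json, plus the firstName mapping of the XYZ entity.
    static final String CREATE_INDEX = """
        PUT /xyz
        {
          "settings": {
            "analysis": {
              "analyzer": {
                "my_analyzer": {
                  "tokenizer": "standard",
                  "filter": ["lowercase", "my_metaphone"]
                }
              },
              "filter": {
                "my_metaphone": {
                  "type": "phonetic",
                  "encoder": "double_metaphone",
                  "replace": false
                }
              }
            }
          },
          "mappings": {
            "properties": {
              "firstName": { "type": "text", "analyzer": "my_analyzer" }
            }
          }
        }
        """;

    // One document per inserted value; refresh=true makes it searchable right away.
    static String indexDoc(String firstName) {
        return "POST /xyz/_doc?refresh=true\n{ \"firstName\": \"" + firstName + "\" }\n";
    }

    // Same should clauses as the BoolQueryBuilder code: fuzzy match OR lowercase prefix wildcard.
    static String search(String firstName) {
        return """
            GET /xyz/_search
            {
              "query": {
                "bool": {
                  "should": [
                    { "match": { "firstName": { "query": "%s", "fuzziness": "AUTO" } } },
                    { "wildcard": { "firstName": { "value": "%s*" } } }
                  ],
                  "minimum_should_match": 1
                }
              }
            }
            """.formatted(firstName, firstName.toLowerCase());
    }

    static String script() {
        return String.join("\n", CREATE_INDEX, indexDoc("anju"), indexDoc("manju"), search("manju"));
    }

    public static void main(String[] args) {
        System.out.print(script());
    }
}
```

Dropping `"analyzer": "my_analyzer"` from the mapping turns the same script into the ABC case, so both can be compared side by side.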

satadru_biswas · April 23, 2024, 2:44pm

OK, let me break it down. There are two tables, one with the phonetic analyzer and one without:
1) Table XYZ (phonetic)
2) Table ABC

For table XYZ, the analysis-phonetic plugin is already installed for phonetic matching.
Table ABC has no custom analyzer.

For table XYZ, the requirement is phonetic + fuzzy matching; here fuzzy matching does not work, only phonetic matching does.

@AllArgsConstructor
@NoArgsConstructor
@Data
@Document(indexName = "xyz") // lowercase, no trailing space
@Setting(settingPath = "analyzer/search-analyzer.json")
@JsonIgnoreProperties(ignoreUnknown = true)
@Slf4j
@EqualsAndHashCode
public class XYZ {

    @Id // org.springframework.data.annotation.Id instead of the JPA @GeneratedValue
    private Integer id;

    // No searchAnalyzer given, so my_analyzer is used at index and search time
    @Field(type = FieldType.Text, analyzer = "my_analyzer")
    private String firstName;

    @Field(type = FieldType.Text, analyzer = "my_analyzer")
    private String name;
}
search-analyzer.json:
{
  "analysis": {
    "analyzer": {
      "my_analyzer": {
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "my_metaphone"
        ]
      }
    },
    "filter": {
      "my_metaphone": {
        "type": "phonetic",
        "encoder": "double_metaphone",
        "replace": false
      }
    }
  }
}


Using BoolQueryBuilder, I write Java code that builds the query and runs it against table XYZ, where fuzziness(Fuzziness.AUTO) does not work but the phonetic behavior does.

BoolQueryBuilder boolQueryCombined = QueryBuilders.boolQuery();
boolQueryCombined.should(QueryBuilders.matchQuery(FIRSTNAME, Search.getFirstName()).fuzziness(Fuzziness.AUTO));
boolQueryCombined.should(QueryBuilders.wildcardQuery(FIRSTNAME, Search.getFirstName().toLowerCase().concat("*")));
boolQueryCombined.minimumShouldMatch(1);
boolQueryBuilder.must(boolQueryCombined);



For table ABC, the requirement is fuzzy matching only, and this works:

@AllArgsConstructor
@NoArgsConstructor
@Data
@Document(indexName = "abc") // lowercase index name
@JsonIgnoreProperties(ignoreUnknown = true)
@Slf4j
@EqualsAndHashCode
public class ABC {

    @Id // org.springframework.data.annotation.Id instead of the JPA @GeneratedValue
    private Integer id;

    // No analyzer given, so the index's default analyzer (standard unless configured) applies
    @Field(type = FieldType.Text)
    private String firstName;

    @Field(type = FieldType.Text)
    private String name;
}

Using BoolQueryBuilder, I run the same Java code as above against table ABC, where fuzziness(Fuzziness.AUTO) works:

BoolQueryBuilder boolQueryCombined = QueryBuilders.boolQuery();
boolQueryCombined.should(QueryBuilders.matchQuery(FIRSTNAME, Search.getFirstName()).fuzziness(Fuzziness.AUTO));
boolQueryCombined.should(QueryBuilders.wildcardQuery(FIRSTNAME, Search.getFirstName().toLowerCase().concat("*")));
boolQueryCombined.minimumShouldMatch(1);
boolQueryBuilder.must(boolQueryCombined);



Data inserted into both tables, XYZ and ABC:
anju
manju

Phonetic result (double_metaphone) in table XYZ, searching for manju:
manju

Fuzzy result in table ABC, searching for manju:
anju
manju


***Table XYZ should give the same result as ABC, since fuzziness(Fuzziness.AUTO) is set in both cases.***

***I think that if we put the phonetic analyzer on a column, that column is not considered for fuzzy search.***

Related topics
Topic | Category | Replies | Views | Date
Search with phonetic plugin, Elasticsearch-7.6.1 | Elasticsearch | 5 | 467 | April 15, 2020
ELK 6.6 + Analysis Phonetic 6.6 | Elasticsearch | 4 | 414 | March 8, 2019
Analyzer from plugin works well when _analyze is called but does not work in search | Elasticsearch | 1 | 362 | December 4, 2018
Analysis-phonetic throws ClassNotFoundException[org.elasticsearch.index.analysis.phonetic.PhoneticTokenFilterFactory] | Elasticsearch | 2 | 869 | July 6, 2017
Got no results with phonetic plugin search | Elasticsearch | 2 | 664 | July 5, 2017
