I would like to search text that contains contractions such as I'm or you're using Elasticsearch. I assume a synonym token filter should be used for this. At index time I used the settings below:
'analyzer' => [
    'default' => [
        'type' => 'custom',
        'tokenizer' => 'standard',
        'filter' => ['standard', 'lowercase', 'Snowball_filter', 'synonym_filter']
    ],
    'search_analyzer' => [
        'type' => 'custom',
        'tokenizer' => 'standard',
        'filter' => ['standard', 'lowercase', 'Snowball_filter', 'synonym_filter']
    ]
],
'filter' => [
    'Snowball_filter' => [
        'type' => 'snowball',
        'language' => 'English'
    ],
    "synonym_filter" => [
        "type" => "synonym",
        "synonyms" => ["I'm, I am => I'm", "you're, you are => you're"]
    ]
]
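To see which tokens these analyzers actually emit for the contracted and expanded forms, the _analyze API can be called against the index. The following is only a minimal sketch: it assumes the official elasticsearch-php client installed via Composer (with the pre-8.x `Elasticsearch\ClientBuilder` namespace), a node at localhost:9200, and a hypothetical index name `my_index` created with the settings above. The request-body form of _analyze used here may differ on older Elasticsearch versions.

```php
<?php
require 'vendor/autoload.php';

use Elasticsearch\ClientBuilder;

// Assumption: a single local node; adjust the host list for your cluster.
$client = ClientBuilder::create()->setHosts(['localhost:9200'])->build();

foreach (["I'm a student", "I am a student"] as $text) {
    $response = $client->indices()->analyze([
        'index' => 'my_index', // hypothetical index name
        'body'  => [
            'analyzer' => 'search_analyzer',
            'text'     => $text,
        ],
    ]);

    // Print every emitted token together with its position.
    foreach ($response['tokens'] as $token) {
        echo $token['token'], ' @', $token['position'], PHP_EOL;
    }
    echo '---', PHP_EOL;
}
```

If both inputs produce the same token stream, the synonym rule is being applied as intended; if not, the filter order or the rule syntax is worth a second look.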
Now, suppose I have four documents:
1- I am a student.
2- I'm a student.
3- there are students.
4- Am I a student?
And I use a query made up of the following clauses:
[
    "match_phrase" => [
        "body" => [
            "query" => $q,
            "analyzer" => 'search_analyzer',
            "boost" => 100.0
        ]
    ]
],
[
    "match_phrase" => [
        "body" => [
            "query" => $q,
            "analyzer" => 'search_analyzer',
            "slop" => "1",
            "boost" => 45.0
        ]
    ]
],
[
    "match" => [
        "body" => [
            "query" => $q,
            "analyzer" => 'search_analyzer',
            "operator" => "and",
            "boost" => 10.0
        ]
    ]
],
[
    "match" => [
        "body" => [
            "query" => $q,
            "analyzer" => 'search_analyzer',
            "operator" => "or",
            "fuzziness" => "1",
            "boost" => 1.0
        ]
    ]
]
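The clauses above are shown without their enclosing query. As a rough sketch only, and assuming they are meant to be the `should` clauses of a single bool query (the wrapper is not shown in the snippet), the full search body could be assembled as follows. The helper name `buildSearchBody` is hypothetical.

```php
<?php
// Hypothetical helper: wraps the four clauses in a bool query's "should" list.
// Assumption: the clauses are combined with bool/should.
function buildSearchBody(string $q): array
{
    // Options shared by every clause.
    $base = ['query' => $q, 'analyzer' => 'search_analyzer'];

    return [
        'query' => [
            'bool' => [
                'should' => [
                    // Exact phrase, highest boost.
                    ['match_phrase' => ['body' => $base + ['boost' => 100.0]]],
                    // Phrase with one position of slack.
                    ['match_phrase' => ['body' => $base + ['slop' => 1, 'boost' => 45.0]]],
                    // All terms present, any order.
                    ['match' => ['body' => $base + ['operator' => 'and', 'boost' => 10.0]]],
                    // Any term, with fuzzy matching, lowest boost.
                    ['match' => ['body' => $base + ['operator' => 'or', 'fuzziness' => 1, 'boost' => 1.0]]],
                ],
            ],
        ],
    ];
}

// Dump the body as JSON to inspect what would be sent.
echo json_encode(buildSearchBody("I'm a student"), JSON_PRETTY_PRINT), PHP_EOL;
```

With a bool/should arrangement, a document's score is the sum of the scores of the clauses it matches, so the boosts are what should separate exact phrase matches from looser ones.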
When I search for "I'm a student", the results come back in this order:
1- I am a student.
2- I'm a student.
3- there are students.
4- Am I a student?
However, I expected them in this order:
1- I'm a student.
2- I am a student.
3- Am I a student?
4- there are students.
Please help me understand how to handle this kind of search with contractions so that I get better relevance. Thanks in advance.