Problem: I cannot find the correct settings (ES 5) to query a term such as "prodigo" and get back every occurrence of both "prodigo" and "pródigo" (the accented spelling).
Environment:
I am upgrading from ES 2.4 to ES 5.0. In my current setup (ES 2.4), this is my mapping:
'properties' => [
'file' => [
'type' => 'attachment',
'fields' => [
'content' => [
'type' => 'string',
'term_vector' => 'with_positions_offsets',
'store' => true,
'analyzer' => 'brazilian'
]
]
],
'book_name' => [
'type' => 'string',
'analyzer' => 'brazilian'
],
'book_author' => [
'type' => 'string',
'analyzer' => 'brazilian'
],
'book_editor' => [
'type' => 'string'
],
'url' => [
'type' => 'string'
]
]
These settings do a neat trick for me today: when I search for a term without an accent, the results also include the Portuguese words that are spelled with an accent.
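The behaviour described above boils down to "pródigo" and "prodigo" ending up as the same indexed term. As a rough illustration of that idea only, here is a hypothetical PHP function (`fold_portuguese` and its character map are assumptions for this sketch, not how Elasticsearch or Lucene implement accent folding):

```php
<?php
// Hypothetical illustration of accent folding; NOT Elasticsearch's implementation.
// Covers only common Portuguese accented letters.
function fold_portuguese(string $text): string
{
    $map = [
        'á' => 'a', 'à' => 'a', 'â' => 'a', 'ã' => 'a',
        'Á' => 'a', 'À' => 'a', 'Â' => 'a', 'Ã' => 'a',
        'é' => 'e', 'ê' => 'e', 'É' => 'e', 'Ê' => 'e',
        'í' => 'i', 'Í' => 'i',
        'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'Ó' => 'o', 'Ô' => 'o', 'Õ' => 'o',
        'ú' => 'u', 'ü' => 'u', 'Ú' => 'u', 'Ü' => 'u',
        'ç' => 'c', 'Ç' => 'c',
    ];
    // Replace accented letters first, then lowercase the remaining ASCII letters.
    return strtolower(strtr($text, $map));
}

// Both spellings produce the same "term", so searching one finds the other.
var_dump(fold_portuguese('pródigo') === fold_portuguese('prodigo')); // bool(true)
```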
In my new ES 5.0 mapping I have:
'properties' => [
'name' => [
'type' => 'text',
'analyzer' => 'brazilian'
],
'author' => [
'type' => 'text',
'analyzer' => 'brazilian'
],
'editor' => [
'type' => 'text',
'analyzer' => 'brazilian'
],
'url' => [
'type' => 'text'
],
'content' => [
'type' => 'text',
'analyzer' => 'brazilian',
'term_vector' => 'with_positions_offsets',
'store' => true
]
]
But it no longer does the trick. I have read a lot of the documentation and, unfortunately, the page where I found the previous solution has not been updated for 5.x yet.
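A quick way to check whether an analyzer produces the same token for both spellings is the Analyze API. A minimal sketch, assuming the official `elasticsearch/elasticsearch` PHP client and a hypothetical index named `books`; the client calls are commented out so the snippet runs without a cluster:

```php
<?php
// 'books' is a hypothetical index name; replace it with the real one.
// Asks Elasticsearch how the 'brazilian' analyzer tokenizes both spellings.
$analyzeParams = [
    'index' => 'books',
    'body'  => [
        'analyzer' => 'brazilian',
        'text'     => 'prodigo pródigo',
    ],
];

// Requires the official client (installed via Composer) and a running node:
// require 'vendor/autoload.php';
// $client = Elasticsearch\ClientBuilder::create()->build();
// $response = $client->indices()->analyze($analyzeParams);
// print_r(array_column($response['tokens'], 'token')); // one token per word
```

If the two tokens printed differ, documents containing "pródigo" cannot be matched by a query for "prodigo" on a field using that analyzer.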
So I tried this analyzer definition:
'analyzer' => [
'brazilian' => [
'tokenizer' => 'standard',
'filter' => [
'standard',
'lowercase',
'asciifolding'
]
]
]
I tried many variations of it and still could not make it work. I also tried the built-in Brazilian Portuguese language analyzer, without success.
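In ES 5, a custom analyzer has to be declared under `settings.analysis.analyzer` when the index is created, and the mapping then refers to it by name. A minimal sketch of the create-index parameters for the PHP client, assuming the hypothetical names `books` (index), `book` (mapping type) and `brazilian_folding` (analyzer):

```php
<?php
// Sketch of create-index params for the PHP client (ES 5.x).
// 'book' and 'brazilian_folding' are hypothetical names.
function buildIndexParams(string $index): array
{
    $folded = ['type' => 'text', 'analyzer' => 'brazilian_folding'];

    return [
        'index' => $index,
        'body'  => [
            'settings' => [
                'analysis' => [
                    'analyzer' => [
                        // Custom analyzers live under settings.analysis and are
                        // referenced by name from the mapping below.
                        'brazilian_folding' => [
                            'type'      => 'custom',
                            'tokenizer' => 'standard',
                            'filter'    => ['lowercase', 'asciifolding'],
                        ],
                    ],
                ],
            ],
            'mappings' => [
                'book' => [ // ES 5.x mappings are still keyed by type name
                    'properties' => [
                        'name'    => $folded,
                        'author'  => $folded,
                        'editor'  => $folded,
                        'url'     => ['type' => 'text'],
                        'content' => $folded + [
                            'term_vector' => 'with_positions_offsets',
                            'store'       => true,
                        ],
                    ],
                ],
            ],
        ],
    ];
}

// Usage, with $client = Elasticsearch\ClientBuilder::create()->build():
// $client->indices()->create(buildIndexParams('books'));
```

Analyzers are applied at index time, so the index must be created with these settings and the documents re-ingested before "prodigo" can match "pródigo". If stemming is also wanted, a `stemmer` token filter with `language` set to `brazilian` can be added to the filter chain.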
The processor I am using to ingest the content of my PDF files is:
{
  "description": "Extract attachment information",
  "processors": [
    {
      "attachment": {
        "field": "content",
        "indexed_chars": -1
      }
    }
  ]
}
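According to the ingest-attachment plugin documentation, the processor reads base64 data from `field` and, by default, writes the extracted text to `attachment.content` (`target_field` defaults to `attachment`). So the accent-folding analyzer would need to be mapped on `attachment.content` rather than on the base64 `content` field. A minimal sketch of registering the pipeline and indexing a PDF through it with the PHP client, assuming the hypothetical names `attachment` (pipeline id), `books`, `book` and `book.pdf`:

```php
<?php
// Sketch of params for the PHP client; requires the ingest-attachment plugin (ES 5.x).
// Pipeline id, index, type and file names are hypothetical.
function buildPipelineParams(string $id): array
{
    return [
        'id'   => $id,
        'body' => [
            'description' => 'Extract attachment information',
            'processors'  => [ // must be a JSON array of processor objects
                [
                    'attachment' => [
                        'field'         => 'content', // base64-encoded source data
                        'indexed_chars' => -1,        // no limit on extracted chars
                    ],
                ],
            ],
        ],
    ];
}

function buildIndexDocParams(string $index, string $type, string $pipeline, string $pdfBytes): array
{
    return [
        'index'    => $index,
        'type'     => $type,
        'pipeline' => $pipeline,                            // run the document through the pipeline
        'body'     => ['content' => base64_encode($pdfBytes)], // processor expects base64
    ];
}

// Usage, with $client = Elasticsearch\ClientBuilder::create()->build():
// $client->ingest()->putPipeline(buildPipelineParams('attachment'));
// $client->index(buildIndexDocParams('books', 'book', 'attachment', file_get_contents('book.pdf')));
```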
My query looks like this:
'query' => [
'match_phrase' => [
'content' => [
'query' => 'prodigo', // my search string, e.g. "prodigo"
'slop' => 15
]
]
]
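`match_phrase` runs the query text through the field's search analyzer (by default the same analyzer used at index time), so "prodigo" only matches "pródigo" if that analyzer folds accents. A hypothetical helper that builds the same query for the PHP client, with the index and field names passed in:

```php
<?php
// Hypothetical helper; builds search params equivalent to the query above.
// The query text is analyzed with the field's search analyzer, so accent
// folding must be configured on the field being queried.
function buildPhraseSearch(string $index, string $field, string $text, int $slop = 15): array
{
    return [
        'index' => $index,
        'body'  => [
            'query' => [
                'match_phrase' => [
                    $field => [
                        'query' => $text,
                        'slop'  => $slop, // max positions the phrase terms may be apart
                    ],
                ],
            ],
        ],
    ];
}

// Usage, with $client = Elasticsearch\ClientBuilder::create()->build():
// $client->search(buildPhraseSearch('books', 'attachment.content', 'prodigo'));
```

Here `books` is a hypothetical index name, and `attachment.content` is the field where the attachment processor stores extracted text by default.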
Any help will be appreciated.
P.S.: All code is shown as PHP arrays because I am using the PHP client.