Words with accents and searching


(erlo) #1

Hello,

I'm stuck on a problem with words that contain accents.

For example, if I index the word "année" with a custom analyzer that uses
the "asciifolding" filter, I can match the token by searching for "annee",
but I get no match when I search directly for "année".

I've also set a search_analyzer, which is a custom one with asciifolding too.

What is the best practice for indexing accented words so that they can be
searched with or without the accent?

Thanks!

--
Erwan


(erlo) #2

I can answer my own question...
Even though I cannot explain exactly what my mistake was, I specified an
analyzer (with the asciifolding filter) explicitly in my query, and it works.

But I had already defined this analyzer as "search_analyzer" at the top
level of the mapping...


(Clinton Gormley) #3

On Thu, 2011-09-29 at 14:38 +0200, Erwan Loaëc wrote:

I've set a search_analyzer which is a custom one with asciifolding too.

How are you searching? Are you searching against the same field for
which you have specified the analyzer? Or are you using the query_string
query with no fields specified? If the latter, then you are searching
against the _all field, which has its own analyzer (the 'standard'
analyzer by default).
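
To make the distinction concrete, here is a minimal sketch of a query_string request that names its field. The field name "mykeyword" is an assumption for illustration (the question above does not name its field), and the query text is just the example word. With "default_field" set, the query should be parsed against that field and analyzed with the field's search analyzer:

```json
{
  "query": {
    "query_string": {
      "default_field": "mykeyword",
      "query": "année"
    }
  }
}
```

If "default_field" is left out, the same request is expected to run against _all, so the asciifolding search analyzer would not be applied and "année" would not be folded to "annee".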

clint


(erlo) #4

Here is how I've defined the mapping:

 "myindex" : {
     "index_analyzer" : "myindexanalyzer",
     "search_analyzer" : "mysearchanalyzer",
     "properties" : {
         "mykeyword": { "type" : "string" }
     }
 }
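
The definitions of the two analyzers are not shown in this thread. As a rough sketch only, they might be declared in the index settings along these lines. The "standard" tokenizer and the "lowercase" filter are assumptions for illustration; only the "asciifolding" filter and the two analyzer names come from the posts above:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "myindexanalyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding"]
        },
        "mysearchanalyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  }
}
```

With both analyzers folding accents, "année" would be indexed as "annee", and a search for either spelling would be reduced to the same token, provided the search analyzer is actually the one applied to the query.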

Now, I'm searching with:

 "query":{
     "query_string":{
         "mykeyword":{
             "query":"XXX"
         }
     }
 }

Other test:

 "query":{
     "query_string":{
         "query":"XXX"
     }
 }

It does not use the right analyzer... A working solution is to specify
the analyzer explicitly:

 "query":{
     "query_string":{
         "mykeyword":{
             "query":"XXX",
             "analyzer":"mysearchanalyzer"
         }
     }
 }

Does anyone know why my mapping does not work as expected...?
(search_analyzer property...)

Thanks!


(Shay Banon) #5

Please post a recreation; currently the question is all over the place. See
http://www.elasticsearch.org/help.

On Thu, Sep 29, 2011 at 5:43 PM, Erwan Loaëc erwan.loaec@cgin.fr wrote:

