Issue when indexing French words with a circumflex (^)


(Alexandre Heimburger) #1

Hey

I put my configuration and query in a gist:

https://gist.github.com/211d97b1d7cd3eb1aeac

The context

I use a stemmer-ngram filter to index the title field of my documents (it
also gives me very fast autocompletion, by the way).

It works great with French words containing é or è (e.g. I can search for
theorie to find documents indexed with théorie).

But it does not work with French words containing a circumflex (^).

I indexed the title "Pôle web 2.0", which I cannot find using the term "pole".

It seems that the n-gram tokenizer does not recognize ^ as an accent.
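
To make the symptom concrete, here is a small Python sketch of what I think is going on. It is not Elasticsearch code: `edge_ngrams` and `fold_accents` are hypothetical helpers, prefix (edge) n-grams are an assumption about how the autocompletion grams are built, and the Unicode decomposition only approximates real accent folding.

```python
import unicodedata


def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the prefixes of `token` with length between min_gram and max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def fold_accents(text):
    """Drop combining marks after NFKD decomposition (rough stand-in for accent folding)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# "P\u00f4le" is "Pôle" written with the precomposed o-circumflex character.
indexed = "P\u00f4le".lower()

# Without folding, every gram longer than one character still contains the accent.
raw_grams = edge_ngrams(indexed)
# With folding applied first, the grams are plain ASCII.
folded_grams = edge_ngrams(fold_accents(indexed))

print(raw_grams)
print(folded_grams)
print("pole" in raw_grams, "pole" in folded_grams)
```

The last line prints `False True`: the unaccented query term only matches once the accent has been removed before the grams are generated.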

Any ideas?

-Alex-


(Alexandre Heimburger) #2

No ideas, anybody?

On 30 Aug, 12:17, Alexandre Heimburger alexheimbur...@gmail.com
wrote:

I put my configuration and query in a gist:

https://gist.github.com/211d97b1d7cd3eb1aeac



(Tomislav Poljak) #3

Hi,
I don't think the n-gram tokenizer strips any accents. You need to use
the ASCII Folding Token Filter
(http://www.elasticsearch.org/guide/reference/index-modules/analysis/asciifolding-tokenfilter.html)
for this, in both index-time and query-time analysis. I've altered your AC
analysis (added the "asciifolding" filter to both the index and query
analyzers); see https://gist.github.com/1183687. I've tested it with
"query": "pole" and it matches.

Hope this helps,

Tomislav
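
A minimal sketch of the kind of change described above, written as a Python dict that serializes to index settings. It is not a copy of either gist: the names `ac_index`, `ac_search` and `ac_edge_ngram`, the `standard` tokenizer, the edge n-gram filter and the gram sizes are all assumptions chosen for illustration. The only point is that `asciifolding` appears in both analyzers, and before the n-gram filter on the index side.

```python
import json

# Hypothetical names and values: "ac_index", "ac_search", "ac_edge_ngram",
# the tokenizer and the gram sizes are placeholders, not taken from the gists.
analysis_settings = {
    "analysis": {
        "filter": {
            "ac_edge_ngram": {
                "type": "edge_ngram",  # older releases spell this "edgeNGram"
                "min_gram": 1,
                "max_gram": 20,
            }
        },
        "analyzer": {
            # Index time: fold accents first, then cut the token into prefixes.
            "ac_index": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "asciifolding", "ac_edge_ngram"],
            },
            # Query time: fold accents too, but do not generate n-grams.
            "ac_search": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "asciifolding"],
            },
        },
    }
}

# Print the JSON body that would be sent when creating the index.
print(json.dumps({"settings": analysis_settings}, indent=2))
```

The title field would then use the first analyzer for indexing and the second for searching; the exact mapping keys for that depend on the Elasticsearch version.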



(Alexandre Heimburger) #4

Thanks a lot. I'll test it tomorrow morning and let you know.


