Issue when indexing French words containing ^

Hey

I've put my configuration and query in a gist ("Ngram indexation of french words containing ^").

The context

I use a stemmer-ngram filter to index the title field of my documents. (By the
way, it lets me implement very fast autocompletion.)
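As background for the autocompletion remark: with an n-gram approach, the expensive work happens at index time, so a query is just a lookup. The sketch below assumes edge n-grams (word prefixes), which the post does not actually specify, leaves the stemmer out, and uses hypothetical function names with a plain dict standing in for Elasticsearch's inverted index.

```python
from collections import defaultdict


def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 20) -> list[str]:
    """Hypothetical helper: prefixes of `token` with length in [min_gram, max_gram]."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def build_index(titles: list[str]) -> dict[str, set[int]]:
    """Map every prefix of every lowercased word to the ids of the titles containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, title in enumerate(titles):
        for word in title.lower().split():
            for gram in edge_ngrams(word):
                index[gram].add(doc_id)
    return index


def autocomplete(index: dict[str, set[int]], prefix: str) -> set[int]:
    """A prefix query is a single dictionary lookup; no titles are scanned."""
    return index.get(prefix.lower(), set())


titles = ["Theorie des graphes", "Web 2.0"]
index = build_index(titles)
print(sorted(autocomplete(index, "theo")))  # [0]
print(sorted(autocomplete(index, "we")))    # [1]
```

The trade-off is index size: every word is stored once per prefix length, which is why a `max_gram` cap is usually set.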

It works great with French words containing é or è (i.e. I can search for
"theorie" to find documents indexed with "théorie").

But it does not work with French words containing a circumflex (^).

For example, I index the title "Pôle web 2.0", but I cannot find it with the term "pole".

It seems that the n-gram tokenizer does not recognize ^ as an accent.
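A quick way to see this: an n-gram step only cuts a word into fixed-length substrings and compares characters exactly. It has no notion of accents, so the grams of "pôle" and "pole" overlap only where the ô is not involved. `char_ngrams` is a hypothetical helper, not an Elasticsearch function, and the input is assumed to be precomposed (NFC) text.

```python
def char_ngrams(text: str, n: int) -> set[str]:
    """Hypothetical helper: every substring of length n, compared character by character."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


# "\u00f4" is the precomposed "ô" (NFC), as in the title "Pôle web 2.0".
indexed_word = "p\u00f4le"
query_word = "pole"

# Whole-word 4-grams share nothing: "ô" (U+00F4) and "o" (U+006F) are different characters.
print(char_ngrams(indexed_word, 4) & char_ngrams(query_word, 4))  # set()

# Bigrams only overlap where the accented letter is not involved.
print(sorted(char_ngrams(indexed_word, 2) & char_ngrams(query_word, 2)))  # ['le']
```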

Any ideas?

-Alex-

Does anybody have an idea?

On 30 Aug, 12:17, Alexandre Heimburger alexheimbur...@gmail.com
wrote:

Hi,
I don't think the n-gram tokenizer strips any accents. You need to use the
ASCII Folding Token Filter for this, in both index-time and query-time
analysis. I've altered your autocomplete (AC) analysis by adding the
"asciifolding" filter to both the index-time and query-time analyzers (see the
gist "stripping accents in auto-complete analysis"). I've tested it with
"query": "pole" and it matches.
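The test above can be mimicked in plain Python. This sketch approximates ASCII folding with Unicode NFD decomposition (an assumption: Lucene's filter uses its own mapping table), assumes edge n-grams at index time and none at query time, and uses hypothetical function names. It also shows why folding belongs on both sides: once the index is folded it no longer contains "pôle", so an unfolded query for "pôle" misses.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Approximate ASCII folding: NFD-decompose, then drop combining marks.

    This is NOT Lucene's ASCIIFoldingFilter, which uses its own mapping table
    (and also rewrites characters like "œ" to "oe"); it only removes combining
    marks after canonical decomposition.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_terms(title: str, fold: bool) -> set[str]:
    """Hypothetical index-time chain: lowercase, optional folding, edge n-grams."""
    terms: set[str] = set()
    for word in title.lower().split():
        if fold:
            word = fold_accents(word)
        terms.update(word[:n] for n in range(1, len(word) + 1))
    return terms


def query_term(query: str, fold: bool) -> str:
    """Hypothetical query-time chain: lowercase, optional folding, no n-grams."""
    term = query.lower()
    return fold_accents(term) if fold else term


def matches(title: str, query: str, fold_index: bool, fold_query: bool) -> bool:
    return query_term(query, fold_query) in index_terms(title, fold_index)


TITLE = "P\u00f4le web 2.0"  # "Pôle web 2.0"

print(matches(TITLE, "pole", fold_index=False, fold_query=False))      # False
print(matches(TITLE, "pole", fold_index=True, fold_query=True))        # True
print(matches(TITLE, "p\u00f4le", fold_index=True, fold_query=False))  # False
print(matches(TITLE, "p\u00f4le", fold_index=True, fold_query=True))   # True
```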

Hope this helps,

Tomislav
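In index-settings terms, the change described above (the same folding in the index-time and query-time analyzers) might look roughly like this. It is a hypothetical reconstruction, not the contents of either gist: the analyzer and filter names are made up, the stemmer is omitted, and the names assume a recent Elasticsearch release (releases from 2011 spelled some options differently, e.g. there was no `text` field type yet), so check the reference for your version.

```python
import json

# Hypothetical analyzer/filter names; "standard", "lowercase", "asciifolding"
# and "edge_ngram" are built-in Elasticsearch names.
SETTINGS = {
    "analysis": {
        "filter": {
            "title_edge_ngram": {"type": "edge_ngram", "min_gram": 1, "max_gram": 20},
        },
        "analyzer": {
            # Index time: fold accents, then emit prefixes ("pôle" -> p, po, pol, pole).
            "title_autocomplete_index": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "asciifolding", "title_edge_ngram"],
            },
            # Query time: same folding but no n-grams, so "Pôle" becomes the single term "pole".
            "title_autocomplete_search": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "asciifolding"],
            },
        },
    }
}

MAPPINGS = {
    "properties": {
        "title": {
            "type": "text",
            "analyzer": "title_autocomplete_index",
            "search_analyzer": "title_autocomplete_search",
        }
    }
}

# Request body for index creation, printed as JSON.
print(json.dumps({"settings": SETTINGS, "mappings": MAPPINGS}, indent=2))
```

Putting `asciifolding` before the n-gram filter means the prefixes are generated from the already-folded token.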

2011/8/31 alheim alexheimburger@gmail.com:

Does anybody have an idea?

Thanks a lot. I'll test it tomorrow morning and let you know.

On Wed, Aug 31, 2011 at 4:42 PM, Tomislav Poljak tpoljak@gmail.com wrote:

--
Alexandre Heimburger
R&D Manager
blueKiwi Software
tel : +33687880997
email : ahb@bluekiwi-software.com
address : 93 rue Vieille du Temple, 75003 Paris