Confused about when and how asciifolding happens

Chase · May 15, 2011, 4:46am

Hello again,

In short, I'm trying to set up ElasticSearch so that it always sees
accented characters as a fantastic opportunity for asciifolding.

I've added asciifolding to the default analyzer.
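For reference, here is a minimal sketch of what "asciifolding in the default analyzer" could look like as index settings, built as a Python dict. The exact settings in the gist may differ; the tokenizer and the filter order shown here are assumptions, not a copy of the real configuration.

```python
import json

# Assumed index settings: override the analyzer named "default" so that
# any field without its own analyzer gets lowercased, ASCII-folded tokens.
# The tokenizer and filter order are illustrative guesses.
settings = {
    "index": {
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        }
    }
}

# JSON body that would be sent when creating the index.
body = json.dumps(settings, ensure_ascii=False, indent=2)
print(body)
```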

If somebody searches for "Diaz", it should match:
1) "Diaz", and
2) "Díaz"

If somebody searches for "Díaz", it should match:
3) "Diaz", and
4) "Díaz"

In reality, I get #1 and #2, but not #3 or, quite surprisingly, #4.
Why not?
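To make the expectation concrete, here is a rough Python stand-in for "lowercase + asciifolding". It is not the Lucene filter, only an approximation using Unicode decomposition, and the `fold` helper is a made-up name. Under this model all four cases match.

```python
import unicodedata


def fold(text: str) -> str:
    """Approximate lowercase + ASCII folding (NOT the real Lucene filter)."""
    # Split accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


indexed = ["Diaz", "Díaz"]
for query in ("Diaz", "Díaz"):
    # A document "matches" when it folds to the same token as the query.
    hits = [doc for doc in indexed if fold(doc) == fold(query)]
    print(query, "->", hits)
```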

See https://gist.github.com/972878

In fact, even with default settings, "Díaz" does not seem to match
"Díaz". Maybe this is a problem with the Mac OS X Terminal I'm using?
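One way to test the terminal theory is to look at the raw bytes that could end up in the request for "Díaz". The sketch below lists three plausible candidates; whether my terminal actually produces any form other than the first is an assumption I have not verified.

```python
import unicodedata

name = "D\u00edaz"  # "Díaz" with a precomposed i-acute (NFC form)

# Byte sequences a client might send for the same visible text.
encodings = {
    "utf-8, precomposed (NFC)": name.encode("utf-8"),
    "utf-8, decomposed (NFD)": unicodedata.normalize("NFD", name).encode("utf-8"),
    "latin-1": name.encode("latin-1"),
}

for label, raw in encodings.items():
    print(f"{label:26} {raw!r}")
```

If the bytes sent at index time and at search time differ, or are not valid UTF-8, the indexed and queried terms could differ even though they look identical on screen.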

A related question I have is that if I create a custom analyzer with
asciifolding and specify it in a mapping, but I don't have
asciifolding in the default analyzer, then I only get the benefit of
asciifolding when doing searches on specific fields. In the above
example, if asciifolding was enabled for the name field in its
mapping, then the query "name:diaz" would match "Cameron Díaz", but a
query of "diaz" would not. I sort of understand this as a design
choice, but at the same time, it would be nice (in a principle-of-
least-surprise way) if the filters on a field were always active, no
matter how you're searching it. Intuitively, I assumed that when
asciifolding was turned on, it would tokenize "Díaz" as "diaz" -- but
evidently, not so? Should I be doing something differently?
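The per-field setup described above might look roughly like this. The analyzer name `folding` and the type name `person` are hypothetical, and `"type": "string"` reflects older mapping syntax (newer versions use a different type name for analyzed text). The last query pins `default_field` to `name`, which I assume would make the unqualified search use that field's analyzer, though I have not confirmed it.

```python
import json

# Hypothetical names: analyzer "folding", document type "person".
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "folding": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        }
    },
    "mappings": {
        "person": {
            "properties": {
                # Only this field uses the folding analyzer.
                "name": {"type": "string", "analyzer": "folding"}
            }
        }
    },
}

# Field-qualified query: the case that matches "Cameron Díaz".
field_query = {"query": {"query_string": {"query": "name:diaz"}}}

# Unqualified query text, but with the default field set explicitly
# (assumption: this routes the search through the "name" analyzer).
pinned_query = {
    "query": {"query_string": {"query": "diaz", "default_field": "name"}}
}

for body in (index_body, field_query, pinned_query):
    print(json.dumps(body, ensure_ascii=False))
```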

Thanks in advance,
-Chase
