Problem using custom analyzer in mapping field


(Ivo Sanchez Checa Crosato) #1

Hi,

I'm trying to make a prefix query on a field that can hold non-ASCII
Spanish characters (e.g. áéíóúñ).

I've made a custom analyzer and have added it to a field in my mapping by
doing:

create index

curl -XPUT 'elasticsearch:9200/my_new_index' -d '{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_custom_analyzer": {
            "type": "custom",
            "tokenizer": "keyword",
            "filter": ["asciifolding", "lowercase"]
          }
        }
      }
    }
  }
}'

create mapping

curl -XPUT 'elasticsearch:9200/my_new_index/my_mapping/_mapping' -d '{
  "properties": {
    "guid": {"type": "string", "index": "no"},
    "name": {
      "type": "string",
      "index": "analyzed",
      "analyzer": "my_custom_analyzer"
    }
  }
}'

I can confirm the analyzer is working fine, since I've tried it directly
via the Analyze API by running:

curl -XGET 'elasticsearch:9200/my_new_index/_analyze?analyzer=my_custom_analyzer' -d 'áéíóú'

{
  "tokens" : [ {
    "token" : "aeiou",
    "start_offset" : 0,
    "end_offset" : 5,
    "type" : "word",
    "position" : 1
  } ]
}

But for some reason, queries on the mapping do not seem to use the analyzer.
Take the following query: if I understand correctly, it should match
documents starting with either "aei" or "áéí", but it only matches the
latter.

curl -XPOST 'elasticsearch:9200/my_new_index/my_mapping/_search?pretty=true' -d '{
  "query": {
    "prefix": {
      "name" : {
        "prefix": "áéí"
      }
    }
  }
}'

What am I missing? A full, runnable example showing the problem can be
found here: https://gist.github.com/ivoscc/6518829

Thanks a lot for your help!

Ivo

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Adrien Grand) #2

Hi,

Prefix queries[1] are not analyzed, so the prefix is compared as-is against
the terms in the index. For your example to work, you need to pass the prefix
in the form it is stored in the index. For example, you can run:

curl -XPOST 'elasticsearch:9200/my_new_index/my_mapping/_search?pretty=true' -d '{
  "query": {
    "prefix": {
      "name" : {
        "prefix": "aei"
      }
    }
  }
}'

[1] http://www.elasticsearch.org/guide/reference/query-dsl/prefix-query/
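
Since the prefix is not analyzed, one option is to fold it on the client before sending the query. Below is a minimal sketch, assuming only the Spanish characters mentioned in this thread need folding. `fold_prefix` and `build_prefix_query` are hypothetical helper names, and the sed rules are a hand-written stand-in for the asciifolding and lowercase filters, not a full equivalent of them.

```shell
#!/bin/sh
# Hypothetical helper: approximates the "asciifolding" + "lowercase" filters
# for Spanish accented characters only. This is an assumption for
# illustration, not the full folding table Elasticsearch uses.
fold_prefix() {
  printf '%s' "$1" \
    | sed -e 's/á/a/g' -e 's/é/e/g' -e 's/í/i/g' -e 's/ó/o/g' -e 's/ú/u/g' \
          -e 's/Á/a/g' -e 's/É/e/g' -e 's/Í/i/g' -e 's/Ó/o/g' -e 's/Ú/u/g' \
          -e 's/ü/u/g' -e 's/Ü/u/g' -e 's/ñ/n/g' -e 's/Ñ/n/g' \
    | tr '[:upper:]' '[:lower:]'
}

# Hypothetical helper: prints a prefix query body for the "name" field.
# Assumes the input contains no double quotes or backslashes (no JSON escaping).
build_prefix_query() {
  printf '{"query": {"prefix": {"name": {"prefix": "%s"}}}}' "$(fold_prefix "$1")"
}

# Usage (requires a running cluster; not executed here):
#   curl -XPOST 'elasticsearch:9200/my_new_index/my_mapping/_search?pretty=true' \
#     -d "$(build_prefix_query 'áéí')"
```

With this, a user typing either "áéí" or "aei" ends up sending the same folded prefix, which is the form the custom analyzer stores in the index.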

On Wed, Sep 11, 2013 at 5:23 AM, Ivo Sanchez Checa Crosato <ivoscc@gmail.com> wrote:

--
Adrien Grand


(Britta Weber) #3

Hi,

I think you were missing a line when you put the mapping: the body should be
wrapped in the mapping type name. Try it like this:

create mapping

curl -XPUT 'elasticsearch:9200/my_new_index/my_mapping/_mapping' -d '{
  "my_mapping": {
    "properties": {
      "guid": {"type": "string", "index": "no"},
      "name": {
        "type": "string",
        "index": "analyzed",
        "analyzer": "my_custom_analyzer"
      }
    }
  }
}'

Your example works for me with this modification.
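
To check whether the mapping was actually applied, it may help to fetch it back and look for the analyzer on the field. This is a rough sketch. `mapping_has_analyzer` is a hypothetical helper that does a plain text match rather than real JSON parsing, and it assumes the returned mapping contains an `"analyzer": "..."` entry like the one in the request body.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the mapping JSON passed as $1 mentions
# the analyzer named $2. Plain text match only, not a JSON parser.
mapping_has_analyzer() {
  printf '%s' "$1" | grep -q "\"analyzer\"[[:space:]]*:[[:space:]]*\"$2\""
}

# Usage (requires a running cluster; not executed here):
#   mapping=$(curl -s -XGET 'elasticsearch:9200/my_new_index/my_mapping/_mapping')
#   if mapping_has_analyzer "$mapping" my_custom_analyzer; then
#     echo "custom analyzer is present in the mapping"
#   else
#     echo "custom analyzer is missing from the mapping"
#   fi
```

If the analyzer is missing from the returned mapping, the field was probably mapped dynamically with the default analyzer, which would explain results that differ from what the Analyze API shows for `my_custom_analyzer`.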

Cheers,
Britta

