Question about asciifolding filter


(Andrei) #1

I've added a custom analyzer that uses the asciifolding filter as follows:

index:
  analysis:
    analyzer:
      eulang:
        type: custom
        tokenizer: standard
        filter: [standard, lowercase, asciifolding, stop]
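
A rough sketch of what this filter chain does to a piece of text, written in plain Python rather than Lucene. The names eulang_analyze, ascii_fold and STOP_WORDS are made up for illustration, the tokenizer is approximated with a regular expression, and the stop list is a shortened assumption, not the real default list.

```python
import re
import unicodedata

# Hypothetical, shortened stop list; the real "stop" filter ships a longer one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in"}


def ascii_fold(token: str) -> str:
    """Approximate asciifolding: decompose characters, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def eulang_analyze(text: str) -> list[str]:
    """Mimic the chain: tokenize, lowercase, fold to ASCII, remove stop words."""
    tokens = re.findall(r"\w+", text)                  # stand-in for the standard tokenizer
    tokens = [ascii_fold(t.lower()) for t in tokens]   # lowercase, then asciifolding
    return [t for t in tokens if t not in STOP_WORDS]  # stop filter runs last


print(eulang_analyze("Café and event space. À côté."))
# ['cafe', 'event', 'space', 'cote']
```

Because stop comes after asciifolding in the filter list, "À" is folded to "a" first and then dropped as a stop word.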

I then created a new index with the following mapping (so that title,
notes, and tags fields use the eulang analyzer):

{
  "place" : {
    "_all" : {"enabled" : false},
    "properties" : {
      "user_id" : {"type" : "integer", "index" : "not_analyzed"},
      "title" : {"type" : "string", "boost" : 1.5, "analyzer" : "eulang"},
      "notes" : {"type" : "string", "analyzer" : "eulang"},
      "tags" : {"type" : "string", "index_name" : "tag", "boost" : 1.5, "analyzer" : "eulang"},
      "created_on" : {"type" : "date", "format" : "YYYY-MM-DD HH:mm:ss"}
    }
  }
}

After inserting a few documents and querying I see that the ASCII
folding works at index time, but not at query time for some reason:

$ curl 'http://localhost:9200/places_2010091901/_search?q=notes:café'
{"_shards":{"total":1,"successful":1,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}

$ curl 'http://localhost:9200/places_2010091901/_search?q=notes:cafe'
{"_shards":{"total":1,"successful":1,"failed":0},"hits":{"total":1,"max_score":0.5,"hits":[{"_index":"places_2010091901","_type":"place","_id":"1","_score":0.5, "_source" : {"notes": "Café and event space. À côté.", "title": "Watershed", "user_id": 11, "created_on": "2010-09-19 11:01:19", "tags": ["coffee", "wifi", "view"]}}]}}

I was under the impression that the asciifolding filter would be
applied to the query string as well.


(Andrei) #2

Seems that it works fine for the query_string query, not the term one
I was doing.

-Andrei


(Shay Banon) #3

Hey,

When you say term query, do you mean a term query in the query DSL, or the
first example with _search?q=...? The latter gets translated into a
query_string query. With the term query itself, no analysis happens on the
provided term value. With the query_string or field queries, analysis does
happen, based on the analyzer associated with the queried field.
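
To make the difference concrete, here is a minimal sketch in plain Python. It does not call Elasticsearch; analyze, term_query and analyzed_query are hypothetical stand-ins, and the analyzer is reduced to lowercasing plus accent stripping.

```python
import unicodedata


def analyze(text):
    """Stand-in for the field's analyzer: lowercase, strip accents, split on spaces."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return folded.split()


# Index time: the analyzer always runs, so only folded tokens are stored.
indexed_terms = set(analyze("Café and event space"))


def term_query(value):
    """Term query: the value is looked up exactly as given, with no analysis."""
    return value in indexed_terms


def analyzed_query(value):
    """query_string-style query (simplified): analyze the value, then look up the tokens."""
    return any(token in indexed_terms for token in analyze(value))


print(term_query("café"))      # False: the index only holds "cafe"
print(term_query("cafe"))      # True
print(analyzed_query("Café"))  # True: folded to "cafe" before lookup
```

So with a term query the caller has to supply the already folded form, while an analyzed query can take the text as the user typed it.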

-shay.banon

