Custom analyzer registered but not used at index time

Hi everyone,

I'm facing a curious problem.

I configured a custom analyzer in my index settings as follows:

{
  "index": {
    "cluster.name": "test-cluster",
    "client.transport.sniff": true,
    "analysis": {
      "filter": {
        "french_elision": {
          "type": "elision",
          "articles": [
            ...skipped...
          ]
        },
        "french_stop": {
          "type": "stop",
          "stopwords": "french",
          "ignore_case": true
        },
        "snowball": {
          "type": "snowball",
          "language": "french"
        }
      },
      "analyzer": {
        "my_french": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["french_elision", "lowercase", "french_stop", "snowball"]
        },
        "lower_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": "lowercase"
        },
        "token_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace"
        }
      }
    }
  }
}
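
As a sanity check on these settings, here is a minimal Python sketch (not something Elasticsearch provides) that walks the analysis section and reports any filter an analyzer references but nobody declares. BUILTIN_FILTERS is a hypothetical, deliberately tiny set holding only the built-in filter used here, and the articles list is left empty because the real one is skipped above.

```python
# Hypothetical, minimal set of built-in filters: only the one these settings use.
BUILTIN_FILTERS = {"lowercase"}

# The "analysis" part of my settings; the real articles list is skipped, so it is empty here.
ANALYSIS = {
    "filter": {
        "french_elision": {"type": "elision", "articles": []},
        "french_stop": {"type": "stop", "stopwords": "french", "ignore_case": True},
        "snowball": {"type": "snowball", "language": "french"},
    },
    "analyzer": {
        "my_french": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["french_elision", "lowercase", "french_stop", "snowball"],
        },
        "lower_analyzer": {"type": "custom", "tokenizer": "keyword", "filter": "lowercase"},
        "token_analyzer": {"type": "custom", "tokenizer": "whitespace"},
    },
}


def filter_chain(analyzer_spec):
    """Return the analyzer's filters as a list, in declared order."""
    filters = analyzer_spec.get("filter", [])
    # lower_analyzer gives a single string instead of a list
    return [filters] if isinstance(filters, str) else list(filters)


def undeclared_filters(analysis):
    """Map analyzer name -> filters that are neither declared nor assumed built in."""
    known = set(analysis.get("filter", {})) | BUILTIN_FILTERS
    missing = {}
    for name, spec in analysis.get("analyzer", {}).items():
        unknown = [f for f in filter_chain(spec) if f not in known]
        if unknown:
            missing[name] = unknown
    return missing


print(filter_chain(ANALYSIS["analyzer"]["my_french"]))
print(undeclared_filters(ANALYSIS))
```

An empty result only means every filter named by my_french and the other analyzers can be found; it says nothing about which analyzer a field actually uses at index time.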

My mapping declares the custom analyzer as the default analyzer for the type
'record', and also sets it explicitly on the 'a' field of my records:

{
  "record": {
    "_all": {
      "enabled": false
    },
    "analyzer": "my_french",
    "properties": {
      "_uuid": {
        "type": "string",
        "store": "yes",
        "index": "not_analyzed"
      },
      "a": {
        "type": "multi_field",
        "fields": {
          "a": {
            "type": "string",
            "store": "yes",
            "index": "analyzed",
            "analyzer": "my_french"
          },
          "raw": {
            "type": "string",
            "store": "no",
            "index": "not_analyzed"
          },
          "tokens": {
            "type": "string",
            "store": "no",
            "index": "analyzed",
            "analyzer": "token_analyzer"
          },
          "lower": {
            "type": "string",
            "store": "no",
            "index": "analyzed",
            "analyzer": "lower_analyzer"
          }
        }
      },
      "g_r": {
        "type": "string",
        "store": "yes",
        "index": "analyzed"
      }
    }
  }
}

So I expect both fields, a and g_r, to be analyzed with my_french:

  • a, because the analyzer is set explicitly in its field mapping;
  • g_r, because its field mapping defines no analyzer, but the type-level
    default analyzer is my_french.
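
The precedence I am relying on can be written down as a small Python function. This is a simplified model of my expectation, not Elasticsearch's actual resolution code; expected_index_analyzer and RECORD are hypothetical names, the mapping is trimmed to the keys that matter here, and "standard" is used as the index-wide fallback because that is the usual default analyzer.

```python
def expected_index_analyzer(field_mapping, type_mapping, index_default="standard"):
    """Model of my expectation: field analyzer > type-level analyzer > index default."""
    if field_mapping.get("index") == "not_analyzed":
        return None  # indexed as one untouched term, no analyzer involved
    if "analyzer" in field_mapping:
        return field_mapping["analyzer"]
    if "analyzer" in type_mapping:
        return type_mapping["analyzer"]
    return index_default


# My "record" mapping, trimmed (store settings and _all omitted).
RECORD = {
    "analyzer": "my_french",
    "properties": {
        "a": {"type": "multi_field", "fields": {
            "a": {"type": "string", "index": "analyzed", "analyzer": "my_french"},
            "raw": {"type": "string", "index": "not_analyzed"},
            "tokens": {"type": "string", "index": "analyzed", "analyzer": "token_analyzer"},
            "lower": {"type": "string", "index": "analyzed", "analyzer": "lower_analyzer"},
        }},
        "g_r": {"type": "string", "index": "analyzed"},
    },
}

props = RECORD["properties"]
sub = props["a"]["fields"]
for name, mapping in [("a", sub["a"]), ("a.raw", sub["raw"]),
                      ("a.tokens", sub["tokens"]), ("g_r", props["g_r"])]:
    print(name, expected_index_analyzer(mapping, RECORD))
```

Under this model both a and g_r resolve to my_french, which is exactly the behavior the facet results below seem to contradict.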

And indeed, if I test the analysis with an _analyze REST request, the result
looks fine:

$ curl -XGET 'localhost:9200/test-index/_analyze?analyzer=my_french' -d "j'aime les chevaux"
{
  "tokens": [
    {
      "token": "aim",
      "start_offset": 0,
      "end_offset": 6,
      "type": "",
      "position": 1
    },
    {
      "token": "cheval",
      "start_offset": 11,
      "end_offset": 18,
      "type": "",
      "position": 3
    }
  ]
}

This is exactly what I expect from my my_french analyzer.
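
The request above names the analyzer explicitly, so it does not exercise the mapping at all. A variant worth trying is to pass the field parameter instead, which (as far as I understand the _analyze API) derives the analyzer from that field's mapping, and to send the exact stored text, "J'aime les chevaux", with its capital J. A minimal Python sketch that only builds the URLs; analyze_url is a hypothetical helper, and the text still goes in the request body as in my curl call.

```python
from urllib.parse import urlencode


def analyze_url(index, host="localhost:9200", **params):
    """Build an _analyze URL from query parameters; the text goes in the request body."""
    return f"http://{host}/{index}/_analyze?{urlencode(params)}"


# Same request as above: a named analyzer.
print(analyze_url("test-index", analyzer="my_french"))
# Variant: let the mapping of field "a" choose the analyzer.
print(analyze_url("test-index", field="a"))
```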

But when I index my data and query it, I don't get the expected results.
So I ran a terms facet query to see which terms were actually indexed for
my fields, and the result is very surprising:

Query:

{
  "query": {
    "match": {
      "_id": "12"
    }
  },
  "facets": {
    "tokens": {
      "terms": {
        "field": "a"
      }
    }
  }
}
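
To look at every sub-field of a in the same way, the facet body can be generated per field. A minimal Python sketch; terms_facet_query is a hypothetical helper, and I am assuming the sub-fields of the multi_field are addressed as a.raw, a.tokens and a.lower.

```python
import json


def terms_facet_query(doc_id, field, facet_name="tokens"):
    """Body of the facet query above, parameterized on document id and field."""
    return {
        "query": {"match": {"_id": doc_id}},
        "facets": {facet_name: {"terms": {"field": field}}},
    }


# One body per sub-field of the multi_field "a", to compare what each one indexed.
for field in ("a", "a.raw", "a.tokens", "a.lower"):
    print(json.dumps(terms_facet_query("12", field)))
```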

This gives me the following result, which is not what I expected: I
expected the returned terms to be aim and cheval, as produced by the
analysis request above.

$ curl -X POST "http://localhost:9200/test-index/_search?pretty=true" -d '{"query": {"match": {"_id": "12"}}, "facets": {"tokens": {"terms": {"field": "a"}}}}'

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "test-index",
      "_type" : "record",
      "_id" : "12",
      "_score" : 1.0,
      "_source" : {
        "_uuid": "12",
        "a_t": false, "a_n": false, "a": "J'aime les chevaux",
        "b_r": null, "b_t": false, "b_n": false, "b": 1407664800000,
        "c_r": null, "c_t": false, "c_n": false, "c": 2,
        "d_r": "m3", "d_t": true, "d_n": false, "d": null,
        "e_r": null, "e_t": false, "e_n": true, "e": 12,
        "f_r": null, "f_t": false, "f_n": false, "f": true,
        "g_r": "J'aime les chevaux", "g_t": false, "g_n": false, "g": 12.0
      }
    } ]
  },
  "facets" : {
    "tokens" : {
      "_type" : "terms",
      "missing" : 0,
      "total" : 2,
      "other" : 0,
      "terms" : [ {
        "term" : "j'aim",
        "count" : 1
      }, {
        "term" : "cheval",
        "count" : 1
      } ]
    }
  }
}
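
Comparing the two runs, the only input difference I can see is the case of the first letter: the _analyze call used "j'aime", while the stored document contains "J'aime". my_french applies french_elision before lowercase, so if the elision filter matches articles case-sensitively (an assumption I have not confirmed for my version), "J'" would not be stripped. A minimal Python sketch of the chain under that assumption; the regex is only a rough stand-in for the standard tokenizer, ARTICLES and STOPWORDS are hypothetical stand-ins (the real articles list is skipped above), and Snowball stemming is left out.

```python
import re

# Hypothetical stand-ins: the real articles list is skipped in the post,
# and only the stopword visible in the _analyze output is included.
ARTICLES = {"j", "l"}
STOPWORDS = {"les"}


def tokenize(text):
    # Rough stand-in for the standard tokenizer: words, keeping inner apostrophes.
    return re.findall(r"\w+(?:'\w+)*", text)


def elide(token, ignore_case):
    # Strip "<article>'" from the front of the token when the article is in ARTICLES.
    head, sep, tail = token.partition("'")
    key = head.lower() if ignore_case else head
    return tail if sep and key in ARTICLES else token


def my_french_unstemmed(text, elision_ignore_case=False):
    tokens = [elide(t, elision_ignore_case) for t in tokenize(text)]  # french_elision
    tokens = [t.lower() for t in tokens]                               # lowercase
    # french_stop; snowball stemming is omitted from this sketch
    return [t for t in tokens if t not in STOPWORDS]


print(my_french_unstemmed("j'aime les chevaux"))                            # ['aime', 'chevaux']
print(my_french_unstemmed("J'aime les chevaux"))                            # ["j'aime", 'chevaux']
print(my_french_unstemmed("J'aime les chevaux", elision_ignore_case=True))  # ['aime', 'chevaux']
```

Under that assumption, the unstemmed tokens have the same shape as the two observed results: aime versus j'aime, matching aim in the _analyze output and j'aim in the facet.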

Can anyone see what is wrong, where I made a mistake, or what I am missing?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9e11d3ef-b291-44d8-a08a-3d7f5740badb%40googlegroups.com.