Hello,
I was unable to reproduce the exact issue I am seeing in the system I
am investigating,
but a similar problem seems to exist in my sample as well.
The index with the name "type-test" is created like this:
{
  "settings": {
    "number_of_shards": 1,
    "index": {
      "analysis": {
        "analyzer": {
          "snowball_eng": {
            "type": "snowball",
            "language": "English"
          },
          "snowball_fra": {
            "type": "snowball",
            "language": "French"
          }
        }
      }
    }
  },
  "mappings": {
    "type_eng": {
      "_source": {
        "enabled": false
      },
      "properties": {
        "field1": {
          "type": "string",
          "analyzer": "snowball_eng"
        }
      }
    },
    "type_fra": {
      "_source": {
        "enabled": false
      },
      "properties": {
        "field1": {
          "type": "string",
          "analyzer": "snowball_fra"
        }
      }
    }
  }
}
Please note that the field "field1" is used in both types, type_eng and
type_fra, and that each type configures a different snowball analyzer
for it.
I put some content there:
.../type-test/type_eng/1
{
  "field1": "Fina"
}
.../type-test/type_fra/2
{
  "field1": "Fina"
}
The word Fina is stemmed like this [1]:
English: Fina -> Fina
French: Fina -> Fin
Searching:
.../type-test/_search
{
  "query": {
    "field": {
      "field1": "Fina"
    }
  }
}
That returns only the document with id 2.
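A toy Python model of one possible explanation for this result, not of what Elasticsearch actually does internally: both types write their terms for "field1" into one shared index, and the query text is analyzed only once. The two analyzer functions are hypothetical stand-ins that merely reproduce the stems listed above (lowercased); they are not the Snowball algorithms. Which analyzer ES picks for the query is an assumption here.

```python
# Toy stand-ins for the two analyzers. They only reproduce the stems
# reported in this post and are NOT the real Snowball algorithms.
def analyze_eng(text):
    return [tok.lower() for tok in text.split()]


def analyze_fra(text):
    # Hypothetical rule: strip a trailing "a" to mimic "Fina" -> "Fin".
    tokens = [tok.lower() for tok in text.split()]
    return [tok[:-1] if tok.endswith("a") else tok for tok in tokens]


ANALYZERS = {"type_eng": analyze_eng, "type_fra": analyze_fra}

# One inverted index for "field1", shared by both types.
index = {}


def put(doc_type, doc_id, text):
    # Index time: the analyzer is chosen by the document's type.
    for term in ANALYZERS[doc_type](text):
        index.setdefault(term, set()).add(doc_id)


def search(text, query_type):
    # Search time: the query is analyzed once, with a single analyzer
    # (assumption: the caller decides which one).
    hits = set()
    for term in ANALYZERS[query_type](text):
        hits |= index.get(term, set())
    return sorted(hits)


put("type_eng", "1", "Fina")
put("type_fra", "2", "Fina")

print(search("Fina", "type_fra"))  # ['2']
print(search("Fina", "type_eng"))  # ['1']
```

In this model, a query analyzed with the French stand-in finds only document 2, which matches the result above; a query analyzed with the English stand-in would find only document 1.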
To me it looks like using types to assign different analyzers to the
same field for different languages is not supported. Is that right, or
did I do something wrong?
Thanks for your help!
Jörn
[1] Python NLTK Stemming and Lemmatization Demo
On 07/09/2012 07:45 PM, Jörg Prante wrote:
I probably misunderstood; it's hard to follow without a real example.
Can you provide a gist with a little demo where the varying results
can be reproduced?
Best regards,
Jörg
On Monday, July 9, 2012 6:33:26 PM UTC+2, Jörn wrote:
On 07/09/2012 06:17 PM, Jörg Prante wrote:
> in Lucene/Elasticsearch, if you search a field, the words are analyzed
> once at search time without notice of the analyzer used at index time,
> so it's obvious you get varying number of hits searching over many
> types where different analyzers have been used.
Thanks for your answer.
Can you explain that to me a bit further? If the analyzers do not match,
I would still expect to get the same response from ES if I re-send the
same query. Though the result might not be the desired one, it should be
reproducible.
Doesn't the mapping of a field to an analyzer take care of choosing the
right analyzer at both index and search time?
Jörn