Hi,
Thanks for the response.
Currently we're indexing a set of documents in different languages and
using the _analyzer mapping to select the stemming analyzer per document.
What we'd like to do is index some fields of the documents both stemmed and
unstemmed (e.g. the 'english' analyzer to produce stemmed English and the
'standard' analyzer to produce unstemmed terms). So using a multi_field seems
applicable, but then the two analyzers are fixed in the mapping. We'd
effectively need to specify two _analyzer fields.
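To make the limitation concrete, here is a rough sketch of the multi_field
mapping we have in mind, built as a Python dict. The field name "body" and
the sub-field name "unstemmed" are made up for illustration, and the helper
function is hypothetical. The point is that the stemming analyzer has to be
named in the mapping, so it cannot vary per document:

```python
import json


def stemmed_and_unstemmed(field_name, stemming_analyzer):
    """Build a multi_field mapping with a stemmed and an unstemmed sub-field.

    Hypothetical helper: the analyzer is baked into the mapping, which is
    exactly the restriction described above.
    """
    return {
        field_name: {
            "type": "multi_field",
            "fields": {
                # Default sub-field: language-specific, stemmed.
                field_name: {"type": "string", "analyzer": stemming_analyzer},
                # Extra sub-field: no stemming, for names and proper nouns.
                "unstemmed": {"type": "string", "analyzer": "standard"},
            },
        }
    }


mapping = stemmed_and_unstemmed("body", "english")
print(json.dumps(mapping, indent=2))
```

With this mapping every document's "body" would be stemmed as English,
whatever language it is actually in.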
Essentially the customer wants to be able to do both stemmed (language
specific) searches and unstemmed (general) searches. This comes down to a
requirement to be able to match names, proper nouns, etc in cases where
stemming may interfere but there's no definitive list of these terms that
should not be stemmed.
We considered an index per language, but we're dealing with quite a high
number of languages, so that would likely mean too many indexes.
Using a field per language also presents issues - to do the general
unstemmed searches would require querying across many fields.
Alternatively, we were wondering whether it would be easy to develop a
tokenizer that wrapped the existing stemming tokenizers but also produced
the original term in addition to the stemmed term.
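A rough sketch of the idea, in Python rather than against the real Lucene
API: tokens are modelled as (term, position increment) pairs, and the
stemmed form is emitted with an increment of 0 so it stacks on the same
position as the original, the way synonyms usually do. toy_stem and
keep_original are hypothetical names, and toy_stem is only a stand-in for
a real per-language stemmer:

```python
from typing import Callable, Iterable, Iterator, Tuple

Token = Tuple[str, int]  # (term, position increment)


def toy_stem(term: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips a few suffixes."""
    for suffix in ("ing", "es", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def keep_original(terms: Iterable[str],
                  stem: Callable[[str], str]) -> Iterator[Token]:
    """Emit each original term, then its stem at the same position."""
    for term in terms:
        yield (term, 1)          # original term advances the position
        stemmed = stem(term)
        if stemmed != term:
            yield (stemmed, 0)   # stem shares the original's position


tokens = list(keep_original(["barnes", "running", "cat"], toy_stem))
print(tokens)
```

A search for the exact name would then match the original term, while a
stemmed search would match the stem, both in the same field.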
Sorry if that makes less than perfect sense!
thanks,
Barnaby
On Tuesday, 6 March 2012 20:29:57 UTC, kimchy wrote:
No, you can't specify it per field, though why do you want it? Usually,
having a different analyzer for each document doesn't make a lot of sense.
Usually, it makes more sense to have different fields.
On Tuesday, March 6, 2012 at 6:01 PM, barnybug wrote:
I understand you can specify the analyzer per document at index time
using the _analyzer field in mapping, but is it possible to specify it
in the same way but per field at index time?
Or if not currently possible, how easy to add (happy to have a crack
at it myself)?
thanks
Barnaby