I am trying to apply stopword, synonym and stemming filters at index time, but I would like to avoid a custom query-time analyzer altogether (a plain standard analyzer would be enough at search time). Synonyms are defined in expanded form, such as eat,gulp,swallow
Here is my ideal setup: a default analyzer consisting of stopword, synonym and stemming filters, and a default_search analyzer of type standard.
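A minimal sketch of that setup as an index-settings body, built as a Python dict and printed as JSON. Only the analyzer names `default` and `default_search` carry special meaning to Elasticsearch; the filter names (`my_stop`, `my_synonyms`, `my_stemmer`) and the choice of the English stop list and English stemmer are assumptions for illustration.

```python
import json

# Index settings body. Filter names below are hypothetical; "default" and
# "default_search" are the reserved analyzer names Elasticsearch looks up.
settings = {
    "settings": {
        "analysis": {
            "filter": {
                "my_stop": {"type": "stop", "stopwords": "_english_"},
                "my_synonyms": {"type": "synonym", "synonyms": ["eat,gulp,swallow"]},
                "my_stemmer": {"type": "stemmer", "language": "english"},
            },
            "analyzer": {
                # Index time: used for text fields that do not set their own analyzer.
                "default": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_stop", "my_synonyms", "my_stemmer"],
                },
                # Query time: plain standard analysis, so queries are never stemmed.
                "default_search": {"type": "standard"},
            },
        }
    }
}

print(json.dumps(settings, indent=2))
```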
Stopwords and synonyms work fine; stemming, however, does not.
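The problem with this approach is that the original (unstemmed) form of each term is lost after the stemming filter, before indexing. Because the query is not stemmed, a search for an inflected form such as running is compared against an index that only contains run, so it does not match.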
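The mismatch can be illustrated with a toy in-memory model. This is not Elasticsearch: the lookup table `TOY_STEMS` is a hypothetical stand-in for the real stemming filter, and whitespace splitting stands in for the standard tokenizer.

```python
# Toy model only: a lookup table stands in for the index-time stemming filter.
TOY_STEMS = {"runs": "run", "running": "run"}


def index_analyzer(text):
    # Index time: lowercase, split on whitespace, then "stem" each token.
    return {TOY_STEMS.get(token, token) for token in text.lower().split()}


def search_analyzer(text):
    # Query time: standard-like analysis with no stemming.
    return set(text.lower().split())


indexed_terms = index_analyzer("He was running")


def matches(query):
    # A document matches if any query term exists among its indexed terms.
    return bool(search_analyzer(query) & indexed_terms)


print(matches("run"))      # True
print(matches("running"))  # False: only the stem 'run' was indexed
```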
Is there any way or configuration to achieve my ideal setup, i.e. skipping the stemming filter at query time?
One alternative I thought of is to specify all stemming rules as synonyms in expanded form, but I am not sure it is the best way to achieve this.
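A sketch of that alternative: turning groups of word forms into expanded (equivalent) synonym rules, the same comma-separated format as eat,gulp,swallow. The `WORD_FORMS` table is hypothetical; in practice the groups would have to come from some morphological word list, which is an assumption here.

```python
# Hypothetical word-form groups; a real list would need a morphological source.
WORD_FORMS = {
    "run": ["run", "runs", "running", "ran"],
    "eat": ["eat", "eats", "eating", "ate", "eaten"],
}


def to_expanded_synonyms(groups):
    # One equivalent-synonym rule per group: every form is listed, so with
    # expansion each form is indexed together with all the others.
    return [",".join(forms) for forms in groups.values()]


rules = to_expanded_synonyms(WORD_FORMS)
print(rules)  # ['run,runs,running,ran', 'eat,eats,eating,ate,eaten']
```

Because every form ends up in the index, an unstemmed query for any one of them matches; the cost is a larger synonym list and larger index.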
You can use multi-fields to index the same field in different ways: once with stemming applied and once with the standard analyzer. I think this could be more helpful in your case.
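A minimal multi-field mapping sketch, assuming Elasticsearch 7 or later (typeless mappings and the `text` type; older releases nest this under a mapping type and use `string`). The field name `body` and the analyzer name `my_stemming_analyzer` are hypothetical, and that analyzer would have to be defined in the index settings.

```python
import json

# "fields" is the multi-fields mapping parameter: the same source value is
# indexed once per sub-field, each with its own analyzer.
mapping = {
    "mappings": {
        "properties": {
            "body": {
                "type": "text",
                "analyzer": "standard",  # unstemmed copy, queried as "body"
                "fields": {
                    "stemmed": {  # stemmed copy, queried as "body.stemmed"
                        "type": "text",
                        "analyzer": "my_stemming_analyzer",  # hypothetical name
                    }
                },
            }
        }
    }
}

print(json.dumps(mapping, indent=2))
```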
Correct me if I am wrong; here is my understanding of using multi-fields:
Indexing a field with multiple analyzers is done via the mapping. The problem with the mapping is that, by default, the index-time analyzer is also used at query time.
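The query-time default can be overridden per request: the match query accepts an `analyzer` parameter, which is what the question later assumes when it specifies the standard analyzer in the query itself. A minimal request-body sketch; the field name `body.stemmed` is hypothetical.

```python
import json

# Match query whose "analyzer" overrides the field's configured analyzer
# for this request only, so the query text is not stemmed.
query = {
    "query": {
        "match": {
            "body.stemmed": {  # hypothetical field name
                "query": "ran",
                "analyzer": "standard",
            }
        }
    }
}

print(json.dumps(query, indent=2))
```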
Furthermore, let's assume the following index-time-only stemmer_override:
ran=>run
runs=>run
running=>run
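Those rules could be expressed as a `stemmer_override` token filter inside a custom analyzer, sketched below as a settings body. The names `my_override` and `my_stemming_analyzer` are hypothetical. `stemmer_override` also marks the rewritten tokens as keywords, so a regular `stemmer` filter appended after it would leave them alone.

```python
import json

# Index-time-only analyzer applying the three override rules above.
settings = {
    "settings": {
        "analysis": {
            "filter": {
                "my_override": {  # hypothetical filter name
                    "type": "stemmer_override",
                    "rules": ["ran => run", "runs => run", "running => run"],
                }
            },
            "analyzer": {
                "my_stemming_analyzer": {  # hypothetical analyzer name
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_override"],
                }
            },
        }
    }
}

print(json.dumps(settings, indent=2))
```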
With the multi-field approach (assuming I specify the standard analyzer in the query itself), a search for run would return results for all the variations. However, if I search for one of the original terms, e.g. ran, wouldn't it miss the documents containing runs and running?
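The scenario can be checked against a toy in-memory model of the two sub-fields. This is not Elasticsearch; it assumes one-word documents, exact term lookup, and an unstemmed query searched across both `body` and `body.stemmed`, all hypothetical simplifications.

```python
# Toy model of the multi-field setup; not Elasticsearch itself.
OVERRIDES = {"ran": "run", "runs": "run", "running": "run"}  # stemmer_override rules

docs = {1: "ran", 2: "runs", 3: "running", 4: "run"}

# Each document is indexed twice: unstemmed ("body") and overridden ("body.stemmed").
body = {doc_id: {text} for doc_id, text in docs.items()}
body_stemmed = {doc_id: {OVERRIDES.get(text, text)} for doc_id, text in docs.items()}


def search(term):
    # The query term is analyzed with "standard", so it is never stemmed;
    # hits from both sub-fields are combined.
    hits = set()
    for field in (body, body_stemmed):
        hits |= {doc_id for doc_id, terms in field.items() if term in terms}
    return sorted(hits)


print(search("run"))  # [1, 2, 3, 4]
print(search("ran"))  # [1]: 'ran' survives only in the unstemmed sub-field
```

In this model the query for ran only finds the document that literally contains ran, because the stemmed sub-field holds nothing but run.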