My team is in the process of migrating an application from a custom
Lucene-based solution to Elasticsearch, and we have some questions about the
keyword analyzer. In our "legacy" system we index a field using
org.apache.lucene.analysis.KeywordAnalyzer for both searching and
indexing.
According to the documentation:
An analyzer of type keyword that “tokenizes” an entire stream as a single
token. This is useful for data like zip codes, ids and so on. Note, when
using mapping definitions, it makes more sense to simply mark the field as
not_analyzed.
What advantage does using not_analyzed have over analyzing it as a keyword?
Is it functionally equivalent?
They are not totally equivalent. With a keyword-based analyzer (a custom analyzer built on the keyword tokenizer) you can add token filters, such as lowercase, so the text is still indexed as a single term but is lightly analyzed. When you set the field to not_analyzed, the value is indexed exactly as it is, with no analysis at all.
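To make the difference concrete, here is a rough Python sketch. The two small functions only simulate which term would end up in the index in each case (Python's str.lower() stands in for the lowercase token filter, which is an approximation), and the dict shows what an index body with a custom analyzer might look like on the string-type mappings of that era. The index type name "item", the field names "sku" and "sku_raw", and the analyzer name "lowercase_keyword" are made up for illustration.

```python
import json


def not_analyzed_terms(value):
    """Simulate a not_analyzed field: the raw value is indexed as one term, unchanged."""
    return [value]


def keyword_lowercase_terms(value):
    """Simulate a keyword tokenizer (whole input = one token) plus a lowercase filter."""
    tokens = [value]  # keyword tokenizer: no splitting at all
    return [token.lower() for token in tokens]  # lowercase filter (approximation)


# Hypothetical index body: "sku" uses a custom analyzer built on the keyword
# tokenizer with a lowercase filter; "sku_raw" is left not_analyzed.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "lowercase_keyword": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "item": {
            "properties": {
                "sku": {"type": "string", "analyzer": "lowercase_keyword"},
                "sku_raw": {"type": "string", "index": "not_analyzed"},
            }
        }
    },
}

if __name__ == "__main__":
    sample = "AB-1234 Xy"
    print(not_analyzed_terms(sample))
    print(keyword_lowercase_terms(sample))
    print(json.dumps(index_body, indent=2))
```

In both cases the whole value stays a single term. The only difference is whether filters get a chance to normalize it, which matters if, for example, you want case-insensitive exact matching.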