I know that Elasticsearch has a lot of built-in analyzers. Basically, I'm looking to apply a specific analyzer based on the language identified for a field. I know that I can use the built-in "analyzer" field to specify which analyzer to use based on a field name.
My initial thought was to use my "language" field to determine which analyzer to use. So if the "language" field is "English", I would want to use the english analyzer.
Which brings me to my point. Instead of reinventing the wheel and creating a custom analyzer for each language, I would like to use the built-in tokenizers, stop words, etc. for each language. I cannot find a list of the built-in analyzers that Elasticsearch provides, so that I could just specify, for example, "analyzer: english". I would also like to know what each analyzer's stopword list is, etc.
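To make the idea concrete, here is a minimal sketch in Python of the lookup described above. The helper names (`analyzer_for`, `field_mapping`) are hypothetical, the list of analyzer names is only partial, and falling back to the "standard" analyzer for unknown languages is an assumption of this sketch, not something Elasticsearch does on its own.

```python
# Partial map from a "language" field value to a built-in language analyzer name.
LANGUAGE_ANALYZERS = {
    "english": "english",
    "french": "french",
    "german": "german",
    "spanish": "spanish",
    "italian": "italian",
    "dutch": "dutch",
    "russian": "russian",
}


def analyzer_for(language, default="standard"):
    """Return the analyzer name for a language label such as "English".

    Unknown languages fall back to `default` (an assumption of this sketch).
    """
    return LANGUAGE_ANALYZERS.get(language.strip().lower(), default)


def field_mapping(language, field_type="text"):
    """Build a mapping fragment for one analyzed field.

    `field_type` is "text" on current versions; older versions used "string".
    """
    return {"type": field_type, "analyzer": analyzer_for(language)}
```

For example, `field_mapping("English")` yields a fragment that uses the english analyzer, while an unrecognized language label yields one that uses the fallback.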
I see the stopwords for each language. It seems that they use the Snowball stemmer for each type, with the language identifier.
For some fields I'm looking for more precision and less recall, so I will have to use custom analyzers for those, but for the others this looks good.
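As a rough sketch of what one such higher-precision analyzer could look like, the index settings below (built as a Python dict) keep the built-in English stop word list but leave out stemming, so only exact word forms match. The names `english_stop` and `english_exact` are made up for illustration, and whether dropping the stemmer is the right trade-off depends on the field.

```python
import json

# Index settings for a custom analyzer: standard tokenizer, lowercasing,
# and the predefined English stop word list ("_english_"), with no stemmer.
# "english_stop" and "english_exact" are hypothetical names.
settings = {
    "analysis": {
        "filter": {
            "english_stop": {"type": "stop", "stopwords": "_english_"}
        },
        "analyzer": {
            "english_exact": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "english_stop"],
            }
        },
    }
}

# Print the JSON body that would be sent when creating the index.
print(json.dumps({"settings": settings}, indent=2))
```

A field that needs exact matching would then reference `english_exact` in its mapping, while the remaining fields keep the built-in language analyzer.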