Listing Analyzers


(phobos182) #1

I know that ElasticSearch has a lot of built-in analyzers. Basically, I'm looking to apply specific analyzers based on the language identified for a field. I know that I can use the built-in "analyzer" field to specify which analyzer I want based on a field name.

My initial thought was to use my "language" field to determine which analyzer to use. So if the "Language" field is "English", I would want to use the english analyzer.
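
A minimal sketch of that idea, assuming the `_analyzer` mapping with a `path` setting that ElasticSearch versions of this era provide for picking the analyzer from a document field. The type name `doc`, the field name `analyzer_name`, and the analyzer names in the lookup table are all hypothetical and need to be checked against what your version actually registers.

```python
import json

# Hypothetical lookup from "Language" field values to analyzer names.
# The names on the right are assumptions, not a verified list.
LANGUAGE_TO_ANALYZER = {
    "English": "english",
    "French": "french",
    "German": "german",
}


def with_analyzer(doc, default="standard"):
    """Return a copy of doc with an 'analyzer_name' field derived from 'Language'."""
    enriched = dict(doc)
    enriched["analyzer_name"] = LANGUAGE_TO_ANALYZER.get(doc.get("Language"), default)
    return enriched


# Type mapping asking ElasticSearch to read the analyzer name from a document field.
# "doc" and "analyzer_name" are made-up names for this example.
MAPPING = {"doc": {"_analyzer": {"path": "analyzer_name"}}}

if __name__ == "__main__":
    print(json.dumps(MAPPING, indent=2))
    print(with_analyzer({"Language": "English", "body": "The quick brown foxes"}))
```

Documents whose language is not in the table fall back to the `standard` analyzer here, which is a design choice of the sketch rather than ElasticSearch behaviour.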

Which brings me to my point. Instead of reinventing the wheel and creating a lot of custom analyzers for each language, I would like to use the built-in tokenizers, stop words, etc. for each language. I cannot find a list of the built-in analyzers that ElasticSearch provides, so that I could just specify, for example, "analyzer: english". I would also like to know what each analyzer's stopword list is, etc.

Any documentation regarding this?

Thanks,


(Paul Loy) #2

http://www.elasticsearch.org/guide/reference/index-modules/analysis/

On Tue, Jun 7, 2011 at 8:57 PM, phobos182 phobos182@gmail.com wrote:

--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/Listing-Analyzers-tp3036342p3036342.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

--

Paul Loy
paul@keteracel.com
http://uk.linkedin.com/in/paulloy


(phobos182) #3

Thanks. I did not see the "Language" analyzer on the right side.

Any idea what stopwords comprise these analyzers? Any way to look deeper into them to find out how they are constructed?
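
One way to look deeper, sketched under the assumption that the analyze endpoint (`_analyze`) accepts the analyzer name as a query parameter and the sample text as the request body, as it does in versions of this era (newer versions may expect a JSON body instead). The host, the index name `test`, and the helper name `analyze_url` are hypothetical. Tokens missing from the response would be the ones removed as stopwords.

```python
from urllib.parse import urlencode


def analyze_url(index, analyzer, host="http://localhost:9200"):
    """Build the URL of the analyze endpoint for one index and one analyzer."""
    query = urlencode({"analyzer": analyzer})
    return host + "/" + index + "/_analyze?" + query


if __name__ == "__main__":
    # Send the sample text as the body of a request to this URL, for example with curl.
    print(analyze_url("test", "french"))
```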


(Paul Loy) #4

They use the Lucene standard stopwords. Someone on this mailing list posted
a link but I can't find it...



(Paul Loy) #5

Here we go. Solr has a good reference:
http://wiki.apache.org/solr/LanguageAnalysis



(phobos182) #6

I see the stopwords for each language. It seems that each language analyzer uses the Snowball stemmer, selected by the language identifier.

For some fields I'm looking for more precision and less recall, so I will have to use some custom analyzers for them, but for the others this looks good.
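
A rough sketch of what such a custom analyzer could look like, assuming the usual `custom` analyzer type with the `standard` tokenizer, the `lowercase` filter, and a `stop` filter, and simply leaving the stemmer out so that only exact word forms match. The analyzer name, the helper name, and the short stopword list are made up for illustration.

```python
import json


def precise_analyzer_settings(name, stopwords):
    """Build an index-settings fragment for a stopword-filtering analyzer with no stemmer."""
    filter_name = name + "_stop"
    return {
        "analysis": {
            "filter": {
                # Explicit stopword list; replace with the list you actually want.
                filter_name: {"type": "stop", "stopwords": list(stopwords)},
            },
            "analyzer": {
                name: {
                    "type": "custom",
                    "tokenizer": "standard",
                    # No stemming filter here: higher precision, lower recall.
                    "filter": ["lowercase", filter_name],
                },
            },
        }
    }


if __name__ == "__main__":
    settings = precise_analyzer_settings("english_exact", ["a", "an", "the"])
    print(json.dumps(settings, indent=2))
```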

Thanks again,

