Listing Analyzers

phobos182 · June 7, 2011, 7:57pm

I know that Elasticsearch has a lot of built-in analyzers. Basically, I'm looking to apply a specific analyzer based on the identified language of a field. I know that I can use the built-in "analyzer" field to specify which analyzer I want based on a field name.

My initial thought was to use my "language" field to determine which analyzer to use. So if the "language" field is "English", I would want to use the english analyzer.
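As a rough illustration of that idea, here is a minimal Python sketch that maps the value of a "language" field to an analyzer name. The dictionary, the `analyzer_for` function, and the choice of `standard` as a fallback are all assumptions made for the example; only the analyzer names on the right-hand side correspond to Elasticsearch's built-in language analyzers.

```python
# Hypothetical lookup table: values of the document's "language" field
# mapped to the names of built-in Elasticsearch language analyzers.
LANGUAGE_TO_ANALYZER = {
    "english": "english",
    "french": "french",
    "german": "german",
    "spanish": "spanish",
}

# Assumed fallback for missing or unrecognized languages.
DEFAULT_ANALYZER = "standard"


def analyzer_for(language):
    """Return an analyzer name for a language value, ignoring case and padding."""
    if not language:
        return DEFAULT_ANALYZER
    return LANGUAGE_TO_ANALYZER.get(language.strip().lower(), DEFAULT_ANALYZER)


if __name__ == "__main__":
    for value in ("English", "french", "Klingon", None):
        print(repr(value), "->", analyzer_for(value))
```

Normalizing case and whitespace before the lookup means "English" and "english" resolve to the same analyzer, and anything unrecognized falls back to a neutral default instead of failing.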

Which brings me to my point. Instead of reinventing the wheel and creating a lot of custom analyzers for each language, I would like to use the built-in tokenizers, stop words, etc. for each language. I cannot find a list of the built-in analyzers that Elasticsearch provides, so that I can just specify, for example, "analyzer: english". I would also like to know what each analyzer's stopword list is, and so on.

Any documentation regarding this?

Thanks,

Paul_Loy · June 7, 2011, 7:59pm

phobos182 · June 7, 2011, 8:29pm

Thanks. I did not see the "Language" analyzer on the right side.

Any idea which stopwords these analyzers use? Is there any way to look deeper into them to find out how they are constructed?
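One way to look deeper is to run sample text through an analyzer and see which tokens come out. The sketch below only builds the request and reads a response; it does not contact a cluster. It assumes the `_analyze` request body (`analyzer`, `text`) and response shape (`tokens` with a `token` key) described in current Elasticsearch documentation, which may differ from the release discussed in this thread. The function names are hypothetical.

```python
import json


def build_analyze_request(analyzer, text):
    """Return (path, JSON body) for an _analyze call with a named analyzer."""
    body = {"analyzer": analyzer, "text": text}
    return "/_analyze", json.dumps(body)


def tokens_from_response(response):
    """Extract the emitted token strings from a parsed _analyze response."""
    return [entry["token"] for entry in response.get("tokens", [])]


if __name__ == "__main__":
    path, body = build_analyze_request("english", "The quick brown foxes")
    print("POST", path)
    print(body)
```

Words from the input that do not show up among the returned tokens were dropped by the analyzer (for example as stopwords), and tokens that look different from the input were changed by a filter such as a stemmer.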

Paul_Loy · June 7, 2011, 9:13pm

They use the Lucene standard stopwords. Someone on this mailing list posted
a link but I can't find it...

Paul_Loy · June 7, 2011, 9:14pm

Here we go, Solr has a good reference:
http://wiki.apache.org/solr/LanguageAnalysis

phobos182 · June 7, 2011, 11:48pm

I see the stopwords for each language. It seems that each analyzer type uses the Snowball stemmer, configured with the corresponding language identifier.

For some fields I'm looking for more precision and less recall, so I will have to use custom analyzers for those. For the others, this looks good.
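A custom analyzer of that kind might keep the language's stopword list but skip stemming, so that only exact word forms match. The following Python sketch builds such index settings as a plain dictionary. The function name and the `<language>_stop` / `<language>_no_stem` names are made up for the example; the `custom` analyzer type, the `standard` tokenizer, the `lowercase` and `stop` filters, and predefined stopword lists such as `_english_` follow current Elasticsearch documentation and are assumed, not confirmed, for the version discussed here.

```python
import json


def no_stem_analysis(language):
    """Build index settings for an analyzer that removes stopwords but does not stem."""
    stop_filter = f"{language}_stop"      # hypothetical filter name
    analyzer_name = f"{language}_no_stem"  # hypothetical analyzer name
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Reuse the predefined stopword list for the language.
                    stop_filter: {"type": "stop", "stopwords": f"_{language}_"},
                },
                "analyzer": {
                    analyzer_name: {
                        "type": "custom",
                        "tokenizer": "standard",
                        # No stemmer in the chain: tokens keep their surface form.
                        "filter": ["lowercase", stop_filter],
                    },
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(no_stem_analysis("english"), indent=2))
```

Leaving the stemmer out trades recall for precision: a query for "running" no longer matches documents that only contain "run".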

Thanks again,

