Listing available analyzers via API


(Kevin Blaisdell) #1

The scenario I have is driving some index builds from an external
application. As part of this, an analyzer would be chosen in the external
application. The intent is that the choice could be made from a list of
all analyzers available in the ES installation, whether they ship with ES
or were custom-configured by someone on that particular installation.

To my surprise, there doesn't seem to be an API call that lists the
available analyzers. Is that right?

I do understand that I can look at the source or the documentation to see
the default ones, but I am looking for a programmatic, runtime way to find
out what is available. Am I missing something in the API, or will I have to
look at creating a plugin to add this capability?

Kevin

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/cd781459-16ec-4319-be29-52c3b9979e31%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Ivan Brusic) #2

I do not think there is anything in the API, but if you are running Java,
you can create an IndicesAnalysisService locally (see the unit tests for an
example) and then call analyzerProviderFactories() to get the various
prebuilt analyzer factories. This only works for the standard analyzers,
not for any analyzers installed via plugins. For those, you would need to
use the AnalysisService of an existing index (which is not as clean).

An API wrapper around the IndicesAnalysisService could make sense.

--
Ivan

(Replying to Kevin B, blaisdellk@gmail.com, on Wed, Mar 19, 2014 at 9:46 AM.)


(Brian Yoder) #3

Kevin,

Do you mean the index settings API? For example, I issued the following
query against my sgen index (sgen stands for schema generation; it is an
experimental index for exploring the nooks and crannies of Elasticsearch
mapping):

$ curl -XGET 'http://localhost:9200/sgen/_settings?pretty=true' && echo

It returns the following, which (at least when you look at the JSON and
have some experience parsing it) makes it easy to deterministically list
all of the custom analyzers that are defined:

{
  "sgen" : {
    "settings" : {
      "index" : {
        "uuid" : "ecznJNykSl-DguwyN6SIZg",
        "number_of_replicas" : "0",
        "analysis" : {
          "char_filter" : {
            "finnish_char_mapper" : {
              "type" : "mapping",
              "mappings" : [ "Å=>O", "å=>o", "W=>V", "w=>v" ]
            }
          },
          "analyzer" : {
            "english_standard_analyzer" : {
              "type" : "custom",
              "filter" : [ "standard", "lowercase", "asciifolding" ],
              "tokenizer" : "standard"
            },
            "finnish_stemming_analyzer" : {
              "type" : "custom",
              "char_filter" : [ "finnish_char_mapper" ],
              "filter" : [ "standard", "lowercase", "finnish_snowball_filter" ],
              "tokenizer" : "standard"
            },
            "english_stemming_analyzer" : {
              "type" : "custom",
              "filter" : [ "standard", "lowercase", "asciifolding",
                           "english_snowball_filter" ],
              "tokenizer" : "standard"
            },
            "english_stemming_stop_analyzer" : {
              "type" : "custom",
              "filter" : [ "standard", "lowercase", "asciifolding",
                           "english_stop_filter", "english_snowball_filter" ],
              "tokenizer" : "standard"
            },
            "russian_stemming_analyzer" : {
              "type" : "custom",
              "filter" : [ "standard", "lowercase", "russian_snowball_filter" ],
              "tokenizer" : "standard"
            },
            "arabic_stemming_Arabic_analyzer" : {
              "type" : "custom",
              "filter" : [ "standard", "lowercase", "Arabic_stemming_filter" ],
              "tokenizer" : "standard"
            }
          },
          "filter" : {
            "finnish_snowball_filter" : {
              "type" : "snowball",
              "language" : "Finnish"
            },
            "english_stop_filter" : {
              "type" : "stop",
              "language" : [ "english" ]
            },
            "english_snowball_filter" : {
              "type" : "snowball",
              "language" : "English"
            },
            "russian_snowball_filter" : {
              "type" : "snowball",
              "language" : "Russian"
            },
            "Arabic_stemming_filter" : {
              "type" : "stemmer",
              "name" : "Arabic"
            }
          }
        },
        "number_of_shards" : "1",
        "refresh_interval" : "2s",
        "version" : {
          "created" : "1000099"
        }
      }
    }
  }
}
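A minimal Python sketch of the listing step described above, assuming the nested response layout shown here and a node at http://localhost:9200. The helper names custom_analyzers and fetch_settings are hypothetical, not part of any Elasticsearch client. It only sees analyzers defined in index settings, so built-in analyzers such as standard will not appear in its output.

```python
import json
import urllib.request


def custom_analyzers(settings):
    """Hypothetical helper: map each index name to its sorted custom analyzer names.

    `settings` is the decoded JSON body of a get-settings response, such as
    /sgen/_settings (one index) or /_settings (all indices), in the nested
    layout shown above.
    """
    result = {}
    for index_name, body in settings.items():
        # Walk settings -> index -> analysis -> analyzer; an index with no
        # custom analyzers lacks some of these keys, so default to {}.
        analyzers = (
            body.get("settings", {})
            .get("index", {})
            .get("analysis", {})
            .get("analyzer", {})
        )
        result[index_name] = sorted(analyzers)
    return result


def fetch_settings(base_url="http://localhost:9200", index=None):
    """Hypothetical helper: GET the settings of one index, or of all indices.

    Assumes a reachable Elasticsearch node at base_url.
    """
    path = f"/{index}/_settings" if index else "/_settings"
    with urllib.request.urlopen(base_url + path) as resp:
        return json.load(resp)


# Offline demo on a trimmed copy of the sgen response above.
sample = {"sgen": {"settings": {"index": {"analysis": {"analyzer": {
    "english_standard_analyzer": {"type": "custom"},
    "finnish_stemming_analyzer": {"type": "custom"},
}}}}}}
print(custom_analyzers(sample))
# prints {'sgen': ['english_standard_analyzer', 'finnish_stemming_analyzer']}
```

Calling custom_analyzers(fetch_settings()) would query /_settings and cover every index at once; passing index="sgen" matches the curl command above.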

Since in my case I never use a built-in analyzer, I don't need to query the
field mappings to find them, nor do I need to depend on ES to dynamically
detect the "correct" mapping (I turn all of that off). So the bottom line
is: disabling automatic index creation and automatic type mapping of new
fields makes the system more robust, and has the wonderful side effect of
making schema discovery easy and deterministic!

As for the available built-in analyzers, on the Java side it might be
possible to list the analyzer classes in the org.elasticsearch.index.analysis
package. Automating this would take a bit of Java reflection, but perhaps it
could be done.

Hope this helps!

Brian

