Default analyzer when the given analyzer is not found?


(Han-2) #1

I have an index with analyzer settings as shown below:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": { "type": "english" },
        "ar": { "type": "arabic" },
        "hy": { "type": "armenian" },
        ...
      }
    }
  }
}

and a type mapping as shown below:

{
  "type1" : {
    "_analyzer" : {
      "path" : "language"
    },
    "properties" : {
      "id" : { "type" : "string", "index" : "not_analyzed" },
      "name" : { "type" : "string" },
      "language" : { "type" : "string", "index" : "not_analyzed" }
    }
  }
}

The language can be any language, and there might not be a matching analyzer
defined for it. So my question is: is there any way to specify settings such
that Elasticsearch uses the 'default' analyzer when no matching analyzer is
found? Currently I am getting a "No analyzer found" error message.

I could list out all of the languages and define an analyzer for each of
them, but in our case the language list keeps changing. It would be nice to
have a default analyzer that is used when there is NO matching analyzer.
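To pin down the behaviour being asked for, here is a minimal Python sketch that models the two lookup strategies. It is not Elasticsearch code: the `ANALYZERS` table mirrors the settings above (without the elided entries), and `resolve_strict`, `resolve_with_fallback` and `AnalyzerNotFound` are hypothetical names chosen for illustration. The error text is illustrative as well.

```python
# Hypothetical model of the analyzer lookup discussed in this thread.
# This is NOT Elasticsearch code; the names and error text are illustrative.

# Analyzer name -> analyzer type, mirroring the index settings above
# (the elided "..." entries are left out).
ANALYZERS = {
    "default": "english",
    "ar": "arabic",
    "hy": "armenian",
}


class AnalyzerNotFound(Exception):
    """Stands in for the "No analyzer found" error reported in the post."""


def resolve_strict(name: str) -> str:
    """Behaviour described in the post: an unknown name is an error."""
    if name not in ANALYZERS:
        raise AnalyzerNotFound(f"No analyzer found for [{name}]")
    return ANALYZERS[name]


def resolve_with_fallback(name: str) -> str:
    """Behaviour being asked for: an unknown name uses "default"."""
    return ANALYZERS.get(name, ANALYZERS["default"])


print(resolve_with_fallback("ar"))  # arabic
print(resolve_with_fallback("sw"))  # english (no "sw" analyzer is defined)
```

The only difference between the two functions is what happens on a miss: the strict version raises, while the fallback version substitutes the "default" entry.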

I would really appreciate any suggestions.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Igor Motov) #2

That sounds like a useful feature. I would suggest creating an issue for
it.

On Monday, March 25, 2013 1:48:32 PM UTC-4, Han wrote:


(Frederic Meyer) #3

Hey there.

Nearly one year after this initial post, I'm running into the exact same
issue, even though ES 1.0 has now been released.

Has anybody found a proper solution within ES? I've spent about an hour
searching for this, without any luck.

The only (ugly) workaround I can think of right now is to handle a fallback
language at the data level, i.e. before sending documents to ES to be
indexed.
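That data-level workaround could look roughly like the following Python sketch. It reads the analyzer names from the index settings and, instead of overwriting `language`, writes a separate field holding a name that is guaranteed to exist. The field name `analyzer_name` is a hypothetical choice; using it would also mean pointing `_analyzer.path` at that field (mapped as a `not_analyzed` string) rather than at `language`. Sending the documents to ES is not shown.

```python
import json

# Index settings from the first post (the elided "..." entries omitted).
SETTINGS = json.loads("""
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": { "type": "english" },
        "ar": { "type": "arabic" },
        "hy": { "type": "armenian" }
      }
    }
  }
}
""")

# Names of the analyzers actually defined on the index.
KNOWN_ANALYZERS = set(SETTINGS["settings"]["analysis"]["analyzer"])


def with_fallback_analyzer(doc: dict, field: str = "analyzer_name") -> dict:
    """Return a copy of doc whose (hypothetical) `field` names an analyzer
    that exists, falling back to "default" for unknown or missing languages."""
    out = dict(doc)
    lang = doc.get("language")
    out[field] = lang if lang in KNOWN_ANALYZERS else "default"
    return out


docs = [
    {"id": "1", "name": "some Arabic text", "language": "ar"},
    {"id": "2", "name": "some Swahili text", "language": "sw"},
]
for d in docs:
    print(d["id"], with_fallback_analyzer(d)["analyzer_name"])
# prints "1 ar" then "2 default"
```

Keeping the original `language` value untouched means the real language is still stored and searchable; only the analyzer choice falls back.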

Thanks.


(Brian Yoder) #4

Based on posts to this newsgroup early on in my usage of ES (over a year
now!), I used to put the following in my elasticsearch.yml file. Any field
that was not explicitly assigned an analyzer and that ES deemed to be a
string would pick up the English snowball analyzer with no stop words (my
preference at the time):

index:
  analysis:
    analyzer:
      # set stemming analyzer with no stop words as the default
      default:
        type: snowball
        language: English
        stopwords: none
    filter:
      stopWordsFilter:
        type: stop
        stopwords: none

But I have long since abandoned this default approach. Instead, I
explicitly assign an analyzer to each and every field (you know, like a
real database!). My elasticsearch.yml file now contains the following:

# Do not automatically create an index when a document is loaded, and do
# not automatically index unknown (unmapped) fields:
action.auto_create_index: false
index.mapper.dynamic: false

Therefore, I cannot automatically create an index during a load (which
would create a useless index without any of the analyzers and mappings
I've carefully crafted). And I cannot get ES to automatically create a new
field. This is very helpful when someone uses a low-level tool such as
curl and misspells a field name: ES will no longer create, for example,
the givveName field when it should have been givenName.
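The point about typos can be illustrated with a small client-side check in Python that flags fields a mapping does not define before a document is sent. This only mirrors the intent of the settings above; it is not how Elasticsearch enforces them, and the `MAPPED_FIELDS` set and `unmapped_fields` helper are hypothetical.

```python
# Hypothetical set of fields defined in an explicit mapping.
MAPPED_FIELDS = frozenset({"id", "givenName"})


def unmapped_fields(doc: dict, mapped: frozenset = MAPPED_FIELDS) -> list:
    """Return the top-level field names in doc that the mapping lacks."""
    return sorted(set(doc) - mapped)


# A misspelled field name such as givveName is reported, so the caller
# can reject the document instead of indexing it.
print(unmapped_fields({"id": "7", "givveName": "Ann"}))  # ['givveName']
print(unmapped_fields({"id": "7", "givenName": "Ann"}))  # []
```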

Brian

On Tuesday, February 25, 2014 8:57:30 AM UTC-5, Frederic Meyer wrote:


(Frederic Meyer) #5

Ah yes, via the default in the YAML configuration file, of course. I'll
give that a try, thanks!

It is a pity, though, that the "default" analyzer doesn't seem to do its job
of handling every unmatched document as far as the _analyzer field is
concerned.

Thanks
Fred

P.S.: I do understand your position about not indexing documents for which
you haven't crafted a dedicated analyzer yet. It makes real sense.

On Tuesday, February 25, 2014 5:09:43 PM UTC+1, InquiringMind wrote:

