I am facing an issue with stop-word removal for one of my fields. I have around 10 to 15 fields that are analyzed without the stop filter, plus a long field called "description" that does need stop-word removal.
The complication is that I need a solution for many languages. In elasticsearch.yml I added definitions for my analyzers, for example "default", "en_stop_analyzer", "es_stop_analyzer", ... (en and es being English and Spanish).
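For reference, the equivalent of those definitions expressed as index settings would be roughly the following (the exact tokenizer and filter choices here are an illustrative sketch, not my real config):

{
  "index": {
    "analysis": {
      "filter": {
        "en_stop": { "type": "stop", "stopwords": "_english_" },
        "es_stop": { "type": "stop", "stopwords": "_spanish_" }
      },
      "analyzer": {
        "en_stop_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "en_stop"]
        },
        "es_stop_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "es_stop"]
        }
      }
    }
  }
}

The same analysis settings can also be kept in elasticsearch.yml under index.analysis.*, which is what I am doing today.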
So far so good; I also have some custom mappings explicitly defined, with the dynamic features turned off.
The problem is that I can't use the "analyzer" setting in the JSON mapping for my types, because I use the same mapping for all the languages. Say I have a mapping for "myDocument" with a field "description" that I know must be indexed with the stop filter, but I will only know the language at indexing time.
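To make it concrete, here is a stripped-down sketch of the shared mapping (field names other than "description" are illustrative). The "analyzer" setting on "description" is exactly what I cannot hard-code, because it pins one language for every document of the type:

{
  "myDocument": {
    "properties": {
      "title":       { "type": "string" },
      "description": { "type": "string", "analyzer": "en_stop_analyzer" }
    }
  }
}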
I could create many mappings, one for each language, but I already have 5 different object types, and multiplied by the languages I must support that is not nice to maintain just for the stop-word list of a single field.
Is there at least some "include/import" feature I could use to minimize the differences among the files?
Could the analyzer be passed with the index and bulk APIs?
Any other ideas?
Yea, one of the features on my list is the ability to drive the analyzer used based on a field in the JSON doc. I've got most of it implemented on a local branch; open a feature for it?
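A sketch of how driving the analyzer from a document field might look, assuming a mapping-level "_analyzer" setting that points at a field in the document (names and exact syntax are illustrative; the final feature may differ):

{
  "myDocument": {
    "_analyzer": { "path": "lang_analyzer" },
    "properties": {
      "description": { "type": "string" }
    }
  }
}

A document indexed against that mapping would then carry the analyzer name itself:

{
  "lang_analyzer": "es_stop_analyzer",
  "description": "some long Spanish description ..."
}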
I don't want to rush you, but I would like to know, more or less, when you expect that to be implemented. For now I can go with my plan to create many files, maybe generated from a template.
It isn't working for me. I followed the example, but the analyzer field is not found by AnalyzerMapper (line 85): the document in which the mapper tries to find the analyzer field doesn't contain it yet.
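If the mapper consumes the document as a stream, one workaround worth trying (an assumption on my part, not confirmed) is to put the analyzer field before the fields it should affect, e.g.:

{
  "lang_analyzer": "en_stop_analyzer",
  "title": "Some title",
  "description": "a long English description whose stop words should be removed ..."
}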