Analyzer with stop word removal by language

Sebastian_Gavarini · November 7, 2010, 5:34am

Hi all,

I am facing an issue with stop word removal for one of my fields. I
have about 10 to 15 fields that are analyzed without the stop filter,
but one long field, "description", needs its stop words removed.
The problem is that I need a solution for many languages. I added
analyzer definitions to elasticsearch.yml, for example "default",
"en_stop_analyzer", "es_stop_analyzer", ... (en and es being English
and Spanish).
So far so good. I also have some custom mappings explicitly defined,
with the dynamic features turned off.
The problem is that I can't use the "analyzer" setting in the JSON
mapping for my types, because I use the same mapping for all
languages. Say I have a mapping for "myDocument" with a field
"description" that I know must be indexed with the stop filter; I
only know the language at indexing time.

I could create many mappings, one per language, but I already have 5
different object types, and multiplying that by the number of
languages I must support is unpleasant to maintain just for the stop
word list of a single field.
Is there a way to use some kind of "include/import" feature to
minimize the differences among the files?
Could the analyzer be passed with the index and bulk APIs?
Any other ideas?

Thanks,
Sebastian.
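To make concrete what a per-language stop analyzer does to a field like "description", here is a toy sketch in Python. It is not Elasticsearch's analyzer: the tokenizer (split on non-word characters) and the tiny stop word lists are illustrative assumptions only.

```python
import re

# Tiny, hypothetical stop word lists; real stop filters ship much longer lists.
STOP_WORDS = {
    "en": {"the", "a", "an", "and", "of", "is"},
    "es": {"el", "la", "los", "las", "y", "de", "es"},
}


def analyze(text: str, lang: str) -> list[str]:
    """Lowercase, split on non-word characters, and drop `lang`'s stop words.

    An unknown language falls back to no stop word removal, like an
    analyzer without a stop filter.
    """
    tokens = [t for t in re.split(r"\W+", text.lower()) if t]
    stops = STOP_WORDS.get(lang, set())
    return [t for t in tokens if t not in stops]


print(analyze("The history of the city", "en"))
print(analyze("La historia de la ciudad", "es"))
```

The same text pipeline is shared by all languages; only the stop list changes, which is exactly why duplicating whole mappings per language feels heavy.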

kimchy · November 7, 2010, 11:57am

Hi,

Yeah, one of the features on my list is the ability to drive the
analyzer used based on a field in the JSON doc. I have most of it
implemented on a local branch. Can you open a feature request for it?

-shay.banon
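The idea of driving the analyzer from a field in the JSON document can be sketched as a lookup: read a field from the document and map it to one of the analyzers defined in elasticsearch.yml. This is a hypothetical illustration only; the field name "language", the "<lang>_stop_analyzer" naming convention, and the fallback rule are assumptions, not how Elasticsearch implements it.

```python
# Analyzer names taken from the original question; the set itself is an assumption.
KNOWN_ANALYZERS = {"default", "en_stop_analyzer", "es_stop_analyzer"}


def analyzer_for(doc: dict, field: str = "language", fallback: str = "default") -> str:
    """Pick the analyzer for `doc` from the value of one of its fields.

    Assumes the hypothetical convention "<lang>_stop_analyzer" and falls back
    to `fallback` when the field is missing or names an unknown analyzer.
    """
    lang = doc.get(field)
    if lang is None:
        return fallback
    name = f"{lang}_stop_analyzer"
    return name if name in KNOWN_ANALYZERS else fallback


print(analyzer_for({"language": "es", "description": "La historia de la ciudad"}))
```

With this approach a single mapping serves every language, since the choice moves from mapping time to indexing time.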

Sebastian_Gavarini · November 7, 2010, 6:28pm

Hi Shay,

I think that is a very good idea.

Sure, I have just opened it in the elastic/elasticsearch GitHub issues.

I don't want to rush you, but roughly when do you expect to implement
it? For now I can go with my plan of creating many files, perhaps
generated from a template.

Thanks,
Sebastian.
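The interim plan of generating one mapping file per type and language from a template could look like the following sketch. The base mapping, the "string" field type, and the "<type>_<lang>" file naming are assumptions for illustration; only the "description" field gets a language-specific analyzer, so the per-language differences live in a single place.

```python
import copy
import json

# Hypothetical base mapping shared by all languages; the "string" type reflects
# older Elasticsearch versions and is an assumption here.
BASE_MAPPINGS = {
    "myDocument": {
        "properties": {
            "title": {"type": "string"},
            "description": {"type": "string"},
        }
    },
}

LANGUAGES = ["en", "es"]


def render_mappings(base: dict, languages: list[str], field: str = "description") -> dict:
    """Return {"<type>_<lang>": mapping}, setting `field`'s analyzer per language.

    The base dict is deep-copied, so it is never modified.
    """
    out = {}
    for type_name, mapping in base.items():
        for lang in languages:
            m = copy.deepcopy(mapping)
            m["properties"][field]["analyzer"] = f"{lang}_stop_analyzer"
            out[f"{type_name}_{lang}"] = {type_name: m}
    return out


if __name__ == "__main__":
    # Print instead of writing files; each entry could be saved as "<name>.json".
    for name, mapping in render_mappings(BASE_MAPPINGS, LANGUAGES).items():
        print(name, json.dumps(mapping))
```

Adding a language then means adding one entry to LANGUAGES (plus its analyzer definition) rather than editing every mapping by hand.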

kimchy · November 7, 2010, 7:02pm

Already implemented and pushed to master (see the issue on
elastic/elasticsearch on GitHub).
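For reference, the feature discussed here (issue #485) became the "_analyzer" root mapping, which as we understand it takes a "path" pointing at a document field whose value names the analyzer. The sketch below only builds and prints such a mapping and a matching document. The field name "language_analyzer" is hypothetical, and the exact mapping shape is an assumption to verify against the documentation of the version you run; later Elasticsearch versions dropped this mapping.

```python
import json

# Assumed shape of the old "_analyzer" mapping: "path" names the document field
# that holds the analyzer name ("language_analyzer" is hypothetical).
mapping = {
    "myDocument": {
        "_analyzer": {"path": "language_analyzer"},
        "properties": {
            # Fields that must not use the stop filter keep an explicit analyzer.
            "title": {"type": "string", "analyzer": "default"},
            # No explicit analyzer: would be indexed with the document's analyzer.
            "description": {"type": "string"},
        },
    }
}

# A document carrying its own analyzer name, known only at indexing time.
doc = {
    "language_analyzer": "es_stop_analyzer",
    "title": "Mi documento",
    "description": "La historia de la ciudad",
}

print(json.dumps(mapping, indent=2))
print(json.dumps(doc, indent=2))
```

The design keeps a single mapping per type: the language-specific choice travels with each document instead of being baked into per-language mapping files.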

Sebastian_Gavarini · November 9, 2010, 3:41am

Hi Shay,

I posted an update in GitHub issue #485 ("Mapper: An analyzer mapper allowing to control the index analyzer of a document based on a document field").

It isn't working for me. I followed the example, but AnalyzerMapper
(line 85) does not find the analyzer field: the document in which the
mapper looks for it doesn't contain the field yet.

Sebastian.
