Stopwords file format

Eugene_Strokin · December 23, 2011, 2:42am

I want to specify my own stop-words. This is what I found so far:
http://www.elasticsearch.org/guide/reference/index-modules/analysis/stop-tokenfilter.html
In elasticsearch.yml I would specify an analyzer like this:

index:
  analysis:
    analyzer:
      string_lowercase:
        tokenizer: keyword
        filter: lowercase
        stopwords_path: stopwords.txt
        ignore_case: true

How should I list the stop-words in the stopwords.txt file? Just one
word per line, or in some other way?

Also, I don't care which language users will use to index data, so
putting stopwords from different languages into the same file should
be no problem. But should I just use UTF-8 encoding, or should I use
escapes like the ones in .properties files, e.g. "de
art\u00edculos"?

Thank you,
Eugene S.

kimchy · December 25, 2011, 4:35pm

Each stop word should be on its own line (lines separated by \n). The
file is read as UTF-8.
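
As an illustration of the format described above, here is a minimal Python sketch that writes a stop-word file with one word per line, separated by \n and encoded as UTF-8. The helper names write_stopwords and read_stopwords and the sample words are hypothetical, and the sketch assumes non-ASCII characters are written literally rather than as \uXXXX escapes, since the reply only says the file is read as UTF-8.

```python
import os
import tempfile


def write_stopwords(path, words):
    # Hypothetical helper: drop surrounding whitespace and empty entries.
    cleaned = [w.strip() for w in words if w.strip()]
    # newline="\n" keeps the separator as a bare line feed on every platform.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write("\n".join(cleaned) + "\n")
    return cleaned


def read_stopwords(path):
    # Read the file back as UTF-8, one stop word per line.
    with open(path, "r", encoding="utf-8") as f:
        return f.read().splitlines()


if __name__ == "__main__":
    # Words from several languages can share one file.
    sample = ["the", "and", "der", "und", "artículos"]
    target = os.path.join(tempfile.mkdtemp(), "stopwords.txt")
    write_stopwords(target, sample)
    print(read_stopwords(target))
```

The resulting file would then be copied to wherever stopwords_path points.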

