How to change the stop words list for the TwitterRiver index


(benq) #1

Hi,

I am using the TwitterRiver with 0.19.

How can I change the stopwords list?

From the documentation, I should be able to provide a custom word list
either at index creation, which is not convenient because the index is
automatically created by the TwitterRiver (and I would prefer not to
touch this code), or in my elasticsearch.yml. Is there a third possibility,
like updating the index definition after its creation?

I tried to configure the standard analyser to use my external
stopwords file by adding:

    index.analysis.analyser.standard.stopwords_path: "extendedstopwords_en.txt"

without success.

How should I do that?

benq


(Igor Motov) #2

You can configure the standard analyzer globally by adding the following
settings to elasticsearch.yml:

    index:
      analysis:
        analyzer:
          default:
            type: standard
            stopwords_path: "extendedstopwords_en.txt"

You will have to restart the node for the settings to take effect after you
change the node's config file.
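
For reference, here is a minimal Python sketch of how such a stopwords file could be generated. It assumes the file is plain UTF-8 text with one word per line and that the relative path in stopwords_path is resolved against the node's config directory; the helper name and the example path and words are hypothetical and do not come from this thread.

```python
from pathlib import Path


def write_stopwords(config_dir, filename, words):
    """Write one stop word per line (UTF-8), lowercased and de-duplicated.

    Assumption: the standard analyzer lowercases tokens before removing
    stop words, so the entries are stored in lowercase.
    """
    seen = set()
    lines = []
    for word in words:
        word = word.strip().lower()
        if word and word not in seen:  # skip blanks and repeats
            seen.add(word)
            lines.append(word)
    path = Path(config_dir) / filename
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


# Hypothetical usage; adjust the config directory to your installation:
# write_stopwords("/etc/elasticsearch", "extendedstopwords_en.txt", ["the", "rt", "via"])
```

The file only takes effect once the analyzer is rebuilt, which is why the restart mentioned above is still needed.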

In reply to benq's post of Sunday, March 4, 2012 9:36:16 AM UTC-5.


(benq) #3

How can I update the stop word list without restarting ES (and interrupting
the indexing)?
Is that possible?

In reply to Igor Motov's post of Sunday, March 4, 2012 17:24:04 UTC+1.


(Igor Motov) #4

You can update the stop word list without restarting ES by closing and
reopening the index. This will, however, interrupt indexing.
Alternatively, you can try a rolling restart of the entire cluster, but this
might lead to inconsistencies between nodes, on top of the
expected inconsistencies that result from changing the analyzer
without reindexing.
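
A rough Python sketch of the close/reopen approach, using the index `_close` and `_open` REST endpoints. The host, port, and index name are assumptions (use whatever index your river writes to), and the function names are made up for illustration.

```python
import urllib.request


def close_open_requests(index, base_url="http://localhost:9200"):
    """Build the two POST requests that close and then reopen an index."""
    return [
        urllib.request.Request(f"{base_url}/{index}/_{action}", data=b"", method="POST")
        for action in ("close", "open")
    ]


def reload_analysis(index, base_url="http://localhost:9200"):
    """Send the close and open requests in order.

    While the index is closed, indexing and search requests against it fail,
    so the river's writes are interrupted for that window.
    """
    for request in close_open_requests(index, base_url):
        with urllib.request.urlopen(request) as response:
            response.read()  # drain the acknowledgement body


# Hypothetical usage, after editing the stopwords file on every node:
# reload_analysis("my_twitter_index")
```

Documents indexed before the change keep the tokens produced by the old stop word list until they are reindexed.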

In reply to benq's post of Tuesday, April 24, 2012 8:17:20 AM UTC-4.


(benq) #5

So, there is no way to do a "hot update" of the stop words list?!

It would be great to have the stopwords list stored as an ES document...

In reply to Igor Motov's post of Tuesday, April 24, 2012 15:31:11 UTC+2.

