Term Aggregations and StopWords

Andre_Morais · July 24, 2014, 3:57pm

Hi!

I'm really enjoying all the possibilities brought about by the move from
facets to aggregations. However, I still can't figure out the relationship
between facets or buckets and analyzers. Is it not possible at all to get
the buckets out of an analyzed field?

Specifically, I need to get a list of the most common words, but I want to use
my stopword list to exclude those that do not matter to me.

I am using a stop word filter:

      index.analysis.filter.fnstop:
        type: stop
        stopwords: ["my", "it", "the", "likes"]

And a custom analyzer:

      index.analysis.analyzer.test:
          type: custom
          tokenizer: whitespace
          filter: lowercase, asciifolding, fnstop
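
For reference, here is a minimal Python sketch of what I expect this filter chain to emit for a piece of text. It is only a stand-in for the real analyzer, not Elasticsearch code: the names analyze and ascii_fold are made up for illustration, and the accent stripping is a rough approximation of asciifolding.

```python
import unicodedata

# Same words as the fnstop filter above.
STOPWORDS = {"my", "it", "the", "likes"}


def ascii_fold(token):
    # Approximation of asciifolding (assumption): decompose the token and
    # drop the combining accent marks.
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    tokens = text.split()                     # whitespace tokenizer
    tokens = [t.lower() for t in tokens]      # lowercase
    tokens = [ascii_fold(t) for t in tokens]  # asciifolding (approximate)
    return [t for t in tokens if t not in STOPWORDS]  # fnstop


print(analyze("My dog LIKES the café"))  # -> ['dog', 'cafe']
```

One thing the sketch makes visible: because the tokenizer splits on whitespace only, a token with punctuation attached (for example "the,") is not equal to any stop word and passes through the stop filter.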

I then map my field with the custom analyzer:
...
"Clean_Message" : {"type" : "string", "analyzer" : "test"}

And request a list of the top 100 most common terms, using the search API:
{
  "query": { "bool": { "must": [ { "match_all": {} } ] } },
  "aggs": {
    "Message": {
      "terms": {
        "field": "Clean_Message",
        "size": 100,
        "order": { "_count": "desc" }
      }
    }
  }
}

However, some words in my stop filter appear in that list.

Is it by design? Are we not supposed to run facets or aggregations against
an analyzed field?

Is it possible to get the list of most common terms against an analyzed
field?

Thank you very much for your attention and for your work!

   André Morais

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0b9f0874-6f79-46d6-8e9b-5393b0b3cd10%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Andre_Morais · September 17, 2014, 3:58pm

Hello! Still can't get the result I want: stop words not appearing in
buckets.

Further testing showed that:

- if I filter the aggregation with a query for one of the stop words, I get
  an empty result for the aggregations;
- the same analyzer is replacing all smile and frown emoticons with
  SMILE and FROWN, and these appear as such in the aggregation results;
- if I list all the stop words in the "exclude" option, it works.

So it appears that my analyzer is doing everything it should, except
filtering the stop words when getting the aggregations (it works for
search).
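
The "exclude" workaround can be sketched in Python as below. This only builds the request body and does not talk to a cluster. It assumes that "exclude" takes a single regular expression, which is my understanding of the terms aggregation in this version. The helper name terms_agg_excluding is hypothetical, and re.escape follows Python's escaping rules, which may not match Lucene's regex syntax for words containing special characters.

```python
import json
import re

# Same words as the fnstop filter.
STOPWORDS = ["my", "it", "the", "likes"]


def terms_agg_excluding(field, stopwords, size=100):
    # Join the stop words into one alternation, e.g. "my|it|the|likes".
    pattern = "|".join(re.escape(word) for word in stopwords)
    return {
        "query": {"bool": {"must": [{"match_all": {}}]}},
        "aggs": {
            "Message": {
                "terms": {
                    "field": field,
                    "size": size,
                    "order": {"_count": "desc"},
                    "exclude": pattern,  # assumption: regex string form
                }
            }
        },
    }


body = terms_agg_excluding("Clean_Message", STOPWORDS)
print(json.dumps(body, indent=2))
```

The drawback is that the stop word list then lives in two places, the analyzer settings and every aggregation request, which is what I was hoping to avoid.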

And I am beginning to wonder if this could be, in fact, a bug... Any
thoughts?

Thanks,

      André Morais

