Term Aggregations and StopWords

Hi!

I'm really enjoying all the possibilities brought about by the move from
facets to aggregations. However, I still can't figure out the relationship
between facets or buckets and analyzers. Is it not possible at all to get
the buckets out of an analyzed field?

Specifically, I need to get a list of the most common words, but I want to
use my stopword list to exclude those that do not matter to me.

I am using a stop word filter:

      index.analysis.filter.fnstop:
        type: stop
        stopwords: ["my", "it", "the", "likes"]

And a custom analyzer:

      index.analysis.analyzer.test:
          type: custom
          tokenizer: whitespace
          filter: lowercase, asciifolding, fnstop
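
To make the expectation concrete, here is a minimal Python sketch of what I would expect this chain to emit for a piece of text. It is only a local approximation, not Elasticsearch code: `fold_ascii` and `analyze` are hypothetical helper names, and the Unicode decomposition is a rough stand-in for the real asciifolding filter.

```python
import unicodedata

# Same words as in the fnstop filter above.
STOPWORDS = {"my", "it", "the", "likes"}


def fold_ascii(token):
    # Rough stand-in for asciifolding: decompose accented characters,
    # then drop anything that is not plain ASCII.
    decomposed = unicodedata.normalize("NFKD", token)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def analyze(text):
    tokens = text.split()                     # whitespace tokenizer
    tokens = [t.lower() for t in tokens]      # lowercase filter
    tokens = [fold_ascii(t) for t in tokens]  # asciifolding (approximation)
    # stop filter; also drop tokens that became empty after folding
    return [t for t in tokens if t and t not in STOPWORDS]


print(analyze("My dog likes the café"))  # -> ['dog', 'cafe']
```

If the stop filter is applied at index time, none of the four stop words should ever reach the index as a term for this field.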

I then map my field with the custom analyzer:
...
"Clean_Message" : {"type" : "string", "analyzer" : "test"}

And request a list of the top 100 most common terms, using the search API:
      {
        "query": { "bool": { "must": [ { "match_all": {} } ] } },
        "aggs": {
          "Message": {
            "terms": {
              "field": "Clean_Message",
              "size": 100,
              "order": { "_count": "desc" }
            }
          }
        }
      }
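
As a sketch of the result I am expecting from this request, the following Python models a terms aggregation over an analyzed field: each bucket's `doc_count` is the number of documents that contain the term at least once. The sample messages, the `expected_buckets` helper, and the alphabetical tie-break are assumptions for illustration only, and the analysis is simplified to whitespace splitting, lowercasing, and stop word removal.

```python
from collections import Counter

# Same words as in the fnstop filter.
STOPWORDS = {"my", "it", "the", "likes"}


def expected_buckets(messages, size=100):
    doc_counts = Counter()
    for message in messages:
        # Distinct terms per document, so each document counts once per term.
        terms = {t.lower() for t in message.split()} - STOPWORDS
        doc_counts.update(terms)
    # Count descending; ties broken alphabetically (an assumption of this sketch).
    ranked = sorted(doc_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": term, "doc_count": count} for term, count in ranked[:size]]


# Hypothetical sample data.
messages = ["My cat likes the sun", "the cat sleeps", "It rains"]
for bucket in expected_buckets(messages):
    print(bucket)
```

With this sample data the buckets contain `cat`, `rains`, `sleeps`, and `sun`, and none of the stop words.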

However, some of the words in my stop filter still appear in that list.

Is this by design? Are we not supposed to run facets or aggregations against
an analyzed field?

Is it possible to get the list of most common terms against an analyzed
field?

Thank you very much for your attention and for your work!

   André Morais

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0b9f0874-6f79-46d6-8e9b-5393b0b3cd10%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hello! I still can't get the result I want, which is for the stop words not
to appear in the buckets.

Further testing showed that:

  • if I filter the aggregation with a query for one of the stop words, I get
    an empty aggregation result;
  • the same analyzer replaces every :) and :( with SMILE and FROWN, and
    these appear as such in the aggregation results;
  • if I list all the stop words in the "exclude" option of the terms
    aggregation, it works.
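
For anyone wanting to reproduce the "exclude" workaround from the last point, here is a minimal Python sketch that builds the request body. It assumes that `exclude` takes a regular expression string matched against the whole term, which is how I used it; `terms_agg_excluding` is a hypothetical helper name, and `re.escape` follows Python's escaping rules, which may differ from the server's regex dialect for unusual characters.

```python
import json
import re

# Same words as in the fnstop filter.
STOPWORDS = ["my", "it", "the", "likes"]


def terms_agg_excluding(field, stopwords, size=100):
    # One alternation of the stop words; plain words are left unchanged by re.escape.
    pattern = "|".join(re.escape(word) for word in stopwords)
    return {
        "query": {"match_all": {}},
        "aggs": {
            "Message": {
                "terms": {
                    "field": field,
                    "size": size,
                    "order": {"_count": "desc"},
                    "exclude": pattern,
                }
            }
        },
    }


body = terms_agg_excluding("Clean_Message", STOPWORDS)
print(json.dumps(body, indent=2))
```

This only hides the stop words from the buckets at query time; it does not explain why they are present as terms in the first place.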

So it appears that my analyzer is doing everything it should, except
filtering the stop words when getting the aggregations (it works for
search).

And I am beginning to wonder if this could be, in fact, a bug... Any
thoughts?

Thanks,

      André Morais

On Thursday, 24 July 2014 16:57:59 UTC+1, André Morais wrote:
