Aggregations and special characters


(Lucas Rutledge) #1

I am performing a terms aggregation on a query to return the unique values
of a field, in this case email addresses in the format
example@gmail.com or example@aaa.example.com.

"aggregations": {
  "users_overall": {
    "terms": {
      "field": "email"
    }
  }
}
Instead of receiving back the full unique emails that I need, I get results
such as these:

  [seven result buckets, each cut off after the opening "{" in the archived post]

I've run into this problem with other fields whose values contain special
characters as well. Is there any way to perform an aggregation like this
that ignores the special characters and returns the full value?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a0a6773c-3360-44fa-9acf-62a7ac05149e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Ivan Brusic) #2

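Terms aggregations work on the tokens indexed for the specified field. These
tokens are generated when the field's analyzer (and its tokenizer) is applied
to the field value, so an email address gets split into several tokens and
each token becomes its own bucket. In your case, you do not want the field to
be tokenized at all, so you would either need to define it as not_analyzed or
use a keyword tokenizer, which emits the whole value as a single token.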

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-tokenizers.html
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/analysis-intro.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#string
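
As a rough sketch of the first option, a mapping along these lines would keep each email as one token. It uses the string mapping syntax of the Elasticsearch versions current at the time of this thread, and the type name "user" is a made-up placeholder, so substitute your own:

```json
{
  "mappings": {
    "user": {
      "properties": {
        "email": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
```

The second option might look like the following, where "email_exact" is a hypothetical analyzer name chosen for illustration:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "email_exact": {
          "type": "custom",
          "tokenizer": "keyword"
        }
      }
    }
  },
  "mappings": {
    "user": {
      "properties": {
        "email": {
          "type": "string",
          "analyzer": "email_exact"
        }
      }
    }
  }
}
```

Either way, the indexing of an existing field cannot be changed in place, so the data would need to be reindexed into an index created with the new mapping before the aggregation returns full values.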

Cheers,

Ivan

On Wed, Jul 9, 2014 at 10:08 AM, Lucas Rutledge lrutledge1@gmail.com
wrote:

[original message, quoted in full in #1 above]


(system) #3