I am performing a terms aggregation on a query to return the unique values
of a field. In this case the field holds emails in the format example@gmail.com or example@aaa.example.com.
"aggregations": {
  "users_overall": {
    "terms": {
      "field": "email"
    }
  }
}
Instead of receiving back the full unique emails that I need I get results
such as these:
I've run into this problem with other fields whose values contain
special characters as well. Is there any way to perform an
aggregation like this that ignores the special characters and returns the
full value?
Aggregations work on the tokens for the specified field. These tokens are
generated when a tokenizer is applied to the field at index time. In your case,
you do not want the field to be tokenized at all, so you would either need to
define it as not_analyzed or use an analyzer with the keyword tokenizer, which
emits the whole value as a single token.
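
As a rough sketch of the first option, a mapping along these lines should keep each email as a single term. The type name "user" is only a placeholder I made up, and this assumes a version of Elasticsearch where string fields accept "index": "not_analyzed" (newer releases use a "keyword" field type instead):

```json
{
  "mappings": {
    "user": {
      "properties": {
        "email": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
```

The mapping of an existing field generally cannot be changed in place, so you would most likely need to create a new index with this mapping and reindex your documents before the terms aggregation returns the full addresses.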