Case 1:
I have 1000 rows, where multiple rows can have the same value for the
email field.
{"email":"some@email.com","points":5}
{"email":"some@email.com","points":2}
...
How do I tell Elasticsearch to find all emails that appear only once
in the data set?
Case 2:
Also using aggregations: how can I tell Elasticsearch to return how many
emails appeared a given number of times in the data set?
ex.
emails = 5, occurrences >= 5 // There are 5 emails that appeared 5 times
or more in the dataset
emails = 6, occurrences = 4
emails = 23, occurrences = 3
emails = 2, occurrences = 2
emails = 12, occurrences = 1
The first one is not available directly. However, a terms aggregation
sorted by _count ascending will bubble up the least frequent terms (emails),
and you can then filter out the ones you want yourself. The second one
sounds like a simple terms aggregation on the email field (just make sure
the email field is not_analyzed).
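
The approach above could be sketched roughly as follows in Python. This is
only an illustration, not a tested recipe: the aggregation name "emails",
the helper function names, and the sample response are all made up for the
example, and the response dict is hand-written to mimic the shape of a terms
aggregation result rather than taken from a real cluster. The counting per
number of occurrences is done on the client side here.

```python
from collections import Counter


def build_email_terms_query(field="email", size=1000):
    """Build a request body for a terms aggregation on the email field,
    ordered so the least frequent emails come first.
    The aggregation name "emails" is an arbitrary choice for this sketch."""
    return {
        "size": 0,  # no search hits needed, only the aggregation
        "aggs": {
            "emails": {
                "terms": {
                    "field": field,
                    "size": size,  # how many buckets to return
                    "order": {"_count": "asc"},
                }
            }
        },
    }


def emails_seen_once(response):
    """Case 1: keep only the emails whose bucket has a doc_count of 1."""
    buckets = response["aggregations"]["emails"]["buckets"]
    return [b["key"] for b in buckets if b["doc_count"] == 1]


def occurrence_histogram(response):
    """Case 2: count how many emails share each number of occurrences."""
    buckets = response["aggregations"]["emails"]["buckets"]
    return Counter(b["doc_count"] for b in buckets)


# Hand-written stand-in for a terms aggregation response (not real data).
sample_response = {
    "aggregations": {
        "emails": {
            "buckets": [
                {"key": "a@example.com", "doc_count": 1},
                {"key": "b@example.com", "doc_count": 1},
                {"key": "some@email.com", "doc_count": 2},
            ]
        }
    }
}

once = emails_seen_once(sample_response)
histogram = occurrence_histogram(sample_response)
```

The body from build_email_terms_query would be sent as the search request;
the two helpers then only post-process the buckets that come back. Note that
the results can only cover as many buckets as the size setting returns, so
with many distinct emails the size would need to be large enough.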