I already tried my luck on StackOverflow but unfortunately without any result. I struggle with some basics and I don't know how to solve the following problem.
I have given the following data structure in Elasticsearch:
How to answer, for instance, this question: Return all email addresses which occurs more than X times. This seems very trivial but for instance how to join 'sender_email' and 'receiver_email' so that we have just one column? How to apply a 'where' clause where I can set a threshold?
How would the solution look like in Kibana and how on the console with curl?
To concatenate columns you will have to use a scripted field in Kibana. In the latest 4.6 release of Kibana, we only support numerical expressions (so no String concatenation), but starting from the 5.0 beta version (currently still in alpha), you will be able to use Painless (a scripting language similar to Groovy) or Groovy.
e.g.: in 5.0, you will be able to store an expression like this as a scripted field:
thank you for your fast reply. The title of the post is misleading maybe a little bit. What I actually really try to do are some basic statistics. I want to figure out, how many times a specific email address is used (for billing reasons for instance). Here, give me all email addresses which were used at least X times ... but because I have 2 columns (just with different meanings) I have to take both columns into account (that's why the title 'joining columns').
Actually, you can continue using scripted fields for that. E.g., after you created one in Kibana, you can use the Datatable visualization to perform such an aggregation. Specifically, you would select the scripted field (the concatenation) from the drop down, and then aggregate with the count.
However, at its core, this isn't a Kibana issue. While Kibana provides you with a wizard to create such a scripted field, you can do this with the search API from Elasticsearch directly using the "Aggregations" feature. The aggregations are used to do things like counts, averages, ... on a field (or in your case, a concatenation of two fields). See https://www.elastic.co/guide/en/elasticsearch/reference/2.3/search-aggregations-metrics-valuecount-aggregation.html#_script_10 for some examples, including one that uses a script.
Starting from the example from that link, in your case, it probably would look something like:
I'd like to share my solution of the above mentioned problem. The solution is to put both email addresses into one additional field. For instance like that:
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.