Now, I would like to treat all the documents with the same job_name and build_number as the same document. So basically, when I create a visualization or run a query, I would like it to run on this document (although it doesn't exist):
How can I achieve that? I've seen topics like "top hit aggregations", "collapsed fields", ... but at this point I'm quite confused as to which approach is right. Eventually, any user who runs a query or creates a visualization should not have to think about it; the system should be configured to treat documents with the same field values as one document.
This can be achieved by grouping the docs by job_name and build_number and then aggregating each group with a scripted_metric. A composite aggregation is a perfect fit for that: job_name and build_number are the sources, and the scripted_metric that merges the fields is your aggregation.
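To make the grouping half concrete, here is a minimal sketch of a search request body (sent to `<your-index>/_search`) that only does the composite part. It assumes job_name and build_number are mapped so that a terms source can read them (for example keyword or numeric fields); the aggregation name `builds` and the page size of 100 are placeholders I made up.

```json
{
  "size": 0,
  "aggs": {
    "builds": {
      "composite": {
        "size": 100,
        "sources": [
          { "job_name": { "terms": { "field": "job_name" } } },
          { "build_number": { "terms": { "field": "build_number" } } }
        ]
      }
    }
  }
}
```

Each bucket in the response corresponds to one (job_name, build_number) pair, with a doc_count of how many raw documents fall into it. The scripted_metric that merges the fields would go into an "aggs" section inside "builds", next to "composite".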
However, you want to visualize on top of that, so I think you want to have the merged documents in an index. That's where transforms come into play. A transform basically runs a composite aggregation, takes the results, and writes the output into a new index. You can set it up to run continuously, so it will consume your new incoming data (you need a timestamp for that, but I guess you have that). In the transform, job_name and build_number would be your group_by.
Regarding the scripted_metric aggregation, please have a look here. The examples are not transform-specific, but they work in aggregations.
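As a rough sketch of what such a transform could look like, here is a request body for `PUT _transform/<transform_id>` (the endpoint name in recent versions). Everything specific in it is an assumption on my side, not something from your setup: the index names `jenkins-builds` and `jenkins-builds-merged`, the timestamp field `@timestamp`, the `frequency` and `delay` values, and the merged fields `status` and `duration`. Replace those with whatever fields your partial documents actually carry.

```json
{
  "source": { "index": "jenkins-builds" },
  "dest": { "index": "jenkins-builds-merged" },
  "frequency": "1m",
  "sync": {
    "time": { "field": "@timestamp", "delay": "60s" }
  },
  "pivot": {
    "group_by": {
      "job_name": { "terms": { "field": "job_name" } },
      "build_number": { "terms": { "field": "build_number" } }
    },
    "aggregations": {
      "merged": {
        "scripted_metric": {
          "init_script": "state.merged = new HashMap()",
          "map_script": "String[] fields = new String[] {'status', 'duration'}; for (f in fields) { if (doc.containsKey(f) && doc[f].size() > 0) { state.merged.put(f, doc[f].value) } }",
          "combine_script": "return state.merged",
          "reduce_script": "Map out = new HashMap(); for (s in states) { if (s != null) { out.putAll(s) } } return out"
        }
      }
    }
  }
}
```

After creating it, you would start it with `POST _transform/<transform_id>/_start`. The destination index should then hold one document per (job_name, build_number) pair, with the merged values under `merged`, and visualizations can be pointed at that index instead of the raw one. Note that this sketch copies `doc[f].value` (a plain value) into the state, and if two partial documents both carry the same field, whichever is merged last wins.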
Here is a more specific example I found:
"scripted_metric": {
  "init_script": "state.join = new HashMap()",
  "map_script": "String[] fields = new String[] {'name', 'best_friend', 'hobby'}; for (e in fields) { if (doc.containsKey(e)) {state.join.put(e, doc[e])}}",
  "combine_script": "return state.join",
  "reduce_script": "String[] fields = new String[] {'name', 'best_friend', 'hobby'}; Map j=new HashMap(); for (s in states) {for (e in fields) { if (s.containsKey(e)) {j.put(e, s[e].get(0))}}} return j;"
}