Merging documents based on matching field values

I'll start by saying that I'm not sure merging documents is actually what I need; it's simply the most suitable subject I could find.

Every document created in my environment today has the fields job_name and build_number, so documents look like this:

{ 'job_name': x,
  'build_number': 1,
  'extra_field_one': ...
}

{ 'job_name': x,
  'build_number': 1,
  'extra_field_two': ...
}

{ 'job_name': x,
  'build_number': 2,
  'extra_field_one': ...
}

Now, I would like to treat all documents with the same job_name and build_number as one document. So when I create a visualization or run a query, I would like it to run against this document (although it doesn't exist):

{ 'job_name': x,
  'build_number': 1,
  'extra_field_one': ...,
  'extra_field_two': ...
}

How can I achieve that? I've seen topics like "top hits aggregation" and "field collapsing", but at this point I'm quite confused about which approach is the right one. Ideally, a user who runs a query or creates a visualization would not have to think about it: the system would be configured to treat documents with the same field values as one document.

This can be achieved by grouping the docs by job_name and build_number and then aggregating each group with a scripted_metric. A composite aggregation is a perfect fit for that: job_name and build_number are the sources, and the scripted_metric that merges the fields is your aggregation.
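To make the shape of such a request concrete, here is a rough sketch in Python that only builds the search body as a dict. The aggregation names (by_job_and_build, merged), the page size, and the helper name build_composite_query are made up for illustration, and the scripted_metric itself is passed in as a parameter instead of being written out. It also assumes job_name is mapped as a keyword field (otherwise something like job_name.keyword would be needed).

```python
import json


def build_composite_query(merge_agg, after_key=None, page_size=100):
    """Build a search body that buckets documents by job_name and build_number.

    merge_agg is the scripted_metric aggregation (a dict) that merges the
    fields of each bucket; it is supplied by the caller, not defined here.
    """
    composite = {
        "size": page_size,  # number of buckets per page (illustrative value)
        "sources": [
            {"job_name": {"terms": {"field": "job_name"}}},
            {"build_number": {"terms": {"field": "build_number"}}},
        ],
    }
    if after_key is not None:
        # Continue after the last bucket of the previous page ("after_key"
        # in the previous response).
        composite["after"] = after_key
    return {
        "size": 0,  # only the buckets are of interest, not the raw hits
        "aggs": {
            "by_job_and_build": {
                "composite": composite,
                "aggs": {"merged": merge_agg},
            }
        },
    }


# Placeholder only: a real scripted_metric needs its scripts filled in.
placeholder_merge_agg = {"scripted_metric": {}}
body = build_composite_query(
    placeholder_merge_agg, after_key={"job_name": "x", "build_number": 1}
)
print(json.dumps(body, indent=2))
```

Each bucket in the response would then correspond to one (job_name, build_number) pair, with the merged fields under the sub-aggregation.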

However, you want to visualize on top of that, so I think you want to have the merged documents in an index. That's where transforms come into play. A transform basically runs a composite aggregation, takes the results, and writes the output to a new index. You can set it up to run continuously so that it consumes your new incoming data (you need a timestamp field for that, but I guess you have one). In the transform, job_name and build_number would be your group_by.
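As a sketch of what the transform configuration could look like, the Python snippet below assembles the request body as a plain dict. The index names (builds, builds_merged), the timestamp field (@timestamp), the delay and frequency values, and the helper name build_transform_body are all assumptions for illustration; the scripted_metric is again passed in by the caller.

```python
import json


def build_transform_body(source_index, dest_index, timestamp_field, merge_agg):
    """Build the body of a continuous pivot transform.

    Documents are grouped by job_name and build_number; merge_agg is the
    scripted_metric aggregation (a dict) that merges the remaining fields.
    """
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "pivot": {
            "group_by": {
                "job_name": {"terms": {"field": "job_name"}},
                "build_number": {"terms": {"field": "build_number"}},
            },
            "aggregations": {"merged": merge_agg},
        },
        # "sync" makes the transform continuous; it needs a timestamp field.
        "sync": {"time": {"field": timestamp_field, "delay": "60s"}},
        "frequency": "1m",  # how often to check for new data (illustrative)
    }


# Placeholder only: a real scripted_metric needs its scripts filled in.
transform_body = build_transform_body(
    source_index="builds",         # hypothetical source index
    dest_index="builds_merged",    # hypothetical destination index
    timestamp_field="@timestamp",  # assumed timestamp field
    merge_agg={"scripted_metric": {}},
)
print(json.dumps(transform_body, indent=2))
```

A body like this would be sent with PUT _transform/<transform_id> and the transform started with POST _transform/<transform_id>/_start. Visualizations would then point at the destination index, where each document already represents one job_name and build_number pair.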

Regarding the scripted_metric aggregation, please have a look here. The examples are not transform-specific, but they work in aggregations.

Here is a more specific example I found:

"scripted_metric": {
  "init_script": "state.join = new HashMap()",
  "map_script": "String[] fields = new String[] {'name', 'best_friend', 'hobby'}; for (e in fields) { if (doc.containsKey(e)) {state.join.put(e, doc[e])}}",
  "combine_script": "return state.join",
  "reduce_script": "String[] fields = new String[] {'name', 'best_friend', 'hobby'}; Map j=new HashMap(); for (s in states) {for (e in fields) { if (s.containsKey(e)) {j.put(e, s[e].get(0))}}} return j;"
}
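To see how the four scripts fit together, here is a small plain-Python simulation of the scripted_metric phases for a single bucket. It is not Painless and does not talk to Elasticsearch. It uses the field names from the question (extra_field_one, extra_field_two) instead of the ones in the example, works on plain values instead of doc values (so there is no .get(0)), and the function names and the two-shard split are made up for illustration.

```python
FIELDS = ["extra_field_one", "extra_field_two"]


def init_script():
    # Runs once per shard: create an empty per-shard state.
    return {"join": {}}


def map_script(state, doc):
    # Runs once per matching document: copy the fields that are present.
    for field in FIELDS:
        if field in doc:
            state["join"][field] = doc[field]


def combine_script(state):
    # Runs once per shard after all documents: return the shard's result.
    return state["join"]


def reduce_script(states):
    # Runs once on the coordinating node: merge the per-shard results.
    merged = {}
    for shard_result in states:
        for field in FIELDS:
            if field in shard_result:
                merged[field] = shard_result[field]
    return merged


def run_scripted_metric(shards):
    # Drive the four phases in order over a list of shards.
    states = []
    for shard_docs in shards:
        state = init_script()
        for doc in shard_docs:
            map_script(state, doc)
        states.append(combine_script(state))
    return reduce_script(states)


# Two documents of the same bucket (job_name x, build_number 1),
# assumed here to live on two different shards.
shards = [
    [{"job_name": "x", "build_number": 1, "extra_field_one": "a"}],
    [{"job_name": "x", "build_number": 1, "extra_field_two": "b"}],
]
merged = run_scripted_metric(shards)
print(merged)  # {'extra_field_one': 'a', 'extra_field_two': 'b'}
```

Note that if two documents in the same bucket carry the same field, the later one simply overwrites the earlier one in this sketch.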