Merging documents based on matched fields values

abregman · September 24, 2020, 2:47am

I'll start by saying that I'm not sure merging documents is actually what I need; it's simply the most suitable subject I could find.

Every document created in my environment today has the fields job_name and build_number, so documents look like this:

{ 'job_name': x,
  'build_number': 1,
  'extra_field_one': ...
}

{ 'job_name': x,
  'build_number': 1,
  'extra_field_two': ...
}

{ 'job_name': x,
  'build_number': 2,
  'extra_field_one': ...
}

Now, I would like to treat all the documents with the same job_name and build_number as the same document. So basically, when I create a visualization or run a query, I would like it to run on this document (although it doesn't exist):

{ 'job_name': x,
  'build_number': 1,
  'extra_field_one': ...,
  'extra_field_two': ...
}

How can I achieve that? I've seen topics like "top hit aggregations", "collapsed fields", ... but at this point I'm quite confused as to what the right approach is. Ideally, any user who runs a query or creates a visualization would not have to think about it, because the system would be configured to treat documents with the same field values as one document.

Hendrik_Muhs · September 24, 2020, 6:24am

This can be achieved by grouping docs by job_name and build_number and then aggregating the docs in each group with a scripted_metric. A perfect fit for that is a composite aggregation: job_name and build_number are the sources, and the scripted_metric that merges the fields is your aggregation.
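As a rough sketch of what such a request body could look like, here it is built as a Python dict. The aggregation names (by_job_and_build, merged), the page size, and the idea of passing the field list through params are my own choices for illustration. It also assumes job_name and build_number are mapped with doc values (for example keyword and integer), which the thread does not confirm.

```python
import json


def build_composite_merge_query(fields, page_size=100):
    """Build a search body that groups by (job_name, build_number)
    and merges the given fields of each group into one map."""
    merge_agg = {
        "scripted_metric": {
            # The field list is passed via params so the scripts stay generic.
            "params": {"fields": list(fields)},
            "init_script": "state.merged = new HashMap();",
            # Runs once per document: copy every listed field that has a value.
            "map_script": (
                "for (f in params.fields) {"
                " if (doc.containsKey(f) && doc[f].size() > 0) {"
                " state.merged.put(f, doc[f].value); } }"
            ),
            # Runs once per shard: hand the shard-local map to the reduce phase.
            "combine_script": "return state.merged;",
            # Runs once on the coordinating node: merge all shard-local maps.
            "reduce_script": (
                "Map out = new HashMap();"
                " for (s in states) { if (s != null) { out.putAll(s); } }"
                " return out;"
            ),
        }
    }
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "aggs": {
            "by_job_and_build": {
                "composite": {
                    "size": page_size,
                    "sources": [
                        {"job_name": {"terms": {"field": "job_name"}}},
                        {"build_number": {"terms": {"field": "build_number"}}},
                    ],
                },
                "aggs": {"merged": merge_agg},
            }
        },
    }


body = build_composite_merge_query(["extra_field_one", "extra_field_two"])
print(json.dumps(body, indent=2))
```

The printed JSON is what you would send to the _search endpoint of your index. If several docs in a group carry the same field, the value that survives depends on processing order, so this only suits fields that do not conflict within a group.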

However, you want to visualize on top of that, so I think you want to have the merged documents in an index. That's where transforms come into play. A transform basically runs a composite aggregation, takes the results and writes the output into a new index. You can set it up to run continuously, so it will consume your new incoming data (you need a timestamp for that, but I guess you have that). In the transform, job_name and build_number would be your group_by.
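A minimal sketch of such a transform configuration, again written as a Python dict. The index names (builds, builds_merged), the timestamp field (@timestamp), the 60s delay and the aggregation name (merged) are placeholders I made up; adjust them to your setup.

```python
import json


def build_transform_body(source_index, dest_index, time_field, merge_agg):
    """Build the body of a continuous pivot transform that groups by
    job_name and build_number and applies merge_agg to each group."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "pivot": {
            "group_by": {
                "job_name": {"terms": {"field": "job_name"}},
                "build_number": {"terms": {"field": "build_number"}},
            },
            "aggregations": {"merged": merge_agg},
        },
        # sync makes the transform continuous; it needs a timestamp field.
        "sync": {"time": {"field": time_field, "delay": "60s"}},
    }


# Illustrative scripted_metric that merges two fields from the question.
merge_agg = {
    "scripted_metric": {
        "params": {"fields": ["extra_field_one", "extra_field_two"]},
        "init_script": "state.merged = new HashMap();",
        "map_script": (
            "for (f in params.fields) {"
            " if (doc.containsKey(f) && doc[f].size() > 0) {"
            " state.merged.put(f, doc[f].value); } }"
        ),
        "combine_script": "return state.merged;",
        "reduce_script": (
            "Map out = new HashMap();"
            " for (s in states) { if (s != null) { out.putAll(s); } }"
            " return out;"
        ),
    }
}

transform_body = build_transform_body("builds", "builds_merged", "@timestamp", merge_agg)
print(json.dumps(transform_body, indent=2))
```

You would send this body with PUT _transform/<transform_id> and then start it with POST _transform/<transform_id>/_start. As far as I know, the merged fields end up nested under the aggregation name in the destination index (here merged.extra_field_one and so on), so visualizations would need to point at those paths.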

Regarding the scripted_metric aggregation, please have a look here. The examples are not transform specific, but work in aggregations.

Here is a more specific example I found:

"scripted_metric": {
          "init_script": "state.join = new HashMap()",
          "map_script": "String[] fields = new String[] {'name', 'best_friend', 'hobby'}; for (e in fields) { if (doc.containsKey(e)) {state.join.put(e, doc[e])}}",
          "combine_script": "return state.join",
          "reduce_script": "String[] fields = new String[] {'name', 'best_friend', 'hobby'}; Map j=new HashMap(); for (s in states) {for (e in fields) { if (s.containsKey(e)) {j.put(e, s[e].get(0))}}} return j;"
        }
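To make the four phases easier to follow, here is a plain Python simulation of what the example above does. It is not Elasticsearch code; the two "shards" and the sample values (alice, bob, chess) are made up, and doc values are modelled as lists so that taking index 0 mirrors the get(0) call in the reduce script.

```python
FIELDS = ["name", "best_friend", "hobby"]


def init_state():
    # init_script: one empty state per shard
    return {"join": {}}


def map_doc(state, doc):
    # map_script: runs for every document collected on the shard
    for field in FIELDS:
        if field in doc:
            state["join"][field] = doc[field]


def combine(state):
    # combine_script: runs once per shard, returns the shard-local result
    return state["join"]


def reduce_states(states):
    # reduce_script: runs once on the coordinating node over all shard results
    merged = {}
    for shard_result in states:
        for field in FIELDS:
            if field in shard_result:
                merged[field] = shard_result[field][0]  # first value, like get(0)
    return merged


# Two hypothetical shards, each holding one partial document of the same group.
shards = [
    [{"name": ["alice"], "hobby": ["chess"]}],
    [{"name": ["alice"], "best_friend": ["bob"]}],
]

states = []
for shard_docs in shards:
    state = init_state()
    for doc in shard_docs:
        map_doc(state, doc)
    states.append(combine(state))

merged = reduce_states(states)
print(merged)
```

Each shard only sees part of the group, and the reduce step is what stitches the partial maps into one merged document.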

