Internal index process merging records into arrays of objects based on a common parent key

Hi,

I have a question about a feature. I'm trying to run an internal process in Elasticsearch that creates superset objects from an index containing flatter, more granular objects. For example:

The source index contains objects as:

[
  {
    "person_id": "2736548Z",
    "comment": "hi 1",
    "timestamp": "2023-02-24T12:15:00"
  },
  {
    "person_id": "2736548Z",
    "comment": "hi 2",
    "timestamp": "2023-02-24T12:20:00"
  },
  {
    "person_id": "1278882S",
    "comment": "hi a",
    "timestamp": "2023-02-24T12:25:00"
  }
]

I want to create the following superset objects in a destination index, using person_id as the aggregating entity key:

[
  {
    "person_id": "2736548Z",
    "comments": [
      {
        "comment": "hi 1",
        "timestamp": "2023-02-24T12:15:00"
      },
      {
        "comment": "hi 2",
        "timestamp": "2023-02-24T12:20:00"
      }
    ]
  },
  {
    "person_id": "1278882S",
    "comments": [
      {
        "comment": "hi a",
        "timestamp": "2023-02-24T12:25:00"
      }
    ]
  }
]

Is there any internal mechanism, such as an ingest pipeline, a processor, or a transform, that I can use to generate those superset objects based on a particular key acting as the parent of the object array?

Thanks in advance!

Could you provide a bit more detail about your use case? Generally speaking, the docs in their current form are typically ideal for Elasticsearch. Switching to a nested field type seems somewhat counterintuitive, as it would most likely hurt search performance.

Hi Ben, thanks for that.

Yes, the source format is needed for one search case, but the second format is for a display case with a different search pattern: I have no interest in searching on the nested objects, I just want to display them while searching through the flatter top-level keys. The models differ because the search cases have different intentions, but I don't want to do the merging before indexing, as that would hammer my Python scripts.

Mainly I want to know whether I can build those superset objects with an internal pipeline. Thanks!

Are the array elements in the 1st data example individual docs?

If so, you can combine those docs and create a so-called entity-centric view (person_id would be the entity) with Transforms. Transforms can run continuously on ingested data. In other words, you would have both indices and could use one or the other depending on the use case.
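For illustration, a rough (untested) sketch of such a continuous transform; the index names comments-src and comments-per-person are placeholders, person_id is assumed to be mapped as keyword, and the scripted_metric body is the one shown further below:

  PUT _transform/person-comments
  {
    "source": { "index": "comments-src" },
    "dest": { "index": "comments-per-person" },
    "sync": {
      "time": { "field": "timestamp", "delay": "60s" }
    },
    "pivot": {
      "group_by": {
        "person_id": { "terms": { "field": "person_id" } }
      },
      "aggregations": {
        "comments": { "scripted_metric": { ... } }
      }
    }
  }

After creating it, you start it with POST _transform/person-comments/_start.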

However, there is no built-in aggregation that collapses the comments for you; you need a scripted_metric to do this job, e.g.:

  "scripted_metric": {
    "init_script": "state.docs = []",
    "map_script": "state.docs.add(new HashMap(params['_source']))",
    "combine_script": "return state.docs",
    "reduce_script": "def docs = []; for (s in states) {for (d in s) { docs.add(d);}}return docs"
  }

This combines all sources from the input docs. The only missing part here is to drop the redundant person_id from the hash map.
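That could be done directly in the map_script, e.g. (an untested sketch, same structure as above):

  "scripted_metric": {
    "init_script": "state.docs = []",
    "map_script": "def doc = new HashMap(params['_source']); doc.remove('person_id'); state.docs.add(doc)",
    "combine_script": "return state.docs",
    "reduce_script": "def docs = []; for (s in states) { for (d in s) { docs.add(d); } } return docs"
  }

Each resulting entity document then contains person_id once at the top level (from the group_by) and the comments array without the repeated key.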


Hi Hendrik,

Amazing, exactly what I need. Yes, each JSON object in the first array is one isolated document in the index. I'm going to try adapting your solution to my real use case and will let you know if I have any questions. Highly appreciated.

Regards
