Internal index process merging records into arrays of objects based on a common parent key

Hi,

I have a question about a feature. I'm trying to run an internal process in Elasticsearch that creates superset objects from an index containing flatter, more granular objects. For example:

The source index contains objects as:

[
  {
    "person_id": "2736548Z",
    "comment": "hi 1",
    "timestamp": "2023-02-24T12:15:00"
  },
  {
    "person_id": "2736548Z",
    "comment": "hi 2",
    "timestamp": "2023-02-24T12:20:00"
  },
  {
    "person_id": "1278882S",
    "comment": "hi a",
    "timestamp": "2023-02-24T12:25:00"
  }
]

I want to create the following superset objects in a destination index, using person_id as the aggregating entity key:

[
  {
    "person_id": "2736548Z",
    "comments": [
      {
        "comment": "hi 1",
        "timestamp": "2023-02-24T12:15:00"
      },
      {
        "comment": "hi 2",
        "timestamp": "2023-02-24T12:20:00"
      }
    ]
  },
  {
    "person_id": "1278882S",
    "comments": [
      {
        "comment": "hi a",
        "timestamp": "2023-02-24T12:25:00"
      }
    ]
  }
]

Is there any internal mechanism, such as an ingest pipeline, a processor, or a transform, that I can use to generate those superset objects based on a particular key acting as the parent of the object array?

Thanks in advance!

Could you provide a bit more detail about your use case? Generally speaking, the docs in their current form are typically ideal for Elasticsearch. Switching to a nested field type seems somewhat counterintuitive, as it would most likely hurt search performance.

Hi Ben, thanks for that.

Yes, the source format is needed for one search case, but the second format is for a display case with a different search pattern: I have no interest in searching on the nested objects, I just want to display them while searching through the flatter top-level keys. The models differ because the search cases have different intentions, but I don't want to do the merging before indexing, as that would hammer my Python scripts.

Mainly I want to know whether I can build those superset objects with an internal pipeline. Thanks!

Are the array elements in the 1st data example individual docs?

If so, you can combine those docs and create a so-called entity-centric view (person_id would be the entity) with Transforms. Transforms can run continuously on ingested data. In other words, you would have both indices and could use one or the other depending on the use case.
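For illustration, a rough (untested) sketch of such a continuous transform; the index names comments-src and comments-per-person are placeholders, person_id is assumed to be mapped as keyword, and the scripted_metric body is the one shown further below:

  PUT _transform/person-comments
  {
    "source": { "index": "comments-src" },
    "dest": { "index": "comments-per-person" },
    "sync": {
      "time": { "field": "timestamp", "delay": "60s" }
    },
    "pivot": {
      "group_by": {
        "person_id": { "terms": { "field": "person_id" } }
      },
      "aggregations": {
        "comments": { "scripted_metric": { ... } }
      }
    }
  }

After creating it, you start it with POST _transform/person-comments/_start.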

However, there is no built-in aggregation that collapses the comments for you; you need a scripted_metric to do this job, e.g.:

  "scripted_metric": {
    "init_script": "state.docs = []",
    "map_script": "state.docs.add(new HashMap(params['_source']))",
    "combine_script": "return state.docs",
    "reduce_script": "def docs = []; for (s in states) {for (d in s) { docs.add(d);}}return docs"
  }

This combines all sources from the input docs. The only missing part here is to drop the redundant person_id from the hash map.
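That could be done directly in the map_script, e.g. (an untested sketch, same structure as above):

  "scripted_metric": {
    "init_script": "state.docs = []",
    "map_script": "def doc = new HashMap(params['_source']); doc.remove('person_id'); state.docs.add(doc)",
    "combine_script": "return state.docs",
    "reduce_script": "def docs = []; for (s in states) { for (d in s) { docs.add(d); } } return docs"
  }

Each resulting entity document then contains person_id once at the top level (from the group_by) and the comments array without the repeated key.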


Hi Hendrik,

Amazing, exactly what I need. Yes, each JSON object in the first array is one isolated document in the index. I'm going to try adapting your solution to my real use case and will let you know if I have any questions. Highly appreciated.

Regards
