Merge search results into single document?

Sjaak01 · August 22, 2019, 5:54am

Hi,

I have data the looks like this:

{
"name":"John"
"limit":"100"
}

{
"name":"John"
"Spent":150"
}

{
"name":"Ray"
"limit":"200"
}

{
"name":"Ray"
"Spent":250"
}

But those are four separate documents. Is it possible to merge documents where the name is the same into a single document to create something like this?

{
"name":"John"
"limit":"100"
"Spent":150"
}

I want to pipe the output to Logstash and perform an action based on a comparison between limit and spent so the data will have to be in the same document.

abdon · August 22, 2019, 2:46pm

There is a brand new feature in Elasticsearch (introduced in version 7.2) called "Data frame transforms" that will allow you to do this. With data frame transforms you can write the results of an aggregation to another index. In your case you could aggregate the values of limit and Spent, and group the results by name.

The docs are quite extensive, with some great examples, but let me walk you through an example.

Let's say you had indexed your data like this:

PUT my_index
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "limit": {
        "type": "long"
      },
      "Spent": {
        "type": "long"
      }
    }
  }
}

PUT my_index/_doc/1
{
  "name": "John",
  "limit": 100
}

PUT my_index/_doc/2
{
  "name": "John",
  "Spent": 150
}

PUT my_index/_doc/3
{
  "name": "Ray",
  "limit": 200
}

PUT my_index/_doc/4
{
  "name": "Ray",
  "Spent": 250
}

You can define a transform that groups your original data by name and writes the results to a new index my_index2. The values of limit and Spent will be the sum of their original values:

PUT _data_frame/transforms/my_transform
{
  "source": {
    "index": "my_index"
  },
  "pivot": {
    "group_by": {
      "name": {
        "terms": {
          "field": "name"
        }
      }
    },
    "aggregations": {
      "limit": {
        "sum": {
          "field": "limit"
        }
      },
      "Spent": {
        "sum": {
          "field": "Spent"
        }
      }
    }
  },
  "dest": {
    "index": "my_index2"
  }
}

Next, you can start the transform :

POST _data_frame/transforms/my_transform/_start

You can now check whether the data in my_index2 is in your desired format:

GET my_index2/_search

Resulting in:

    "hits" : [
      {
        "_index" : "my_index2",
        "_type" : "_doc",
        "_id" : "SpghpfNeEBZG2jy3kqf61UMAAAAAAAAA",
        "_score" : 1.0,
        "_source" : {
          "name" : "John",
          "limit" : 100.0,
          "Spent" : 150.0
        }
      },
      {
        "_index" : "my_index2",
        "_type" : "_doc",
        "_id" : "UtFxVzAGuj71aOPpxpuKJWsAAAAAAAAA",
        "_score" : 1.0,
        "_source" : {
          "name" : "Ray",
          "limit" : 200.0,
          "Spent" : 250.0
        }
      }
    ]

Sjaak01 · August 23, 2019, 5:29am

@abdon thank you for the very detailed explanation. Looks like that might work, I will try it out.

system · September 20, 2019, 5:36am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Merging multiple logstash output document into one by a common field Logstash	1	435	January 20, 2019
Merging search results by key Elasticsearch	5	441	April 12, 2018
How to join 2 index using 2 common values Elasticsearch	4	941	June 23, 2020
Is it possible to combine two docs in different indices? Elasticsearch transforms	2	942	July 13, 2021
Merging two documents from existing ES indices into a new document Logstash	4	495	December 6, 2019

Merge search results into single document?

Related topics