Merge search results into single document?

Hi,

I have data the looks like this:

{
"name":"John"
"limit":"100"
}

{
"name":"John"
"Spent":150"
}

{
"name":"Ray"
"limit":"200"
}

{
"name":"Ray"
"Spent":250"
}

But those are four separate documents. Is it possible to merge documents where the name is the same into a single document to create something like this?

{
"name":"John"
"limit":"100"
"Spent":150"
}

I want to pipe the output to Logstash and perform an action based on a comparison between limit and spent so the data will have to be in the same document.

There is a brand new feature in Elasticsearch (introduced in version 7.2) called "Data frame transforms" that will allow you to do this. With data frame transforms you can write the results of an aggregation to another index. In your case you could aggregate the values of limit and Spent, and group the results by name.

The docs are quite extensive, with some great examples, but let me walk you through an example.

Let's say you had indexed your data like this:

PUT my_index
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "limit": {
        "type": "long"
      },
      "Spent": {
        "type": "long"
      }
    }
  }
}

PUT my_index/_doc/1
{
  "name": "John",
  "limit": 100
}

PUT my_index/_doc/2
{
  "name": "John",
  "Spent": 150
}

PUT my_index/_doc/3
{
  "name": "Ray",
  "limit": 200
}

PUT my_index/_doc/4
{
  "name": "Ray",
  "Spent": 250
}

You can define a transform that groups your original data by name and writes the results to a new index my_index2. The values of limit and Spent will be the sum of their original values:

PUT _data_frame/transforms/my_transform
{
  "source": {
    "index": "my_index"
  },
  "pivot": {
    "group_by": {
      "name": {
        "terms": {
          "field": "name"
        }
      }
    },
    "aggregations": {
      "limit": {
        "sum": {
          "field": "limit"
        }
      },
      "Spent": {
        "sum": {
          "field": "Spent"
        }
      }
    }
  },
  "dest": {
    "index": "my_index2"
  }
}

Next, you can start the transform :

POST _data_frame/transforms/my_transform/_start

You can now check whether the data in my_index2 is in your desired format:

GET my_index2/_search

Resulting in:

    "hits" : [
      {
        "_index" : "my_index2",
        "_type" : "_doc",
        "_id" : "SpghpfNeEBZG2jy3kqf61UMAAAAAAAAA",
        "_score" : 1.0,
        "_source" : {
          "name" : "John",
          "limit" : 100.0,
          "Spent" : 150.0
        }
      },
      {
        "_index" : "my_index2",
        "_type" : "_doc",
        "_id" : "UtFxVzAGuj71aOPpxpuKJWsAAAAAAAAA",
        "_score" : 1.0,
        "_source" : {
          "name" : "Ray",
          "limit" : 200.0,
          "Spent" : 250.0
        }
      }
    ]
2 Likes

@abdon thank you for the very detailed explanation. Looks like that might work, I will try it out.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.