How can I save these aggregated results under a new index?

Jingyi_Wang · April 12, 2024, 2:26pm

I ran this aggregation in the Elastic Dev Tools console:

GET copper_scan_username_index,fibre_scan_username_index/_search
{
  "size": 0,
  "aggs": {
    "unique_username": {
      "terms": {
        "field": "username",
        "min_doc_count": 2
      },
      "aggs": {
        "top_events": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}

Context: the query finds usernames that overlap between the two indexes by checking whether a document with a given username appears in both, and then returns those documents.

I got these results:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "unique_username" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "username1",
          "doc_count" : 2,
          "top_events" : {
            "hits" : {
              "total" : {
                "value" : 2,
                "relation" : "eq"
              },
              "max_score" : 1.0,
              "hits" : [
                {
                  "_index" : "copper_scan_username_index",
                  "_type" : "_doc",
                  "_id" : "dTXSY0KNxU08jKDnZcc-nbkAAAAAAAAA",
                  "_score" : 1.0,
                  "_source" : {
                    "count" : 3,
                    "copper_scan_count" : 3,
                    "username" : "username1"
                  }
                },
                {
                  "_index" : "fibre_scan_username_index",
                  "_type" : "_doc",
                  "_id" : "dTXSY0KNxU08jKDnZcc-nbkAAAAAAAAA",
                  "_score" : 1.0,
                  "_source" : {
                    "fibre_scan_count" : 1,
                    "count" : 1,
                    "username" : "username1"
                  }
                }
              ]
            }
          }
        }
      ]
    }
  }
}

This is what I want to save under a new index:

[
  {
    "_index" : "copper_scan_username_index",
    "_type" : "_doc",
    "_id" : "dTXSY0KNxU08jKDnZcc-nbkAAAAAAAAA",
    "_score" : 1.0,
    "_source" : {
      "count" : 3,
      "copper_scan_count" : 3,
      "username" : "username1"
    }
  },
  {
    "_index" : "fibre_scan_username_index",
    "_type" : "_doc",
    "_id" : "dTXSY0KNxU08jKDnZcc-nbkAAAAAAAAA",
    "_score" : 1.0,
    "_source" : {
      "fibre_scan_count" : 1,
      "count" : 1,
      "username" : "username1"
    }
  }
]

I tried reindex, but what was saved in the new index is not what I expected.
This is the reindex request:

POST /_reindex
{
  "source": {
    "index": ["copper_scan_username_index","fibre_scan_username_index"],
    "aggs": {
      "username_count": {
        "terms": {
          "field": "username",
          "min_doc_count": 2
        },
        "aggs": {
          "docs": {
            "top_hits": {
              "size": 10
            }
          },
          "bucket_filter": {
            "bucket_selector": {
              "buckets_path": {
                "count": "_count"
              },
              "script": "params.count >= 2"
            }
          }
        }
      }
    }
  },
  "dest": {
    "index": "new_index"
  }
}

Can anyone help? Thanks a lot!

dadoonet · April 12, 2024, 2:53pm

Reindex does not do that. It reads the _source field of each document and sends it to the new index. It is not meant for what you want to do.

Have a look at the Create transform API page in the Elasticsearch Guide [8.13].

Jingyi_Wang · April 12, 2024, 4:40pm

Hi David, thanks for pointing me to Transform. I looked into how to use it,
but I am struggling with how to incorporate the aggregation into the source.query section of a transform.
In a nutshell, the search result needs to be filtered based on the aggregation result.
This is the aggregation query:

"aggs": {
  "unique_username": {
    "terms": {
      "field": "username",
      "min_doc_count": 2
    },
    "aggs": {
      "top_events": {
        "top_hits": {
          "size": 10
        }
      }
    }
  }
}

The first-stage aggregation returns the usernames that overlap between the two indexes.

"unique_username": {
  "terms": {
    "field": "username",
    "min_doc_count": 2
  }
}

The sub-aggregation just fetches the documents that have one of those overlapping usernames.

For more context, these are the documents in the two indexes:

{
  "_index" : "copper_scan_username_index",
  "_type" : "_doc",
  "_id" : "dTXSY0KNxU08jKDnZcc-nbkAAAAAAAAA",
  "_score" : 1.0,
  "_source" : {
    "count" : 3,
    "copper_scan_count" : 3,
    "username" : "username1"
  }
},
{
  "_index" : "copper_scan_username_index",
  "_type" : "_doc",
  "_id" : "dUz7p-Z5C0t74EbhcYuVU-4AAAAAAAAA",
  "_score" : 1.0,
  "_source" : {
    "count" : 2,
    "copper_scan_count" : 2,
    "username" : "username3"
  }
},
{
  "_index" : "fibre_scan_username_index",
  "_type" : "_doc",
  "_id" : "dTXSY0KNxU08jKDnZcc-nbkAAAAAAAAA",
  "_score" : 1.0,
  "_source" : {
    "fibre_scan_count" : 1,
    "count" : 1,
    "username" : "username1"
  }
},
{
  "_index" : "fibre_scan_username_index",
  "_type" : "_doc",
  "_id" : "dRYvt34fpDoAqHahXwa4t70AAAAAAAAA",
  "_score" : 1.0,
  "_source" : {
    "fibre_scan_count" : 2,
    "count" : 2,
    "username" : "username2"
  }
}
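
For reference, the overlap described above can be checked against the four sample documents with a minimal Python sketch. The helper name overlapping_usernames and the in-memory list are illustrative assumptions, not part of Elasticsearch. The sketch counts distinct indexes per username rather than documents; a terms aggregation with min_doc_count 2 counts documents, so it would also match a username that appears twice within a single index.

```python
from collections import defaultdict

# The four sample documents from the thread, reduced to (index, username).
SAMPLE_DOCS = [
    ("copper_scan_username_index", "username1"),
    ("copper_scan_username_index", "username3"),
    ("fibre_scan_username_index", "username1"),
    ("fibre_scan_username_index", "username2"),
]


def overlapping_usernames(docs, required_indexes=2):
    """Return usernames seen in at least `required_indexes` distinct indexes."""
    indexes_by_user = defaultdict(set)
    for index_name, username in docs:
        indexes_by_user[username].add(index_name)
    return sorted(
        user
        for user, indexes in indexes_by_user.items()
        if len(indexes) >= required_indexes
    )


print(overlapping_usernames(SAMPLE_DOCS))  # ['username1']
```

With the sample data both approaches agree, because each index holds at most one document per username.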

dadoonet · April 12, 2024, 5:07pm

Yeah. I'm not sure you can solve this with the transform API. I have never used it myself.
If you can't, then you might have to do this "manually".
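
One way the "manual" route could look, as a rough Python sketch: run the search yourself, pull the top_hits documents out of each bucket, and turn them into index actions for the new index. The function name bulk_actions_from_top_hits and the id scheme are assumptions made for illustration. The action dictionaries use the _index / _id / _source shape that the official Python client's helpers.bulk accepts, but nothing here talks to a cluster.

```python
def bulk_actions_from_top_hits(response, agg_name, hits_name, dest_index):
    """Turn the top_hits documents of every bucket into bulk index actions."""
    actions = []
    buckets = response["aggregations"][agg_name]["buckets"]
    for bucket in buckets:
        for hit in bucket[hits_name]["hits"]["hits"]:
            actions.append({
                "_index": dest_index,
                # In the thread both matching documents share one _id across
                # the two indexes, so prefix it with the source index to
                # avoid one overwriting the other in the destination.
                "_id": f"{hit['_index']}:{hit['_id']}",
                "_source": hit["_source"],
            })
    return actions


# Trimmed copy of the search response shown earlier in the thread.
response = {
    "aggregations": {
        "unique_username": {
            "buckets": [{
                "key": "username1",
                "doc_count": 2,
                "top_events": {"hits": {"hits": [
                    {
                        "_index": "copper_scan_username_index",
                        "_id": "dTXSY0KNxU08jKDnZcc-nbkAAAAAAAAA",
                        "_source": {"count": 3, "copper_scan_count": 3,
                                    "username": "username1"},
                    },
                    {
                        "_index": "fibre_scan_username_index",
                        "_id": "dTXSY0KNxU08jKDnZcc-nbkAAAAAAAAA",
                        "_source": {"fibre_scan_count": 1, "count": 1,
                                    "username": "username1"},
                    },
                ]}},
            }]
        }
    }
}

actions = bulk_actions_from_top_hits(
    response, "unique_username", "top_events", "new_index"
)
for action in actions:
    print(action["_id"], action["_source"])
```

The resulting list could then be passed to a bulk helper. Keep in mind that a terms aggregation only returns the top buckets by default, so a large set of usernames would need paging (for example with a composite aggregation).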

Jingyi_Wang · April 12, 2024, 6:47pm

Let me simplify the requirement further.
Now I only need to save this aggregated result under a new index:

GET copper_scan_username_index,fibre_scan_username_index/_search
{
  "size": 0,
  "aggs": {
    "duplicate_username": {
      "terms": {
        "field": "username",
        "min_doc_count": 2
      }
    }
  }
}

The aggregations.duplicate_username.buckets data in the response is what I need to index:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "duplicate_username" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "username1",
          "doc_count" : 2
        }
      ]
    }
  }
}

I would truly appreciate any advice!

dadoonet · April 13, 2024, 6:57am

I'm not sure whether that would work, but have a look at the Transform examples page in the Elasticsearch Guide [8.13]. It looks similar to me.

Jingyi_Wang · April 15, 2024, 1:36pm

Yes, it worked, but I had to modify the aggregation logic. It gave me what I need. Thanks!
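
The thread does not show the transform that was finally used. As a hedged sketch of one possible shape, the Python helper below builds a pivot transform body that groups by username, counts documents per username, and keeps only groups with at least two documents via a bucket_selector. The function name, the aggregation names username_count and keep_duplicates, and the destination index are assumptions for illustration; checking the body with the transform preview API before creating it would be prudent.

```python
import json


def duplicate_username_transform(source_indexes, dest_index,
                                 field="username", min_docs=2):
    """Build a request body intended for PUT _transform/<transform_id>."""
    return {
        "source": {"index": list(source_indexes)},
        "pivot": {
            # One output document per distinct username.
            "group_by": {field: {"terms": {"field": field}}},
            "aggregations": {
                # Number of source documents carrying this username.
                "username_count": {"value_count": {"field": field}},
                # Drop groups with fewer than min_docs documents.
                "keep_duplicates": {
                    "bucket_selector": {
                        "buckets_path": {"count": "username_count"},
                        "script": f"params.count >= {min_docs}",
                    }
                },
            },
        },
        "dest": {"index": dest_index},
    }


body = duplicate_username_transform(
    ["copper_scan_username_index", "fibre_scan_username_index"],
    "new_index",
)
print(json.dumps(body, indent=2))
```

Like min_doc_count in the original query, this counts documents rather than distinct indexes, so it assumes each index holds at most one document per username.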
