Adding a synthetic field to documents generated by Transforms

Hello,

I'm experimenting with Elasticsearch transforms (Create transform API, Elasticsearch Guide 7.15) and have a question about them.

I've created a transform like this:

PUT _transform/transform_test_2s
{
  "source": {
    "index": "server"
  },
  "pivot": {
    "group_by": {
      "grain": {
        "date_histogram": {
          "field": "created_at",
          "fixed_interval": "2s"
        }
      }
    },
    "aggregations": {
      "avg": {
        "avg": {
          "field": "wt"
        }
      }
    }
  },
  "dest": {
    "index": "transform_test"
  },
  "frequency": "30s"
}

When I run it:

POST _transform/transform_test_2s/_start

I get this kind of output in the destination index:

      {
        "_source" : {
          "avg" : 1095432.0,
          "grain" : "2021-10-02T08:44:02.000Z"
        }
      },
      {
        "_source" : {
          "avg" : 709561.0,
          "grain" : "2021-10-02T08:44:22.000Z"
        }
      }

I would like to add a second transform, transform_test_5s, that does the same thing with "fixed_interval": "5s" instead of "2s".
The issue is that I would not be able to distinguish records generated by transform_test_2s from those generated by transform_test_5s.

What I would like to have is:

      {
        "_source" : {
          "avg" : 1095432.0,
          "grain" : "2021-10-02T08:44:02.000Z",
          "used_period": "2s"
        }
      },
      {
        "_source" : {
          "avg" : 709561.0,
          "grain" : "2021-10-02T08:44:22.000Z",
          "used_period": "2s"
        }
      }

One way to achieve this is to name the group_by field after the interval, e.g. "grain_2s":

    "group_by": {
      "grain_2s": {
        "date_histogram": {
          "field": "created_at",
          "fixed_interval": "2s"
        }
      }
    },

Running both transforms results in:

      {
        "_source" : {
          "grain_5s" : "2021-10-05T07:25:05.000Z",
          "avg" : 274935.0
        }
      },
      {
        "_source" : {
          "avg" : 621279.0,
          "grain_2s" : "2021-10-02T08:44:22.000Z"
        }
      }

The drawback is that it creates n+1 fields in the index, where n is the number of transforms: one "grain_*" field per transform, plus "avg".

This leads me to my question: is it possible to have multiple transforms that produce something like the following? I would need a synthetic field defined in the transform itself, like the "grain" field below.

      {
        "_source" : {
          "date" : "2021-10-05T07:25:05.000Z",
          "grain": "5s",
          "avg" : 274935.0
        }
      },
      {
        "_source" : {
          "date" : "2021-10-02T08:44:22.000Z",
          "grain": "2s",
          "avg" : 621279.0
        }
      }
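
One possible direction, sketched under assumptions rather than confirmed as the recommended approach: the "dest" section of a transform accepts a "pipeline" setting, so an ingest pipeline with a "set" processor could stamp a constant value on every document the transform writes. The pipeline name add_grain_2s is hypothetical, and the group_by field is renamed to "date" to match the desired output above. The 5s transform would use its own pipeline that sets "grain" to "5s".

```
# Hypothetical pipeline: adds a constant "grain" field to each document
PUT _ingest/pipeline/add_grain_2s
{
  "description": "Tag documents written by transform_test_2s",
  "processors": [
    {
      "set": {
        "field": "grain",
        "value": "2s"
      }
    }
  ]
}

# Same transform as before, with the bucket key named "date"
# and the destination routed through the pipeline above
PUT _transform/transform_test_2s
{
  "source": {
    "index": "server"
  },
  "pivot": {
    "group_by": {
      "date": {
        "date_histogram": {
          "field": "created_at",
          "fixed_interval": "2s"
        }
      }
    },
    "aggregations": {
      "avg": {
        "avg": {
          "field": "wt"
        }
      }
    }
  },
  "dest": {
    "index": "transform_test",
    "pipeline": "add_grain_2s"
  },
  "frequency": "30s"
}
```

One point to check before relying on this: destination document IDs are derived from the group_by values, so if a 2s bucket and a 5s bucket share the same "date" (for example, on a 10-second boundary), the two transforms might write to the same ID and overwrite each other. I have not verified this behaviour. Separate destination indices, or a group_by term that differs per transform, would sidestep the question.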

Thanks,
Romain
