Transform aggregation creates flattened record

dimuskin · July 14, 2022, 2:05pm

I use a transform for data preprocessing. In my case it computes, for every second, the number of requests (RPS) per host:

{
  "id": "appgw-rps-host",
  "source": {
    "index": [
      "logs-azure.platformlogs-default"
    ]
  },
  "dest": {
    "index": "appgw-rps-host"
  },
  "frequency": "10m",
  "sync": {
    "time": {
      "field": "@timestamp",
      "delay": "5m"
    }
  },
  "pivot": {
    "group_by": {
      "@timestamp": {
        "date_histogram": {
          "field": "@timestamp",
          "calendar_interval": "1s"
        }
      }
    },
    "aggregations": {
      "host": {
        "terms": {
          "field": "event.host"
        }
      }
    }
  }
}

Everything works and I get the data I need. An example document:

{
  "_index": "appgw-rps-host",
  "_id": "ADUZ-GZffUzMkUJXMRaP3wgAAAAAAAAA",
  "_version": 1,
  "_score": 1,
  "_source": {
    "@timestamp": "2022-07-12T09:57:41.000Z",
    "host": {
      "host1.com": 2,
      "host2.com": 4,
      "host3.com": 42
    }
  }
}

However, the "host" field is mapped as flattened:

{
  "mappings": {
    "_meta": {
      "created_by": "transform"
    },
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "host": {
        "type": "flattened"
      }
    }
  }
}

As a result, I can't work with the values. For example, I can't build a chart in Kibana or run any aggregation on them as I would with a numeric field. Is there a way to avoid the flattened type, or to convert the values with an ingest pipeline?

Best Regards,
Dmitri

dimuskin · July 15, 2022, 10:51am

It looks like this works with an index template:

"dynamic_templates": [
  {
    "analysed_string_template": {
      "path_match": "host.*",
      "mapping": {
        "type": "long"
      }
    }
  }
]
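
For context, a fragment like the one above lives inside the mappings of an index template. A minimal sketch of a complete request body for `PUT _index_template/appgw-rps-host` could look like the following. The template name, the `index_patterns` value, and the rule name `host_counts_as_long` are assumptions for illustration, not taken from the thread. The `match_mapping_type` line is also an addition: it is meant to limit the rule to integer values, so that intermediate objects created from dotted host names are not affected.

```json
{
  "index_patterns": ["appgw-rps-host"],
  "template": {
    "mappings": {
      "dynamic_templates": [
        {
          "host_counts_as_long": {
            "path_match": "host.*",
            "match_mapping_type": "long",
            "mapping": {
              "type": "long"
            }
          }
        }
      ],
      "properties": {
        "@timestamp": {
          "type": "date"
        }
      }
    }
  }
}
```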

Hendrik_Muhs · July 18, 2022, 6:42am

Great you found a solution yourself.

When a transform creates mappings for the destination index, it does so on a best-effort basis. Because a terms aggregation is unbounded, we chose flattened as the default mapping. This data type does not run into a so-called "mapping explosion", meaning it does not create too many field mappings.

The default mappings aren't used if you create the destination index yourself, either directly or through an index template. The transform creates the destination index when it is started for the first time, and skips this step if the index already exists. The mappings that the transform would generate are included in the output of the _preview API.
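
To inspect those generated mappings without creating anything, the source and pivot sections from the first post can be sent to `POST _transform/_preview`. This is a sketch that reuses the configuration from this thread unchanged. In the response, the proposed mappings should appear under `generated_dest_index`, next to the sample documents under `preview`.

```json
{
  "source": {
    "index": [
      "logs-azure.platformlogs-default"
    ]
  },
  "pivot": {
    "group_by": {
      "@timestamp": {
        "date_histogram": {
          "field": "@timestamp",
          "calendar_interval": "1s"
        }
      }
    },
    "aggregations": {
      "host": {
        "terms": {
          "field": "event.host"
        }
      }
    }
  }
}
```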

dimuskin · July 18, 2022, 8:48pm

I also noticed that you first need to create the index with at least one document (which can be deleted later) and only then start the transform. Otherwise, the transform ignores the mapping in the template.

Hendrik_Muhs · July 19, 2022, 6:09am

Yes, the index must exist; the template alone isn't sufficient. You should be able to create an empty index, though.
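
A sketch of creating that empty destination index before the transform is started, written as a request body for `PUT appgw-rps-host`. It assumes no index template is in place and therefore spells out the mappings; the rule name `host_counts_as_long` and the `match_mapping_type` restriction are illustrative assumptions. If a matching index template already exists, an empty request body should be enough, because templates are applied when the index is created.

```json
{
  "mappings": {
    "dynamic_templates": [
      {
        "host_counts_as_long": {
          "path_match": "host.*",
          "match_mapping_type": "long",
          "mapping": {
            "type": "long"
          }
        }
      }
    ],
    "properties": {
      "@timestamp": {
        "type": "date"
      }
    }
  }
}
```

Once the index exists, starting the transform skips index creation, so these mappings are kept instead of the flattened default.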
