Using Transforms and including values which only exist on some events in the source index

Hi, I am trying to use Transforms to create an entity-centric index of my command logs. I need help including data that only appears on some documents. Each "command" is represented by multiple events, correlated by a command_unique_id. Each event has a different event_type and has different details about the command.

Here's a pseudo-representation of documents in my source index:

command_unique_id-1, timestamp, event_type-0 (start), username
command_unique_id-1, timestamp, event_type-1, networkstats, superfluousField
command_unique_id-1, timestamp, event_type-2 (end), authAuditInfo

I need the destination index to contain the following columns:

  • command_unique_id
  • time_duration_ms (timestamp difference between the start event (type 0) and the end event (type 2)
  • Cherry-picked fields that only exist per-event type.

Here's a pseudo-representation of what I want the destination to look like:

command_unique_id-1, time_duration_ms, username, networkstats, authAuditInfo

So far, I have the following pivot transform defined and it calculates the time_duration_ms. What I need help with is including the data that only exists for certain event_types. For example, how do I include username which only exists for event_type = 0? I also welcome any general feedback about my approach, as I am a novice. Thank you!

POST /_transform/_preview
{
  "source": {
    "index": "mydata-raw*"
  },
  "pivot": {
    "group_by": {
      "command_unique_id": {
        "terms": {
          "field": "command_unique_id"
        }
      }
    },
    "aggregations": {
      "start-command": {
        "filter": {
          "term": {
            "event_type": "0"
          }
        },
        "aggs": {
          "time": {
            "min": {
              "field": "@timestamp"
            }
          }
        }
      },
      "end-command": {
        "filter": {
          "term": {
            "event_type": "2"
          }
        },
        "aggs": {
          "time": {
            "max": {
              "field": "@timestamp"
            }
          }
        }
      },
      "time_duration_ms": {
        "bucket_script": {
          "buckets_path": {
            "start": "start-command>time.value",
            "end": "end-command>time.value"
          },
          "script": "params.end - params.start"
        }
      }
    }
  }
}

If you use a version prior 7.14 I suggest to have a look at scripted_metric and painless, see the painless based top hits example in the docs.

It becomes easier with 7.14 and above with top_metrics, see this example. Despite the name top metric, the output field must not be a metric, but can be a keyword field. Only the sort field must be of type metric.