Hi, I am trying to use Transforms to create an entity-centric index of my command logs. I need help including data that only appears on some documents. Each "command" is represented by multiple events, correlated by a command_unique_id
. Each event has a different event_type
and has different details about the command.
Here's a pseudo-representation of documents in my source index:
command_unique_id-1, timestamp, event_type-0 (start), username
command_unique_id-1, timestamp, event_type-1, networkstats, superfluousField
command_unique_id-1, timestamp, event_type-2 (end), authAuditInfo
I need the destination index to contain the following columns:
command_unique_id
-
time_duration_ms
(timestamp difference between the start event (type 0) and the end event (type 2) - Cherry-picked fields that only exist per-event type.
Here's a pseudo-representation of what I want the destination to look like:
command_unique_id-1, time_duration_ms, username, networkstats, authAuditInfo
So far, I have the following pivot transform defined and it calculates the time_duration_ms
. What I need help with is including the data that only exists for certain event_types
. For example, how do I include username
which only exists for event_type = 0
? I also welcome any general feedback about my approach, as I am a novice. Thank you!
POST /_transform/_preview
{
"source": {
"index": "mydata-raw*"
},
"pivot": {
"group_by": {
"command_unique_id": {
"terms": {
"field": "command_unique_id"
}
}
},
"aggregations": {
"start-command": {
"filter": {
"term": {
"event_type": "0"
}
},
"aggs": {
"time": {
"min": {
"field": "@timestamp"
}
}
}
},
"end-command": {
"filter": {
"term": {
"event_type": "2"
}
},
"aggs": {
"time": {
"max": {
"field": "@timestamp"
}
}
}
},
"time_duration_ms": {
"bucket_script": {
"buckets_path": {
"start": "start-command>time.value",
"end": "end-command>time.value"
},
"script": "params.end - params.start"
}
}
}
}
}