[Resolved] Logstash - which filter can transform the customer behaviors in an HTTP session (1 behavior is 1 log event) into different fields of 1 event?

Hi Team,

I have a new data transformation requirement: I need to build a Sankey diagram that visualizes the flow of user behaviors (actions).

Source CSV data:

2019-02-13 10:01:29,sid0,succession,9box,cgrant1,scm.mr.generate_howvswhat_report
2019-02-13 10:01:30,sid1,succession,talentsearch,cgrant1,scm.ts.list_saved_search
2019-02-13 10:01:30,sid0,succession,9box,cgrant1,scm.mr.export_howvswhat_report
2019-02-13 10:01:31,sid3,calibration,ManageCalibrationTemplates,hr1,cal.mct.create
2019-02-13 10:01:31,sid1,succession,talentsearch,cgrant1,scm.ts.start_over
2019-02-13 10:01:33,sid0,succession,9box,cgrant1,scm.mr.reset_filter
2019-02-13 10:01:33,sid1,succession,talentsearch,cgrant1,scm.ts.delete_saved_search
2019-02-13 10:01:33,sid3,calibration,ManageCalibrationTemplates,hr1,cal.mct.edit
2019-02-13 10:01:30,sid2,succession,talentsearch,lokamoto1,scm.ts.list_saved_search
2019-02-13 10:01:33,sid2,succession,talentsearch,lokamoto1,scm.ts.search
2019-02-13 10:01:37,sid0,succession,9box,cgrant1,scm.mr.export_howvswhat_report
2019-02-13 10:01:35,sid2,succession,talentsearch,lokamoto1,scm.ts.nominate

I want to transform the data above into the format below, appending the actionType of each subsequent event as a new column, grouped per httpsessionId and userId.


Maybe the aggregate filter could do this as well? But I couldn't figure out:

1) how the aggregate filter can create new fields actionType1, actionType2, actionType3 and so on, and fill them with the correct data;
2) how the aggregate filter can fill in NULL where an actionType is missing;
3) how to handle the fact that I can't know in advance how many actionType fields to create in the new event.

Do you want at most 4 action types? If not, how many NULLs do you want to append to events that have fewer than some others?

The number of action types depends on the real production log. If the maximum number of action types in any sessionId is 5, I need to create 5 actionType fields in the output file, and any sessionId with fewer than 5 action types should have NULL filled in for the missing ones. How many NULLs to fill in for each new event therefore depends on the production log as well.
This case looks a bit complex, and I am confused.
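To make the requirement concrete, here is a minimal sketch in plain Ruby, outside Logstash, of the grouping and padding described above. The names `LINES`, `group_actions`, `pad_actions` and `PADDED` are made up for illustration, and the column order is assumed from the sample data. It also shows why the width is hard to get in a streaming filter: the longest session is only known after every line has been read.

```ruby
# A few sample lines from the post, assumed column order:
# timestamp, session id, module, page, user id, action type.
LINES = [
  "2019-02-13 10:01:31,sid3,calibration,ManageCalibrationTemplates,hr1,cal.mct.create",
  "2019-02-13 10:01:33,sid3,calibration,ManageCalibrationTemplates,hr1,cal.mct.edit",
  "2019-02-13 10:01:30,sid2,succession,talentsearch,lokamoto1,scm.ts.list_saved_search",
  "2019-02-13 10:01:33,sid2,succession,talentsearch,lokamoto1,scm.ts.search",
  "2019-02-13 10:01:35,sid2,succession,talentsearch,lokamoto1,scm.ts.nominate"
]

# Collect action types per [session id, user id] pair, in arrival order.
def group_actions(lines)
  lines.each_with_object(Hash.new { |h, k| h[k] = [] }) do |line, sessions|
    _ts, sid, _mod, _page, user, action = line.split(",")
    sessions[[sid, user]] << action
  end
end

# Pad every action list with "NULL" up to the length of the longest one.
def pad_actions(sessions)
  width = sessions.values.map(&:length).max || 0
  sessions.transform_values { |a| a + ["NULL"] * (width - a.length) }
end

PADDED = pad_actions(group_actions(LINES))
PADDED.each { |key, actions| puts "#{key.inspect} => #{actions.inspect}" }
```

Since a Logstash pipeline sees one event at a time, a fixed upper bound on the number of actions is the simpler option there.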


    csv { source => "message" skip_header => true autodetect_column_names => true }
    aggregate {
        # one map per session/user pair
        task_id => "%{httpsessionId} %{userId}"
        code => '
            # start with five NULL placeholders; actions overwrite them in arrival order
            map["actions"] ||= [ "NULL", "NULL", "NULL", "NULL", "NULL" ]
            map["numActions"] ||= 0
            map["httpsessionId"] ||= event.get("httpsessionId")
            map["module"] ||= event.get("module")
            map["userId"] ||= event.get("userId")

            map["actions"][map["numActions"]] = event.get("actionType")
            map["numActions"] += 1
        '
        # emit the map as a new event when the 10 second timeout expires
        push_map_as_event_on_timeout => true
        timeout => 10
    }


will produce events like

{
           "userId" => "lokamoto1",
       "numActions" => 3,
          "actions" => [
        [0] "scm.ts.list_saved_search",
        [1] "scm.ts.search",
        [2] "scm.ts.nominate",
        [3] "NULL",
        [4] "NULL"
    ],
         "@version" => "1",
           "module" => "succession",
       "@timestamp" => 2019-05-09T22:47:22.933Z,
    "httpsessionId" => "sid2"
}
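The aggregated event carries the actions as a single array, whereas the original question asked for separate actionType1, actionType2, ... fields. A minimal sketch of that last step in plain Ruby follows, with the event modeled as an ordinary hash. `flatten_actions`, `EVENT` and `FLAT` are made-up names. Inside Logstash the same loop would presumably live in a ruby filter and use `event.get` and `event.set` instead of hash access.

```ruby
# Hypothetical helper: copy each element of the "actions" array into a
# numbered field (actionType1, actionType2, ...) and drop the array itself.
def flatten_actions(event)
  flat = event.reject { |key, _value| key == "actions" }
  event.fetch("actions", []).each_with_index do |action, i|
    flat["actionType#{i + 1}"] = action
  end
  flat
end

# The aggregated event from the thread, modeled as a plain hash.
EVENT = {
  "httpsessionId" => "sid2",
  "userId"        => "lokamoto1",
  "module"        => "succession",
  "numActions"    => 3,
  "actions"       => ["scm.ts.list_saved_search", "scm.ts.search",
                      "scm.ts.nominate", "NULL", "NULL"]
}

FLAT = flatten_actions(EVENT)
FLAT.each { |key, value| puts "#{key} => #{value}" }
```

Because the array is already padded to a fixed length, every flattened event ends up with the same set of actionType columns, which is convenient for CSV output.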

It does work. Thank you for the support on my project.
