[Resolved] Logstash - data is not aggregated in chronological order per HTTP session, and the output file looks like JSON

Hi Team,

The Logstash configuration runs as below:
input {
  file {
    path => "C:/elkstack/elasticsearch-6.5.1/logs/userscenario.csv"
    start_position => "beginning"
    sincedb_path => "C:/elkstack/elasticsearch-6.5.1/sincedb/sincedb.txt"
  }
}

filter {
  csv {
    columns => ["when", "httpsessionId", "module", "page", "userId", "actionType"]
    separator => ","
    skip_header => "true"
  }

  aggregate {
    task_id => "%{httpsessionId}"
    code => '
      map["userscenario"] ||= ""
      map["userscenario"] += event.get("actionType") + "->"
      map["userId"] = event.get("userId")
      event.cancel
    '
    push_map_as_event_on_timeout => true
    timeout_task_id_field => "httpsessionId"
    timeout => 3600
    timeout_code => 'event.set("userscenario", event.get("userscenario").chomp("->"))'
  }
}

output {
  file { path => "C:/elkstack/elasticsearch-6.5.1/logs/agguserscenario.csv" }
  stdout { codec => rubydebug }
}

Content of the source file "userscenario.csv"

when,httpsessionId,module,page,userId,actionType
2019-02-13 10:01:30,sid1,succession,talentsearch,cgrant1,scm.ts.list_saved_search
2019-02-13 10:01:31,sid3,calibration,ManageCalibrationTemplates,hr1,cal.mct.create
2019-02-13 10:01:31,sid1,succession,talentsearch,cgrant1,scm.ts.start_over
2019-02-13 10:01:33,sid1,succession,talentsearch,cgrant1,scm.ts.delete_saved_search
2019-02-13 10:01:33,sid3,calibration,ManageCalibrationTemplates,hr1,cal.mct.edit
2019-02-13 10:01:30,sid2,succession,talentsearch,lokamoto1,scm.ts.list_saved_search
2019-02-13 10:01:33,sid2,succession,talentsearch,lokamoto1,scm.ts.search
2019-02-13 10:01:35,sid2,succession,talentsearch,lokamoto1,scm.ts.nominate
2019-02-13 10:01:35,sid3,calibration,ManageCalibrationTemplates,hr1,cal.mct.delete

**After Logstash finished processing, I opened the CSV file created by the file output, and the content is not as expected.**
1. All the content is in one column; it looks like three JSON objects.
2. The final userscenario string is not concatenated in ascending time order (by the "when" field).

    {"httpsessionId":"sid1","userId":"cgrant1","@version":"1","userscenario":"scm.ts.start_over->scm.ts.delete_saved_search->scm.ts.list_saved_search","@timestamp":"2019-04-29T07:08:23.971Z"}
    {"httpsessionId":"sid3","userId":"hr1","@version":"1","userscenario":"cal.mct.create->cal.mct.edit->cal.mct.delete","@timestamp":"2019-04-29T07:08:23.990Z"}
    {"httpsessionId":"sid2","userId":"lokamoto1","@version":"1","userscenario":"scm.ts.nominate->scm.ts.list_saved_search->scm.ts.search","@timestamp":"2019-04-29T07:08:23.991Z"}

Expected output file

httpsessionId,userId,userscenario
sid1,cgrant1,scm.ts.list_saved_search->scm.ts.start_over->scm.ts.delete_saved_search
sid2,lokamoto1,scm.ts.list_saved_search->scm.ts.search->scm.ts.nominate
sid3,hr1,cal.mct.create->cal.mct.edit->cal.mct.delete

How can I correct this?

You are using a file output, which defaults to a json_lines codec. Try using a csv output.
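For example, a csv output could look like this (a sketch; the path and field order are assumptions based on the expected file above):

```
output {
  csv {
    path => "C:/elkstack/elasticsearch-6.5.1/logs/agguserscenario.csv"
    fields => ["httpsessionId", "userId", "userscenario"]
  }
}
```

The `fields` option controls both which fields are written and their column order.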

Ok. Thx. Trying...
Another question: how can I make the data aggregate in chronological order?

The userscenario for cgrant1 is aggregated to "userscenario":"scm.ts.start_over->scm.ts.delete_saved_search->scm.ts.list_saved_search"

But the expected aggregated sequence should be as below, according to the "when" field, which records when the actionType happened in the web UI:
"userscenario":"scm.ts.list_saved_search->scm.ts.start_over->scm.ts.delete_saved_search"

Do you have pipeline.java_execution enabled? Do you have "--pipeline.workers 1" set?

[2019-04-29T15:46:34,297][DEBUG][logstash.runner ] pipeline.java_execution: false
[2019-04-29T15:46:34,296][DEBUG][logstash.runner ] pipeline.workers: 4

Do you have pipeline.java_execution enabled?

  • No

Do you have "--pipeline.workers 1" set?

  • No

What do these two mean?

The execution engine that logstash runs on is being rewritten into Java. At one point I was seeing ordering problems with that.

Your issue is much more likely to be that you have multiple worker threads. That produces race conditions because the threads may process events out of order. Set "--pipeline.workers 1". That means logstash will only use 1 CPU, but it is a requirement of the aggregate filter.
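An alternative that makes the ordering independent of thread scheduling: accumulate (when, actionType) pairs in the map and sort them in the timeout_code, rather than concatenating strings as events arrive. A minimal standalone Ruby sketch of that idea (simulating the aggregate map outside Logstash; the sample events are taken from session sid1 above):

```ruby
# Simulate the aggregate filter's map for one session (sid1).
# Events may arrive out of order when several worker threads are running.
events = [
  { "when" => "2019-02-13 10:01:31", "actionType" => "scm.ts.start_over" },
  { "when" => "2019-02-13 10:01:33", "actionType" => "scm.ts.delete_saved_search" },
  { "when" => "2019-02-13 10:01:30", "actionType" => "scm.ts.list_saved_search" },
]

# code => block equivalent: store (when, action) pairs instead of a string
map = { "actions" => [] }
events.each { |e| map["actions"] << [e["when"], e["actionType"]] }

# timeout_code equivalent: sort by the timestamp, then join
userscenario = map["actions"].sort_by(&:first).map(&:last).join("->")
puts userscenario
# => scm.ts.list_saved_search->scm.ts.start_over->scm.ts.delete_saved_search
```

Because the "when" format is fixed-width (YYYY-MM-DD HH:MM:SS), a lexicographic sort matches chronological order, so no date parsing is needed.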


Thank you~

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.