[Resolved] Logstash - data is not aggregated in chronological order per HTTP session, and the output file looks like JSON

Hi Team,

The Logstash configuration runs as below:
input {
  file {
    path => "C:/elkstack/elasticsearch-6.5.1/logs/userscenario.csv"
    start_position => "beginning"
    sincedb_path => "C:/elkstack/elasticsearch-6.5.1/sincedb/sincedb.txt"
  }
}

filter {
  csv {
    columns => ["when", "httpsessionId", "module", "page", "userId", "actionType"]
    separator => ","
    skip_header => "true"
  }

  aggregate {
    task_id => "%{httpsessionId}"
    code => '
      map["userscenario"] ||= ""
      map["userscenario"] += event.get("actionType") + "->"
      map["userId"] = event.get("userId")
      event.cancel
    '
    push_map_as_event_on_timeout => true
    timeout_task_id_field => "httpsessionId"
    timeout => 3600
    timeout_code => 'event.set("userscenario", event.get("userscenario").chomp("->"))'
  }
}

output {
  file { path => "C:/elkstack/elasticsearch-6.5.1/logs/agguserscenario.csv" }
  stdout { codec => rubydebug }
}

Content of the source file "userscenario.csv"

when,httpsessionId,module,page,userId,actionType
2019-02-13 10:01:30,sid1,succession,talentsearch,cgrant1,scm.ts.list_saved_search
2019-02-13 10:01:31,sid3,calibration,ManageCalibrationTemplates,hr1,cal.mct.create
2019-02-13 10:01:31,sid1,succession,talentsearch,cgrant1,scm.ts.start_over
2019-02-13 10:01:33,sid1,succession,talentsearch,cgrant1,scm.ts.delete_saved_search
2019-02-13 10:01:33,sid3,calibration,ManageCalibrationTemplates,hr1,cal.mct.edit
2019-02-13 10:01:30,sid2,succession,talentsearch,lokamoto1,scm.ts.list_saved_search
2019-02-13 10:01:33,sid2,succession,talentsearch,lokamoto1,scm.ts.search
2019-02-13 10:01:35,sid2,succession,talentsearch,lokamoto1,scm.ts.nominate
2019-02-13 10:01:35,sid3,calibration,ManageCalibrationTemplates,hr1,cal.mct.delete

**After Logstash finished processing, I opened the CSV file created by the file output, and the content is not as expected.**
1. All the content is in one column; it looks like three JSON objects.
2. The final userscenario string is not concatenated in ascending time order (by the "when" field).

    {"httpsessionId":"sid1","userId":"cgrant1","@version":"1","userscenario":"scm.ts.start_over->scm.ts.delete_saved_search->scm.ts.list_saved_search","@timestamp":"2019-04-29T07:08:23.971Z"}
    {"httpsessionId":"sid3","userId":"hr1","@version":"1","userscenario":"cal.mct.create->cal.mct.edit->cal.mct.delete","@timestamp":"2019-04-29T07:08:23.990Z"}
    {"httpsessionId":"sid2","userId":"lokamoto1","@version":"1","userscenario":"scm.ts.nominate->scm.ts.list_saved_search->scm.ts.search","@timestamp":"2019-04-29T07:08:23.991Z"}

Expected output file

httpsessionId,userId,userscenario
sid1,cgrant1,scm.ts.list_saved_search->scm.ts.start_over->scm.ts.delete_saved_search
sid2,lokamoto1,scm.ts.list_saved_search->scm.ts.search->scm.ts.nominate
sid3,hr1,cal.mct.create->cal.mct.edit->cal.mct.delete

How can I correct this?

You are using a file output, which defaults to a json_lines codec. Try using a csv output.
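For example, a csv output could look like this (a sketch; the path and field order are assumptions based on the expected file above):

```
output {
  csv {
    path => "C:/elkstack/elasticsearch-6.5.1/logs/agguserscenario.csv"
    fields => ["httpsessionId", "userId", "userscenario"]
  }
}
```

The `fields` option controls both which fields are written and their column order.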

Ok. Thx. Trying...
Another question: how can I make the data aggregate in chronological order?

The userscenario for cgrant1 is aggregated to "userscenario":"scm.ts.start_over->scm.ts.delete_saved_search->scm.ts.list_saved_search"

But the expected aggregated sequence should be as below, according to the "when" field, which records when the actionType happened in the web UI:
"userscenario":"scm.ts.list_saved_search->scm.ts.start_over->scm.ts.delete_saved_search"

Do you have pipeline.java_execution enabled? Do you have "--pipeline.workers 1" set?

[2019-04-29T15:46:34,297][DEBUG][logstash.runner ] pipeline.java_execution: false
[2019-04-29T15:46:34,296][DEBUG][logstash.runner ] pipeline.workers: 4

Do you have pipeline.java_execution enabled?

  • No

Do you have "--pipeline.workers 1" set?

  • No

What do these two mean?

The execution engine that logstash runs on is being rewritten into Java. At one point I was seeing ordering problems with that.

Your issue is much more likely to be that you have multiple worker threads. That produces race conditions because the threads may process events out of order. Set "--pipeline.workers 1". That means logstash will only use 1 CPU, but it is a requirement of the aggregate filter.
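An alternative that makes the ordering independent of thread scheduling: accumulate (when, actionType) pairs in the map and sort them in the timeout_code, rather than concatenating strings as events arrive. A minimal standalone Ruby sketch of that idea (simulating the aggregate map outside Logstash; the sample events are taken from session sid1 above):

```ruby
# Simulate the aggregate filter's map for one session (sid1).
# Events may arrive out of order when several worker threads are running.
events = [
  { "when" => "2019-02-13 10:01:31", "actionType" => "scm.ts.start_over" },
  { "when" => "2019-02-13 10:01:33", "actionType" => "scm.ts.delete_saved_search" },
  { "when" => "2019-02-13 10:01:30", "actionType" => "scm.ts.list_saved_search" },
]

# code => block equivalent: store (when, action) pairs instead of a string
map = { "actions" => [] }
events.each { |e| map["actions"] << [e["when"], e["actionType"]] }

# timeout_code equivalent: sort by the timestamp, then join
userscenario = map["actions"].sort_by(&:first).map(&:last).join("->")
puts userscenario
# => scm.ts.list_saved_search->scm.ts.start_over->scm.ts.delete_saved_search
```

Because the "when" format is fixed-width (YYYY-MM-DD HH:MM:SS), a lexicographic sort matches chronological order, so no date parsing is needed.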


Thank you~

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.