How to compare log values and drop duplicate logs in Logstash

Feb 15, 2022 @ 16:16:01.743	columns.url:https://mail.google.com/mail/u/0/#inbox name:chrome_browser_history @timestamp:Feb 15, 2022 @ 16:16:01.743 @version:1 action:added calendarTime:Tue Feb 15 08:15:58 2022 UTC columns.path:/Users/abc/Library/Application Support/Google/Chrome/Default/History columns.title:Inbox (5) - xxx counter:754 decorations.username:abc epoch:0 host:1.2.3.4 hostIdentifier:abc numerics:false port:23,679 tags:_grokparsefailure unixTime:1,644,912,958 _id:Q3dz_H4B1DWSfDTH6Wvn _index:logstash _score: - _type:_doc

Feb 15, 2022 @ 16:16:01.743	columns.url:https://mail.google.com/mail/u/0/#inbox name:chrome_browser_history @timestamp:Feb 15, 2022 @ 16:16:01.743 @version:1 action:added calendarTime:Tue Feb 15 08:15:58 2022 UTC columns.path:/Users/abc/Library/Application Support/Google/Chrome/Default/History columns.title:Inbox (5) - xxx counter:754 decorations.username:abc epoch:0 host:1.2.3.4 hostIdentifier:abc numerics:false port:23,679 tags:_grokparsefailure unixTime:1,644,912,958 _id:r3dz_H4B1DWSfDTH6mst _index:logstash _score: - _type:_doc

I am collecting Chrome browsing history logs, and a single URL produces hundreds of identical raw log entries.

The log content is the same in each entry; the only difference is the _id value, which I believe is generated by Logstash.

I know the history itself may be generated correctly; it just produces many identical browsing history logs within one second.

How can I create a filter, or use some other approach, to deduplicate the raw logs within a time slot such as 1 second?

Thanks a lot.

Use a throttle filter on the calendarTime field.
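
A minimal sketch of what that could look like, not a tested configuration. It assumes the URL is stored as a nested field, [columns][url] (Kibana shows it as columns.url), and it adds the URL to the throttle key so that different pages visited in the same second are not dropped; keying on calendarTime alone, as suggested above, would throttle them together. The "throttled" tag name is arbitrary.

```
filter {
  throttle {
    # One counter per distinct calendarTime + URL combination.
    # Including the URL is an assumption; use "%{calendarTime}" alone
    # to throttle purely on the timestamp field.
    key => "%{calendarTime}-%{[columns][url]}"
    # Let the first event through; tag every later one in the same period.
    after_count => 1
    # Length of the time slot, in seconds.
    period => 1
    # Keep counters around for at least twice the period.
    max_age => 2
    add_tag => "throttled"
  }

  # Discard the events the throttle filter tagged as duplicates.
  if "throttled" in [tags] {
    drop { }
  }
}
```

With these settings, the first event for a given key passes through untagged and any further events with the same key inside the 1 second period are tagged and dropped.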

The _id is added by Elasticsearch, not Logstash.
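
Because Elasticsearch only generates an _id when none is supplied, a different approach (not the one suggested in this thread) is to have Logstash supply a deterministic _id, so that identical events overwrite one another instead of being indexed twice. The sketch below uses the fingerprint filter for this. The choice of source fields, the nested [columns][url] field name, and the hosts value are assumptions for illustration; the index name is taken from the sample logs.

```
filter {
  fingerprint {
    # Fields that together define "the same event" (an assumption;
    # adjust to whatever should count as a duplicate).
    source => ["hostIdentifier", "calendarTime", "[columns][url]"]
    # Hash all source fields together rather than only the last one.
    concatenate_sources => true
    method => "SHA256"
    # Store the hash in @metadata so it is not indexed as a field.
    target => "[@metadata][fingerprint]"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]   # placeholder address
    index => "logstash"
    # Events with the same fingerprint map to the same document.
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```

Unlike the throttle filter, this does not depend on a time window: any two events with the same source field values end up as a single document.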

Thanks for the information, problem solved.