Aggregate similar documents

Petr.Simik · February 5, 2021, 6:30am

I have documents like the following:

"message":"Syslog connection established; fd='15', server='AF_INET(111.103.111.65:1514)', local='AF_INET(0.0.0.0:0)'"
"message":"Syslog connection established; fd='14', server='AF_INET(222.103.111.65:1514)', local='AF_INET(0.0.0.0:0)'"
"message":"Syslog connection established; fd='15', server='AF_INET(333.228.333.64:1514)', local='AF_INET(0.0.0.0:0)'"
"message":"Syslog connection established; fd='14', server='AF_INET(444.444.333.64:1514)', local='AF_INET(0.0.0.0:0)'"
"message":"[0x00001111] [WEBCONSOLE] >> Error:  CONSOLE ERROR ERROR TypeError: null is not an object"
"message":"[0x00002222] [WEBCONSOLE] >> Error:  CONSOLE ERROR ERROR TypeError: null is not an object"

and I want to aggregate similar documents, so that I get counts like these:

4 "message":"Syslog connection established...."
2 "message":"...CONSOLE ERROR ERROR TypeError...."

The intention is to do a frequency analysis of syslog messages.
I want to identify the most frequently repeated messages in the dataset.
The problem is that I can't use a terms aggregation on the keyword field message, because lines that differ only in the IP address are not aggregated together.
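
One possible way around this, sketched here as an assumption and not as a built-in Elasticsearch feature, is to mask the variable parts of each message (IP addresses, hex codes, numbers) before counting, so that lines differing only in those values collapse into one template. The regular expressions and the `to_template` helper are hypothetical and derived only from the sample lines above.

```python
import re
from collections import Counter

# Hypothetical masking rules, ordered from most to least specific.
# They are assumptions based on the sample lines, not a complete syslog grammar.
MASKS = [
    (re.compile(r"\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?"), "<IP>"),  # dotted quad, optional port
    (re.compile(r"0x[0-9a-fA-F]+"), "<HEX>"),                   # hex identifiers
    (re.compile(r"\d+"), "<NUM>"),                              # any remaining number
]

def to_template(message: str) -> str:
    """Replace the variable parts of a message with placeholders."""
    for pattern, placeholder in MASKS:
        message = pattern.sub(placeholder, message)
    return message

messages = [
    "Syslog connection established; fd='15', server='AF_INET(111.103.111.65:1514)', local='AF_INET(0.0.0.0:0)'",
    "Syslog connection established; fd='14', server='AF_INET(222.103.111.65:1514)', local='AF_INET(0.0.0.0:0)'",
    "Syslog connection established; fd='15', server='AF_INET(333.228.333.64:1514)', local='AF_INET(0.0.0.0:0)'",
    "Syslog connection established; fd='14', server='AF_INET(444.444.333.64:1514)', local='AF_INET(0.0.0.0:0)'",
    "[0x00001111] [WEBCONSOLE] >> Error:  CONSOLE ERROR ERROR TypeError: null is not an object",
    "[0x00002222] [WEBCONSOLE] >> Error:  CONSOLE ERROR ERROR TypeError: null is not an object",
]

# Count how often each masked template occurs, most frequent first.
counts = Counter(to_template(m) for m in messages)
for template, n in counts.most_common():
    print(n, template)
```

For the six sample lines this gives one template with a count of 4 and one with a count of 2. If such a template were stored in its own keyword field at ingest time, an ordinary terms aggregation on that field could presumably do the counting.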

I did it with a filter aggregation, but every filter has to be written manually, which is time consuming, and I feel I am missing many important messages.

"message":"*Syslog connection established*"
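
For illustration, the manual approach can be sketched as a request body built in Python. This is a minimal sketch, assuming a `filters` aggregation with one `wildcard` query per known pattern; the helper `build_filters_agg`, the aggregation name `known_messages`, and the bucket name `unmatched` are hypothetical. The `other_bucket_key` setting collects every document that matches none of the patterns, which gives a rough measure of how much is being missed.

```python
import json
from typing import Dict, List

def build_filters_agg(patterns: List[str], field: str = "message") -> Dict:
    """Build a search body with one named wildcard filter per known pattern."""
    return {
        "size": 0,  # only bucket counts are needed, no hits
        "aggs": {
            "known_messages": {  # hypothetical aggregation name
                "filters": {
                    # documents matching no filter land in this bucket
                    "other_bucket_key": "unmatched",
                    "filters": {
                        p: {"wildcard": {field: {"value": p}}} for p in patterns
                    },
                }
            }
        },
    }

body = build_filters_agg([
    "*Syslog connection established*",
    "*CONSOLE ERROR ERROR TypeError*",
])
print(json.dumps(body, indent=2))
```

Each new message type still needs its own pattern, and patterns that begin with `*` tend to be expensive, so this does not scale well to a large, unknown set of messages.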

The dataset is huge: 600k documents/sec.

I am looking for some kind of function that would group messages in which 70% of the words are the same.
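
To make the idea concrete, here is a minimal sketch of such a grouping done outside Elasticsearch. It assumes that "similar words" means Jaccard similarity over the set of alphabetic tokens in each message, and it uses a simple greedy pass that compares each message with the first member of every existing group. The names `word_set`, `jaccard`, and `group_similar` are hypothetical, and the tokenization (digits and punctuation ignored) is an assumption.

```python
import re
from typing import List, Set, Tuple

THRESHOLD = 0.7  # the 70% mentioned above

def word_set(message: str) -> Set[str]:
    """Lower-cased alphabetic tokens; digits and punctuation are ignored."""
    return set(re.findall(r"[a-z_]+", message.lower()))

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared words divided by all distinct words of both messages."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def group_similar(messages: List[str], threshold: float = THRESHOLD) -> List[List[str]]:
    """Greedy grouping: join the first group whose representative is similar enough."""
    groups: List[Tuple[Set[str], List[str]]] = []
    for message in messages:
        ws = word_set(message)
        for representative, members in groups:
            if jaccard(ws, representative) >= threshold:
                members.append(message)
                break
        else:
            # no existing group was similar enough, so start a new one
            groups.append((ws, [message]))
    return [members for _, members in groups]

sample = [
    "Syslog connection established; fd='15', server='AF_INET(111.103.111.65:1514)', local='AF_INET(0.0.0.0:0)'",
    "Syslog connection established; fd='14', server='AF_INET(222.103.111.65:1514)', local='AF_INET(0.0.0.0:0)'",
    "Syslog connection established; fd='15', server='AF_INET(333.228.333.64:1514)', local='AF_INET(0.0.0.0:0)'",
    "Syslog connection established; fd='14', server='AF_INET(444.444.333.64:1514)', local='AF_INET(0.0.0.0:0)'",
    "[0x00001111] [WEBCONSOLE] >> Error:  CONSOLE ERROR ERROR TypeError: null is not an object",
    "[0x00002222] [WEBCONSOLE] >> Error:  CONSOLE ERROR ERROR TypeError: null is not an object",
]

for members in group_similar(sample):
    print(len(members), members[0])
```

On the six sample lines this produces one group of 4 and one group of 2. The cost grows with the number of messages times the number of groups, so at the volume described here it would probably only be practical on a sample of the data.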

Thank you.
