Elasticsearch - return result if we have 80% overlap

Hello.

I am trying to implement data lookups in Logstash that check incoming proxy logs against a threat intelligence database.

Basically, I load documents containing malicious URLs into ES. Then, during log processing, I check whether the URL in a log entry matches an entry in the ES threat intelligence index exactly (1:1).

However, I would also like to get results for URLs that match at about 80%.
When loading the data into the threat intelligence index, I use the following analysis settings and mapping:

    "analysis": {
      "filter": {
        "my_ngram_filter": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 4
        }
      },
      "analyzer": {
        "my_ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_ngram_filter"
          ]
        }
      }
    }

    "properties": {
      "parameters": {
        "type": "text",
        "fields": {
          "ngrammed": {
            "type": "text",
            "analyzer": "my_ngram_analyzer",
            "search_analyzer": "my_ngram_analyzer"
          }
        }
      }
    }
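To make "80% overlap" concrete, here is a rough Python sketch of what the analyzer above produces and how an overlap ratio could be measured on those terms. It is only an approximation: the regex is a stand-in for the standard tokenizer (which follows Unicode word-boundary rules and may split some inputs differently), and the names `ngrams`, `analyze` and `overlap_ratio` are hypothetical helpers, not part of Elasticsearch.

```python
import re

def ngrams(token, min_gram=3, max_gram=4):
    """All substrings of length min_gram..max_gram; shorter tokens yield nothing."""
    return [
        token[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(token) - n + 1)
    ]

def analyze(text, min_gram=3, max_gram=4):
    """Lowercase, split into tokens (rough stand-in for the standard tokenizer), then n-gram each token."""
    tokens = re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", text.lower())
    grams = []
    for token in tokens:
        grams.extend(ngrams(token, min_gram, max_gram))
    return grams

def overlap_ratio(query_text, indexed_text):
    """Share of distinct query n-grams that also occur in the indexed text."""
    query_grams = set(analyze(query_text))
    indexed_grams = set(analyze(indexed_text))
    if not query_grams:
        return 0.0
    return len(query_grams & indexed_grams) / len(query_grams)

if __name__ == "__main__":
    print(ngrams("path"))
    print(overlap_ratio("somepath/malware.exe", "somepath/malware2.exe"))
```

With this definition, a lookup would count as a hit when `overlap_ratio(log_path, indexed_path) >= 0.8`.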

Later, I query the index like this:

    {
      "size": 2,
      "query": {
        "bool": {
          "should": [
            {
              "match": {
                "fqdn": "s3-us-west-2.amazonaws.com"
              }
            },
            {
              "match": {
                "parameters.ngrammed": "somepath/malware.exe"
              }
            }
          ],
          "minimum_should_match": 2,
          "boost": 2.0
        }
      }
    }

But I do not know how to require an 80% overlap.
I am open to any suggestions.
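One direction that may be worth testing: the `match` query accepts its own `minimum_should_match` parameter, and a percentage such as "80%" asks for that share of the analyzed query terms (here, the n-grams) to be present in the document. Below is a minimal Python sketch that builds such a request body. It keeps the structure of the query above, `build_lookup_query` is a hypothetical helper name, and the result has not been verified against this index.

```python
import json

def build_lookup_query(fqdn, path, overlap="80%", size=2):
    """Build a search body: fqdn must match, and `overlap` of the path n-grams must match."""
    return {
        "size": size,
        "query": {
            "bool": {
                "should": [
                    {"match": {"fqdn": fqdn}},
                    {
                        "match": {
                            "parameters.ngrammed": {
                                "query": path,
                                # assumption: a percentage of the n-gram terms is a
                                # good enough proxy for "80% overlap" of the URL path
                                "minimum_should_match": overlap,
                            }
                        }
                    },
                ],
                # both clauses are required, as in the original query
                "minimum_should_match": 2,
                "boost": 2.0,
            }
        },
    }

if __name__ == "__main__":
    body = build_lookup_query("s3-us-west-2.amazonaws.com", "somepath/malware.exe")
    print(json.dumps(body, indent=2))
```

Note that this threshold counts matching n-gram terms, not characters, so very short paths produce few terms and may behave differently from longer ones.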
