Elastic Scoring for perfect match

AndyS · April 19, 2020, 10:44am

Hey,

I have a JSON query that gives different results when executed in Elastic vs executed in Kibana. In my use case of security log searching, perfect matches are the desired answer, so I believe that Kibana is 'right' and Elastic is 'not 100% right'.

From reading around I believe this is down to Scoring, which looks amazingly powerful, but seems too complicated so I wonder if I'm missing something?

The query I use is built with a layered approach, so many subqueries are bundled into one master query. Here is one example:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "answers": "lgincdn.trafficmanager.net,lgincdnvzeuno.azureedge.net,lgincdnvzeuno.ec.azureedge.net,cs1227.wpc.alphacdn.net,192.229.221.185"
          }
        },
        {
          "match": {
            "source_ip": "192.168.1.10"
          }
        }
      ],
      "filter": []
    }
  },
  "size": 10000,
  "_source": ["*"]
}

In Elastic I get 18 hits, some of which are not perfect matches.

In Kibana (query converted to RISON) for the same time frame I get 1 hit.

Does it sound like I understand the problem correctly, and can someone give me a pointer in the right direction to force perfect matches in Elastic in the simplest way?

Many thanks
Andy

AndyS · April 19, 2020, 11:48am

The 1 hit in Kibana

{
    "ts": "2020-04-18T17:45:10.783496Z",
    "uid": "C1hMM64CCTc971GJGd",
    "id.orig_h": "192.168.1.10",
    "id.orig_p": 58952,
    "id.resp_h": "192.168.1.1",
    "id.resp_p": 53,
    "proto": "udp",
    "trans_id": 37980,
    "rtt": 0.030848026275634766,
    "query": "logincdn.msauth.net",
    "qclass": 1,
    "qclass_name": "C_INTERNET",
    "qtype": 1,
    "qtype_name": "A",
    "rcode": 0,
    "rcode_name": "NOERROR",
    "AA": false,
    "TC": false,
    "RD": true,
    "RA": true,
    "Z": 0,
    "answers": ["lgincdn.trafficmanager.net", "lgincdnvzeuno.azureedge.net", "lgincdnvzeuno.ec.azureedge.net", "cs1227.wpc.alphacdn.net", "192.229.221.185"],
    "TTLs": [155.0, 29.0, 1158.0, 3599.0, 1123.0],
    "rejected": false
}

18 hits in Elastic. Here is one hit; it's similar, but not a perfect match:

answers: ["cs199.wpc.alphacdn.net", "68.232.34.228"]
destination_ips: "192.168.1.1"
source_ip: "192.168.1.10"
protocol: "udp"
event_type: "bro_dns"
destination_ip: "192.168.1.1"
parent_domain_length: 5
syslog-facility: "user"
host: "gateway"
query_class: 1
aa: false
transaction_id: 22868
syslog-priority: "notice"
query: "files3.lynda.com"
rcode: 0
query_type: 1
subdomain_frequency_score: 7.5615
ips: ["192.168.1.10", "192.168.1.1"]
syslog-host: "seconion-NU691"
ra: true
tags: ["syslogng", "bro", "dns", "top-1m", "internal_destination", "internal_source"]
ttls: [40, 3370]
rd: true
port: 50718
subdomain: "files3"
syslog-tags: ".source.s_bro_dns"
frequency_scores: ["8.2685", "7.5615"]
syslog-host_from: "seconion-nu691"
parent_domain: "lynda"
syslog-sourceip: "127.0.0.1"
query_class_name: "C_INTERNET"
highest_registered_domain: "lynda.com"
top_level_domain: "com"
destination_port: 53
rejected: false
source_ips: "192.168.1.10"
uid: "CkeLOB18R37pFLbtr3"
highest_registered_domain_frequency_score: 8.2685
source_port: 60965
syslog-file_name: "/nsm/bro/logs/current/dns.log"
@version: "1"
timestamp: "2020-04-03T09:05:38.475Z"
logstash_time: 0.02882218360900879
message: "{"ts":"2020-04-03T09:05:37.392974Z","uid":"CkeLOB18R37pFLbtr3","id.orig_h":"192.168.1.10","id.orig_p":60965,"id.resp_h":"192.168.1.1","id.resp_p":53,"proto":"udp","trans_id":22868,"rtt":0.0424351692199707,"query":"files3.lynda.com","qclass":1,"qclass_name":"C_INTERNET","qtype":1,"qtype_name":"A","rcode":0,"rcode_name":"NOERROR","AA":false,"TC":false,"RD":true,"RA":true,"Z":0,"answers":["cs199.wpc.alphacdn.net","68.232.34.228"],"TTLs":[40.0,3370.0],"rejected":false}"
tld: {subdomain: "files3.lynda.com"}
subdomain_length: 6
tc: "false"
rcode_name: "NOERROR"
query_length: 16
rtt: 0.0424351692199707
@timestamp: "2020-04-03T09:05:37.392Z"
query_type_name: "A"
z: 0

dadoonet · April 19, 2020, 12:03pm

It depends on the mapping you are using.
If you are using a keyword data type then it will only match with exact terms.
Otherwise the text is analyzed before being indexed, which produces this behavior.

If you are using the default mapping, you can append .keyword to the field name. It will then do an exact match.
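
As a rough sketch of that suggestion: this assumes both fields were created by the default dynamic mapping, so that answers.keyword and source_ip.keyword exist (if source_ip is mapped as an ip type, the plain field name would be used instead). The two match clauses become term clauses in the filter section, which also skips scoring. Each element of the answers array is indexed as its own keyword value, so the example asks for one element rather than the comma-joined string:

```json
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "answers.keyword": "lgincdn.trafficmanager.net" } },
        { "term": { "source_ip.keyword": "192.168.1.10" } }
      ]
    }
  }
}
```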

AndyS · April 19, 2020, 1:10pm

Thanks David, this works, and after more testing I realised this is half the challenge.

If I understand correctly, ".keyword" forces an exact match on the entire field.

Is it possible to force an exact match using a substring?

e.g. query match on "Bob,Charlie"
Should match "Alice,Bob,Charlie"
Should not match "Charlie.Bob"

Thanks

dadoonet · April 20, 2020, 10:09am

The .keyword sub field is generated at index time, behind the scenes, by Elasticsearch with the default mapping. It has the type keyword, which means that the value is indexed exactly as it was provided, with no transformation.

If you search within this field, indeed only exact matches will work.
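
One way to see the difference is the _analyze API. The request body below is only a sketch: it assumes the text field uses the standard analyzer (the default for text fields), which may not hold for a custom mapping. Sending it to the _analyze endpoint returns the tokens that would be produced at index time. Switching "analyzer" to "keyword" returns the whole input as a single token, which is close to how the .keyword sub field treats a value.

```json
{
  "analyzer": "standard",
  "text": "cs1227.wpc.alphacdn.net,192.229.221.185"
}
```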

"Is it possible to force an exact match using a substring?"

Not sure. Maybe with a match_phrase query, but on a text field. It should at least guarantee the positions of the tokens, if this is what you are after.

But your example seems theoretical, right? Maybe share a full recreation script as described in About the Elasticsearch category. It will help to better understand what you are doing. Please try to keep the example as simple as possible.
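
For illustration, the match_phrase idea could look something like the sketch below. The field name names is hypothetical, and it is assumed to be a text field whose analyzer splits on commas. By default match_phrase requires the query tokens to appear next to each other and in the same order, so whether a given document matches depends on how that field is tokenized.

```json
{
  "query": {
    "match_phrase": {
      "names": "Bob,Charlie"
    }
  }
}
```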

AndyS · April 20, 2020, 10:51am

Short version
Many thanks David, let me go away and rephrase the question with examples

Long version
In this exact situation my process incorrectly turned an array of indexed strings into a single string which it was then comparing against the raw message, which is why I'm getting mixed results.
So at the moment my testing is flawed, I need to address how the project does this.
However in the long term I still need to change the query to be more precise.
I'll come back to this in the near future when other bits are sorted.
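
For reference, a sketch of what a more precise query might look like once the answers are kept as an array instead of one joined string. It assumes the .keyword sub fields exist on both fields, which depends on the mapping. Each array element gets its own term clause, so a document has to contain every listed value to match:

```json
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "answers.keyword": "lgincdn.trafficmanager.net" } },
        { "term": { "answers.keyword": "lgincdnvzeuno.azureedge.net" } },
        { "term": { "answers.keyword": "lgincdnvzeuno.ec.azureedge.net" } },
        { "term": { "answers.keyword": "cs1227.wpc.alphacdn.net" } },
        { "term": { "answers.keyword": "192.229.221.185" } },
        { "term": { "source_ip.keyword": "192.168.1.10" } }
      ]
    }
  },
  "size": 10000
}
```

This does not check the order of the answers, or that the document has no additional answers beyond the ones listed.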

Thanks David, the ".keyword" helps for now
Andy

system · May 18, 2020, 10:51am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
