Querying records having colon : characters in Elasticsearch


(Satish) #1

Hi,

I am getting very strange results when I query for a record that contains a MAC address with multiple colon characters, e.g. 80:c5:e6:36:6a:b4, using a query -> bool -> must -> match query via curl. I tried escaping the colons with \ but it did not help. Has anybody seen this behaviour? Is there a solution?


(Adrien Grand) #2

Can you provide a script with curl commands that reproduces the issue?


(Doug Turnbull) #3

What do you mean by strange results? Error? Unexpected search results?


(Satish) #4

Sure. Here's the query used:
curl -s -d '{ "query": { "filtered": { "query": { "bool": { "must" : [ {"match": {"mac_address": "80:c5:e6:36:6a:b4" } } ] } }, "filter": { "range" :{ "@timestamp" :{ "gt": "2015-09-11T01:00:00", "lt": "2015-09-11T23:00:00", "time_zone": "+5:30" } } } } }, "_source": [ "ipaddress", "mac_address", "@timestamp"], "sort": "@timestamp" }' 'http://eshost:9200/logstash-data-2015.09.11/_search?size=10000&pretty'

I tried escaping the : character, but it didn't help.


(Satish) #5

When I used match_phrase instead of match, the query returned the expected results: only the records containing that MAC address. With plain "match", it returned far too many incorrect results.


(Doug Turnbull) #6

I'm betting that the MAC address is being tokenized on the colons, creating several separate search terms. So when you search, you get any MAC address that matches any of the two-character hex values, the same way you'd get the document "cat dog" if you searched for "cat mouse". Phrase queries work because they enforce that the terms are adjacent and in order, much like searching Google with quotes for "cat dog".
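
To make the difference concrete, here is a rough Python sketch of the two matching behaviours. It is not Elasticsearch code: the tokenize function is a simplified stand-in that just splits on colons (the real analyzer may split a MAC address somewhat differently), and the sample documents and helper names are made up for illustration.

```python
def tokenize(text):
    # Simplified stand-in for an analyzer: lowercase, then split on colons.
    # Assumption: the real analyzer's token boundaries may differ.
    return [t for t in text.lower().split(":") if t]

def match_any(query, doc):
    # Mimics a match query with the default OR operator:
    # the document matches if it shares at least one term with the query.
    return bool(set(tokenize(query)) & set(tokenize(doc)))

def match_phrase(query, doc):
    # Mimics a phrase query: all query terms must appear
    # adjacent and in the same order in the document.
    q, d = tokenize(query), tokenize(doc)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

# Hypothetical sample documents; the second shares only the term "36".
docs = ["80:c5:e6:36:6a:b4", "00:1a:2b:36:4d:5e", "aa:bb:cc:dd:ee:ff"]
query = "80:c5:e6:36:6a:b4"

any_hits = [d for d in docs if match_any(query, d)]
phrase_hits = [d for d in docs if match_phrase(query, d)]

print(any_hits)     # two hits, including the unrelated address sharing "36"
print(phrase_hits)  # only the exact address
```

With these sample values, the OR-style match also returns the second address because it shares a single hex pair, which is the "way too many results" effect described above.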

You probably want this field to be not_analyzed, which will only allow exact matches. However, you may also want to lowercase the values, since case doesn't matter in hex values.
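
As a rough sketch of what that could look like, the Python snippet below builds index settings that keep the whole value as a single term but lowercase it, using a custom analyzer made of the keyword tokenizer and the lowercase filter. The analyzer name mac_lowercase and the type name logs are made up for illustration, and the "string" field type follows the mapping syntax of the releases current at the time of this thread; newer versions express this differently. Existing data would also need to be reindexed for a changed mapping to take effect.

```python
import json

# Hypothetical index settings: "mac_lowercase" and "logs" are illustrative names.
index_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "mac_lowercase": {
                    "type": "custom",
                    # keyword tokenizer emits the whole value as one token
                    "tokenizer": "keyword",
                    # lowercase filter makes matching case-insensitive
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "logs": {
            "properties": {
                "mac_address": {"type": "string", "analyzer": "mac_lowercase"}
            }
        }
    },
}

# Print the JSON body that would be sent when creating the index.
print(json.dumps(index_settings, indent=2))
```

If lowercasing is not needed, the simpler alternative for that field is {"type": "string", "index": "not_analyzed"}.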

