I am getting very strange results when I query a record which contains a Mac Address having multiple colon characters e.g. 80:c5:e6:36:6a:b4 using the query -> bool -> must -> match query via Curl. I tried escaping using \ but did not help. Anybody seen this behaviour? Any solution?
Instead of match when I used match_phrase, the query gave the expected results matching the query containing the MacID only now! However with only "match" it gave way too many incorrect results.
I'm betting that the mac address is being tokenized on the colons, creating several unique search terms. So when you search, you get any mac address that matches any of the two character hex values. The same way you'd get the document "cat dog" if you just searched for "cat mouse." Phrase queries work because it enforces that each term is adjacent, much like you might search with quotes on Google for "cat dog"
You probably want this field to be not_analyzed which will only allow exact matches. However you may also want to perform lowercasing as case doesn't matter in hex values.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.