Unexpected ".keyword" behavior


#1

I have a log entry with this field:
"http_url": "https://login.live.com/GetCredentialType.srf?wa=wsignin1.0....."

Why does this query find the entry:

GET _search
{
  "query": {
    "bool": {
      "must": [
         { "wildcard": { "http_url": "*getcredentialtype*"} },
         { "exists" : { "field" : "http_url" } }
      ]
    }
  },
  "sort": ["_doc"],
  "size": 1000
}

but this one doesn't:

GET _search
{
  "query": {
    "bool": {
      "must": [
         { "wildcard": { "http_url.keyword": "*GetCredentialType*"} },
         { "exists" : { "field" : "http_url" } }
      ]
    }
  },
  "sort": ["_doc"],
  "size": 1000
}

I thought ".keyword" gives me the unanalyzed field, which I could then search directly. I have not found any way of using .keyword in the query that makes this log entry appear.


(Mayya Sharipova) #2

I am surprised you are getting these results. In Elasticsearch 6.x with default settings for text and keyword fields, you should get the opposite: results with "http_url.keyword" and no results with "http_url". What Elasticsearch version are you using, and what is your index mapping for the http_url field? Are you using any normalizers for the keyword field? You can check how your text got indexed using the Term Vectors API.

About analysis: exactly as you said, a keyword field is not analyzed by default. The wildcard query is a term-level query, so its pattern is not analyzed either. So if you index GetCredentialType into the keyword field, it is kept exactly as it is, and to find it with term-level queries you need to search for that exact form, including case.
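
To make the case-sensitivity point concrete, here is a rough Python sketch. It is not Elasticsearch itself. It assumes the text field behaves roughly like the standard analyzer (approximated here by splitting on non-alphanumeric characters and lowercasing) and that the keyword field stores the whole value as one unchanged term, with no normalizer. The helper names text_terms, keyword_terms and wildcard_matches are made up for illustration, and the sample URL is cut off at the same point as in the original post.

```python
import re
from fnmatch import fnmatchcase

# Sample value, cut off where the original post truncates it.
URL = "https://login.live.com/GetCredentialType.srf?wa=wsignin1.0"


def text_terms(value):
    """Approximate an analyzed text field: split into tokens and lowercase.

    This is only a stand-in for the standard analyzer; the real tokenizer
    follows Unicode segmentation rules and may split differently.
    """
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", value) if tok]


def keyword_terms(value):
    """Approximate a keyword field without a normalizer: one term, unchanged."""
    return [value]


def wildcard_matches(pattern, terms):
    """Case-sensitive wildcard match of an unanalyzed pattern against indexed terms."""
    return any(fnmatchcase(term, pattern) for term in terms)


if __name__ == "__main__":
    for field, terms in (("http_url", text_terms(URL)),
                         ("http_url.keyword", keyword_terms(URL))):
        for pattern in ("*getcredentialtype*", "*GetCredentialType*"):
            print(field, pattern, wildcard_matches(pattern, terms))
```

Under these assumptions the lowercase pattern only matches the analyzed terms, and the mixed-case pattern only matches the keyword term. If a real index behaves differently, the mapping (for example a normalizer on the keyword sub-field) is the first thing to check.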