Bug in standard analyzer?

Hi all, we have a field that contains hostnames like "foo.bar.com". I noticed that when I search for a hostname whose subdomain consists only of digits, I get too many results. For example, if I search for "1234.example.com", I also get back every result for example.com. This doesn't happen when the subdomain is alphabetic, like foo.example.com. The search analyzer defaults to the standard analyzer, which gives me this for "1234.example.com":

{
  "tokens" : [
    {
      "token" : "1234",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<NUM>",
      "position" : 0
    },
    {
      "token" : "example.com",
      "start_offset" : 6,
      "end_offset" : 17,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

This seems logical at first, but then I noticed that the type of the second token is <ALPHANUM>. Surely 1234 is also alphanumeric, and so is the whole of 1234.example.com. So why was it split into two tokens?
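
In case anyone wants to reproduce this, here is a rough sketch of how I'd fetch the token output shown above through the _analyze API. The node URL (http://localhost:9200, no security) and the helper names build_analyze_request and analyze are just assumptions for illustration, so adjust them for your own cluster.

```python
import json
import urllib.request

# Assumption: a local, unsecured node. Change this for your cluster.
ES_URL = "http://localhost:9200"


def build_analyze_request(text, analyzer="standard"):
    """Build the JSON body for the _analyze API (hypothetical helper)."""
    return {"analyzer": analyzer, "text": text}


def analyze(text, analyzer="standard", base_url=ES_URL):
    """POST the text to /_analyze and return (token, type) pairs."""
    body = json.dumps(build_analyze_request(text, analyzer)).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        tokens = json.load(resp)["tokens"]
    # Keep only the fields that matter for comparing the two hostnames.
    return [(t["token"], t["type"]) for t in tokens]
```

Calling analyze("1234.example.com") and analyze("foo.example.com") side by side should make the difference between the numeric and the alphabetic subdomain easy to compare.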
