Hi all, we have a field that contains hostnames like "foo.bar.com". I noticed that when I search for hostnames whose subdomain consists only of digits, I get too many results. For example, if I search for "1234.example.com", I also get back every result for example.com. This doesn't happen when the subdomain is alphabetic, like foo.example.com.

The search analyzer defaults to the standard analyzer, which gives me this for the numeric case:
{
  "tokens" : [
    {
      "token" : "1234",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<NUM>",
      "position" : 0
    },
    {
      "token" : "example.com",
      "start_offset" : 6,
      "end_offset" : 17,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}
That explains the extra hits, since "example.com" becomes a search term of its own. But then I noticed that the type of the second token is <ALPHANUM>. Surely "1234" is also alphanumeric, and so is the whole of "1234.example.com". So why did it get split into two tokens?
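
For anyone who wants to reproduce this, here is a minimal Python sketch of how I would compare the two cases through the `_analyze` API. The helper names (`build_analyze_request`, `summarize_tokens`, `analyze`) are made up for this post, and the `http://localhost:9200` address is an assumption, so adjust it to your own cluster.

```python
import json
import urllib.request


def build_analyze_request(text, analyzer="standard"):
    """Build the JSON body for a POST to the _analyze endpoint."""
    return {"analyzer": analyzer, "text": text}


def summarize_tokens(response):
    """Reduce an _analyze response to a list of (token, type) pairs."""
    return [(t["token"], t["type"]) for t in response["tokens"]]


def analyze(text, base_url="http://localhost:9200", analyzer="standard"):
    """Send text to _analyze and return the (token, type) pairs.

    base_url is an assumption (a local, unauthenticated node);
    point it at your own cluster.
    """
    body = json.dumps(build_analyze_request(text, analyzer)).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return summarize_tokens(json.load(resp))
```

Running `analyze("1234.example.com")` and `analyze("foo.example.com")` against a live node and comparing the two results should show the difference described above: two tokens for the numeric subdomain, while the alphabetic one does not produce the extra matches.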