Hello @thedraketaylor
The default analyzer for text fields is the standard one.
To see which tokens the standard analyzer generates, you can use the _analyze API:
POST _analyze
{
"analyzer": "standard",
"text": "http://domain.com/showthread.php?10357-thread-title-and-such/page22"
}
# Result
{
"tokens" : [
{
"token" : "http",
"start_offset" : 0,
"end_offset" : 4,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "domain.com",
"start_offset" : 7,
"end_offset" : 17,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "showthread.php",
"start_offset" : 18,
"end_offset" : 32,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "10357",
"start_offset" : 33,
"end_offset" : 38,
"type" : "<NUM>",
"position" : 3
},
{
"token" : "thread",
"start_offset" : 39,
"end_offset" : 45,
"type" : "<ALPHANUM>",
"position" : 4
},
{
"token" : "title",
"start_offset" : 46,
"end_offset" : 51,
"type" : "<ALPHANUM>",
"position" : 5
},
{
"token" : "and",
"start_offset" : 52,
"end_offset" : 55,
"type" : "<ALPHANUM>",
"position" : 6
},
{
"token" : "such",
"start_offset" : 56,
"end_offset" : 60,
"type" : "<ALPHANUM>",
"position" : 7
},
{
"token" : "page22",
"start_offset" : 61,
"end_offset" : 67,
"type" : "<ALPHANUM>",
"position" : 8
}
]
}
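You can test the simple analyzer the same way. It splits on any character that is not a letter, so it breaks domain.com into domain and com and drops the digits: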
POST _analyze
{
"analyzer": "simple",
"text": "http://domain.com/showthread.php?10357-thread-title-and-such/page22"
}
# Result
{
"tokens" : [
{
"token" : "http",
"start_offset" : 0,
"end_offset" : 4,
"type" : "word",
"position" : 0
},
{
"token" : "domain",
"start_offset" : 7,
"end_offset" : 13,
"type" : "word",
"position" : 1
},
{
"token" : "com",
"start_offset" : 14,
"end_offset" : 17,
"type" : "word",
"position" : 2
},
{
"token" : "showthread",
"start_offset" : 18,
"end_offset" : 28,
"type" : "word",
"position" : 3
},
{
"token" : "php",
"start_offset" : 29,
"end_offset" : 32,
"type" : "word",
"position" : 4
},
{
"token" : "thread",
"start_offset" : 39,
"end_offset" : 45,
"type" : "word",
"position" : 5
},
{
"token" : "title",
"start_offset" : 46,
"end_offset" : 51,
"type" : "word",
"position" : 6
},
{
"token" : "and",
"start_offset" : 52,
"end_offset" : 55,
"type" : "word",
"position" : 7
},
{
"token" : "such",
"start_offset" : 56,
"end_offset" : 60,
"type" : "word",
"position" : 8
},
{
"token" : "page",
"start_offset" : 61,
"end_offset" : 65,
"type" : "word",
"position" : 9
}
]
}
If your URLs follow a well-known, predictable structure, you could use the pattern analyzer.
More information about this subject can be found in our documentation.
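As a rough sketch of the pattern analyzer idea, the first request below runs the built-in pattern analyzer with its default settings against the same URL. By default it splits on runs of non-word characters (`\W+`) and lowercases the terms, so it should keep numeric parts such as 10357 and page22. The second request is only an illustration: the regular expression in it is an assumption about which separators matter for your URLs, not a recommended value, so adjust it and check the output on your own cluster.

```
# Built-in pattern analyzer with default settings
POST _analyze
{
  "analyzer": "pattern",
  "text": "http://domain.com/showthread.php?10357-thread-title-and-such/page22"
}

# Pattern tokenizer defined inline; the separator set is a hypothetical choice
POST _analyze
{
  "tokenizer": {
    "type": "pattern",
    "pattern": "[:/?]+"
  },
  "filter": ["lowercase"],
  "text": "http://domain.com/showthread.php?10357-thread-title-and-such/page22"
}
```

Defining the tokenizer inline like this lets you try out different patterns without creating an index first.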
It is also possible to create a custom analyzer and use it in your index.
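A minimal sketch of what that could look like, assuming a pattern tokenizer plus the lowercase filter is what you want. The index name `my-url-index`, the analyzer name `url_analyzer`, the tokenizer name `url_tokenizer`, the field name `url`, and the separator pattern are all hypothetical and only here for illustration:

```
# Create an index with a custom analyzer and assign it to a text field
PUT my-url-index
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "url_tokenizer": {
          "type": "pattern",
          "pattern": "[:/?#&=-]+"
        }
      },
      "analyzer": {
        "url_analyzer": {
          "type": "custom",
          "tokenizer": "url_tokenizer",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "url": {
        "type": "text",
        "analyzer": "url_analyzer"
      }
    }
  }
}

# Check the tokens the custom analyzer produces
POST my-url-index/_analyze
{
  "analyzer": "url_analyzer",
  "text": "http://domain.com/showthread.php?10357-thread-title-and-such/page22"
}
```

Note that the analyzer of an existing field cannot be changed in place, so you would normally create a new index with these settings and reindex your data into it.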