Hello @thedraketaylor
The default analyzer for text fields is the standard analyzer.
To see which tokens the standard analyzer generates for a given input, you can use the _analyze API:
POST _analyze
{
  "analyzer": "standard",
  "text": "http://domain.com/showthread.php?10357-thread-title-and-such/page22"
}
# Result
{
"tokens" : [
{
"token" : "http",
"start_offset" : 0,
"end_offset" : 4,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "domain.com",
"start_offset" : 7,
"end_offset" : 17,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "showthread.php",
"start_offset" : 18,
"end_offset" : 32,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "10357",
"start_offset" : 33,
"end_offset" : 38,
"type" : "<NUM>",
"position" : 3
},
{
"token" : "thread",
"start_offset" : 39,
"end_offset" : 45,
"type" : "<ALPHANUM>",
"position" : 4
},
{
"token" : "title",
"start_offset" : 46,
"end_offset" : 51,
"type" : "<ALPHANUM>",
"position" : 5
},
{
"token" : "and",
"start_offset" : 52,
"end_offset" : 55,
"type" : "<ALPHANUM>",
"position" : 6
},
{
"token" : "such",
"start_offset" : 56,
"end_offset" : 60,
"type" : "<ALPHANUM>",
"position" : 7
},
{
"token" : "page22",
"start_offset" : 61,
"end_offset" : 67,
"type" : "<ALPHANUM>",
"position" : 8
}
]
}
You can test the simple analyzer the same way. It splits on every non-letter character, so digits are dropped:
POST _analyze
{
  "analyzer": "simple",
  "text": "http://domain.com/showthread.php?10357-thread-title-and-such/page22"
}
# Result
{
"tokens" : [
{
"token" : "http",
"start_offset" : 0,
"end_offset" : 4,
"type" : "word",
"position" : 0
},
{
"token" : "domain",
"start_offset" : 7,
"end_offset" : 13,
"type" : "word",
"position" : 1
},
{
"token" : "com",
"start_offset" : 14,
"end_offset" : 17,
"type" : "word",
"position" : 2
},
{
"token" : "showthread",
"start_offset" : 18,
"end_offset" : 28,
"type" : "word",
"position" : 3
},
{
"token" : "php",
"start_offset" : 29,
"end_offset" : 32,
"type" : "word",
"position" : 4
},
{
"token" : "thread",
"start_offset" : 39,
"end_offset" : 45,
"type" : "word",
"position" : 5
},
{
"token" : "title",
"start_offset" : 46,
"end_offset" : 51,
"type" : "word",
"position" : 6
},
{
"token" : "and",
"start_offset" : 52,
"end_offset" : 55,
"type" : "word",
"position" : 7
},
{
"token" : "such",
"start_offset" : 56,
"end_offset" : 60,
"type" : "word",
"position" : 8
},
{
"token" : "page",
"start_offset" : 61,
"end_offset" : 65,
"type" : "word",
"position" : 9
}
]
}
If your URLs follow a well-known, predictable structure, you could use the pattern analyzer, which splits text on a regular expression that you choose.
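As a rough sketch of that idea, you can try a pattern-based split directly in _analyze without creating an index. The regular expression below is only an assumption about what you might want: it splits on ":", "/" and "?" so that the host, the script name, the hyphenated thread slug and the page segment each stay in one piece. A pattern tokenizer followed by the lowercase filter is essentially what the pattern analyzer does internally.

```json
# Sketch only: the pattern "[:/?]+" is an illustrative choice, adjust it to your URL structure
POST _analyze
{
  "tokenizer": {
    "type": "pattern",
    "pattern": "[:/?]+"
  },
  "filter": ["lowercase"],
  "text": "http://domain.com/showthread.php?10357-thread-title-and-such/page22"
}
```

Comparing the tokens from this request with the standard and simple results is a quick way to decide which split suits your searches.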
More information about this subject can be found in our documentation.
It is also possible to create a custom analyzer and use it in your index.
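A minimal sketch of what that could look like, assuming Elasticsearch 7 or later (typeless mappings). The index name my-index, the analyzer name url_analyzer, the tokenizer name url_parts, the field name url and the regular expression are all hypothetical placeholders, not something taken from your setup:

```json
# Hypothetical names: my-index, url_analyzer, url_parts, url
PUT my-index
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "url_parts": {
          "type": "pattern",
          "pattern": "[:/?]+"
        }
      },
      "analyzer": {
        "url_analyzer": {
          "type": "custom",
          "tokenizer": "url_parts",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "url": {
        "type": "text",
        "analyzer": "url_analyzer"
      }
    }
  }
}

# Check the tokens produced by the custom analyzer defined on the index
POST my-index/_analyze
{
  "analyzer": "url_analyzer",
  "text": "http://domain.com/showthread.php?10357-thread-title-and-such/page22"
}
```

Keep in mind that the analyzer of an existing field cannot be changed in place, so a setup like this normally means creating a new index and reindexing your documents into it.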