Hi
We use the "standard" tokenizer in our custom analyzer definitions. By default
the standard tokenizer splits words on punctuation such as hyphens and
ampersands, so, for example, "i-mac" is tokenized to "i" and "mac".
Is there any way to configure the standard tokenizer so that it stops
splitting words on all punctuation except the comma (","), while still doing
all the other tokenizing it does?
Or, failing that, can we define a custom tokenizer that achieves the above?
(A rough sketch of what I have in mind follows the examples below.)
For example:
The query string "n-12" should produce this single token:
{
  "tokens" : [
    {
      "token" : "n-12",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    }
  ]
}
instead of the current output:
{
  "tokens" : [
    {
      "token" : "n",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "12",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "<NUM>",
      "position" : 1
    }
  ]
}
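As a sketch of the kind of custom tokenizer I am imagining, something like the
char_group tokenizer might work (assuming it is available in our version, 6.1+;
the index and analyzer names here are just placeholders). Note that this splits
only on whitespace and commas, so it would not replicate everything else the
standard tokenizer does:

PUT my_index
{
  "settings" : {
    "analysis" : {
      "tokenizer" : {
        "split_on_comma_and_space" : {
          "type" : "char_group",
          "tokenize_on_chars" : [ "whitespace", "," ]
        }
      },
      "analyzer" : {
        "my_custom_analyzer" : {
          "tokenizer" : "split_on_comma_and_space"
        }
      }
    }
  }
}

With a definition like this, "n-12" would be kept as the single token "n-12",
but I am not sure it is the right way to keep the rest of the standard
tokenizer's behaviour.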
Regards,
Nipun