Hi All,
I want to allow searching on part numbers, help ticket numbers and similar identifiers that contain reserved characters. For example:
SR-1234
SR/1234
The fields are analyzed with the standard analyzer and with edge n-grams of length 2 to 20.
When I query like this:
{
  "query": {
    "bool": {
      "should": [
        {
          "simple_query_string": {
            "query": "sr",
            "fields": [
              "PartNo", "PartNo.autocomplete"
            ]
          }
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "PartNo": {}
    }
  }
}
I get a result, and the highlighter shows matches such as:
sr -> "SR/1234"
sr-1234 -> "SR/1234"
This makes sense: the analyzer, at both index and search time, has removed the "-", "/", etc. and indexed "sr" and "1234" as separate tokens, so they are found independently.
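To illustrate what I believe is happening, here is a rough Python sketch of the tokenization. It is only an approximation: the real standard tokenizer follows Unicode word-boundary rules and does not split on every non-alphanumeric character, and approx_standard_analyze is a made-up helper name, not anything from Elasticsearch.

```python
import re

def approx_standard_analyze(text):
    """Rough stand-in for the standard analyzer (an assumption, not the
    real Lucene tokenizer): split on runs of non-alphanumeric characters
    and lowercase each resulting token."""
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]

# Both stored values reduce to the same two tokens, so a search for
# "sr" or "sr-1234" can match either of them.
for value in ("SR-1234", "SR/1234"):
    print(value, "->", approx_standard_analyze(value))
```

Under this approximation both values come out as the tokens sr and 1234, which matches the highlighting I see.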
Now, if I make the search fuzzy by appending ~:
sr~ -> "SR/1234". ok,
but
sr-1234~ -> no results
sr\-1234~ -> no results
Can someone explain what the fuzzy operator is doing here, and whether it is possible to formulate a query that achieves some level of fuzziness over the number (which is likely to be a ticket or bug number and quite large, 8 digits or more)?
I know the edge n-grams will help with trailing uncertainty in the number (missing trailing characters), but they will not help if a user transposes two characters.
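A small Python sketch of why I think that is the case. The ticket number 12345678 is a made-up example, edge_ngrams is a hypothetical helper that mimics an edge n-gram filter with min 2 and max 20, and I am assuming the search term is compared as a whole token rather than being n-grammed itself at search time.

```python
def edge_ngrams(token, min_gram=2, max_gram=20):
    """Return the prefixes of token with lengths min_gram..max_gram,
    roughly what an edge n-gram filter would emit at index time."""
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]

# Hypothetical 8-digit ticket number as it would be indexed.
indexed = set(edge_ngrams("12345678"))

# A truncated number is one of the indexed prefixes, so it matches.
print("123456" in indexed)

# A number with two adjacent digits swapped is not a prefix, so it does not.
print("12435678" in indexed)
```

The truncated input is found among the indexed prefixes, while the transposed one is not, which is the gap I am hoping fuzziness can cover.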
All thoughts appreciated,
Dave.