I'm facing the same problem mentioned in this thread - I need to use an edit distance greater than 2 for fuzzy searches, but Elasticsearch doesn't seem to support this.
I understand the performance concerns with higher Levenshtein distance values, but for my specific use case with small datasets, this isn't a concern. What surprised me is that this appears to be completely unconfigurable - I can't find any setting to adjust this limit.
Is this limitation truly hard-coded? If so, where can I find this value in the source code so that I can change it and build a custom version?
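To illustrate the small-dataset point above: when the candidate terms fit in memory, a brute-force comparison with any edit distance is cheap. The following is only a sketch under that assumption, not how Elasticsearch or Lucene implement fuzzy matching. The function names `levenshtein` and `fuzzy_filter` are hypothetical, and the distance is plain Levenshtein (insert, delete, substitute) with no transpositions.

```python
from typing import Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance via dynamic programming (no transpositions)."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # delete a character from a
                    current[j - 1] + 1,      # insert a character into a
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def fuzzy_filter(term: str, candidates: Iterable[str], max_edits: int) -> List[str]:
    """Return the candidates within max_edits of term (hypothetical helper)."""
    return [c for c in candidates if levenshtein(term, c) <= max_edits]


if __name__ == "__main__":
    words = ["sitting", "kitchen", "mitten", "banana"]
    print(fuzzy_filter("kitten", words, max_edits=3))
```

The cost grows with the number of candidates times the product of the string lengths, which is why this only makes sense for a small set of terms.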
I've tried to increase the fuzzy edit distance limit beyond 2 by modifying the Lucene source code. Here's what I've done so far:
Changed MAXIMUM_SUPPORTED_DISTANCE from 2 to 5 in the Lucene core
Recompiled the lucene-core library
Replaced the JAR file in Elasticsearch's lib directory
Verified via decompilation that the change is present in the JAR
However, I'm still getting the same error when trying to use edit distances greater than 2. This suggests there might be additional validation happening elsewhere in the codebase that's enforcing this limit.
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Valid edit distances are [0, 1, 2] but was [3]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Valid edit distances are [0, 1, 2] but was [3]"
  },
  "status": 400
}
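For reference, a request body of roughly the following shape is the kind that produces the response above. This is a sketch only: the field name `title`, the search value, and the helper `build_fuzzy_query` are placeholders I made up for illustration, not taken from my real index. The structure follows the standard `fuzzy` query with a `fuzziness` parameter.

```python
import json
from typing import Any, Dict


def build_fuzzy_query(field: str, value: str, fuzziness: int) -> Dict[str, Any]:
    """Build a fuzzy query body (hypothetical helper; field and value are placeholders)."""
    return {
        "query": {
            "fuzzy": {
                field: {
                    "value": value,
                    "fuzziness": fuzziness,  # anything above 2 is rejected with the error above
                }
            }
        }
    }


body = build_fuzzy_query("title", "kitten", 3)

if __name__ == "__main__":
    # Print the JSON that would be sent as the body of a _search request.
    print(json.dumps(body, indent=2))
```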
Are there other places in Elasticsearch or Lucene where this validation might be occurring? I suspect there might be another layer of checks beyond the MAXIMUM_SUPPORTED_DISTANCE constant.