We're using the light_spanish stemmer, but we've had issues with some specific words. For example, if I search "papa", it doesn't show results that contain "papas". I tested the spanish stemmer and it solves those problems, but since we have a large operation, I need to know what the practical differences between the two algorithms are. These specific problems are solved by the spanish stemmer, but I don't know whether other words will be stemmed incorrectly and cause similar issues in the future.
Another question: even the spanish stemmer doesn't seem to work on very short words. For example, if I search "ajo", it doesn't show results that contain the word "ajos". Is there a solution for this, other than adding custom mappings like I mentioned above?
As you can see in the first lines of code in the latter stemmer, any term shorter than 5 characters is returned as is and not stemmed at all. That explains the behaviour you describe above.
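To make the effect of such a length guard concrete, here is a minimal Python sketch. It is not the Lucene code: `light_stem_sketch` is a hypothetical name, the threshold of 5 characters is an assumption taken from the guard described above, and the suffix rules are reduced to the simplest plural and final-vowel cases.

```python
def light_stem_sketch(word: str, min_len: int = 5) -> str:
    """Simplified illustration of a light stemmer with a length guard.

    Assumption: terms shorter than `min_len` are returned unchanged,
    and only a trailing vowel or vowel + 's' is stripped otherwise.
    """
    # Length guard: short terms are never stemmed.
    if len(word) < min_len:
        return word
    # Strip a single final vowel (o, a, e).
    if word[-1] in "oae":
        return word[:-1]
    # Strip a plural ending: vowel followed by 's'.
    if word[-1] == "s" and word[-2] in "oae":
        return word[:-2]
    return word


if __name__ == "__main__":
    for term in ["papa", "papas", "ajo", "ajos", "patata", "patatas"]:
        print(term, "->", light_stem_sketch(term))
```

Under these assumptions, "papa" (4 characters) is left untouched while "papas" (5 characters) is reduced to "pap", so the two never match. Longer pairs such as "patata" and "patatas" both end up as "patat" and do match.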