I have indexed large PDF files into Elasticsearch. I analyzed this unstructured data with a snowball analyzer, which converts words like "running" to "run" using the snowball filter. But what if I want to search for exactly the word "running"?
I cannot use another field, because these are large files and a second field would increase the index size considerably.
So how can I use the snowball filter with something like "preserve_original", so that the original word is kept alongside the stemmed word?
Is there any way, or an alternative way, to analyze the field to meet this requirement?
The normal way to do this is with the fields parameter in the mapping (multi-fields), which indexes the same text twice under different analyzers. This is convenient because you can search both fields and give each a different boost, so that exact matches rank above stemmed matches.
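As a rough sketch of the multi-field approach, the Python dictionaries below hold the JSON bodies you might send when creating the index and when searching. The index layout, the field name `content`, the sub-field name `exact`, the analyzer and filter names, and the boost of 2 are all illustrative assumptions, not something from this thread. The `text` type assumes a recent Elasticsearch version; older versions used `string`.

```python
import json

# Hypothetical index body: "content" is stemmed, "content.exact" is not.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                # Snowball token filter configured for English (name is illustrative).
                "english_snowball": {"type": "snowball", "language": "English"}
            },
            "analyzer": {
                # Custom analyzer standing in for the stemming analyzer from the question.
                "stemmed": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "english_snowball"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "stemmed",
                "fields": {
                    # Sub-field indexed without stemming, for exact-word matches.
                    "exact": {"type": "text", "analyzer": "standard"}
                },
            }
        }
    },
}

# Search both fields; the ^2 boost makes exact matches score higher.
search_body = {
    "query": {
        "multi_match": {
            "query": "running",
            "type": "most_fields",
            "fields": ["content", "content.exact^2"],
        }
    }
}

print(json.dumps(search_body, indent=2))
```

To match only the literal word, you could instead run a plain match query against `content.exact` alone.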
@nik9000 If, say, 1000 words are common to both fields after analysis, wouldn't that affect the relevance? I would have duplicate words in the inverted index.
That stemming-in-situ approach is what you are looking for, I think. It doesn't create an extra field, so the index will be smaller. The way I proposed makes a larger index, but it lets you put more weight on exact matches, which is nice, though maybe not what you need. If you have time, it is worth trying both options and comparing the results.
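For reference, stemming in situ is usually built from the keyword_repeat token filter, which emits each token twice at the same position (one copy protected from stemming), followed by a stemmer and a unique filter that drops the duplicate when the stem equals the original. The settings below are a minimal sketch under that assumption; the names `in_situ`, `english_snowball` and `unique_stem` are made up for illustration.

```python
import json

# Hypothetical analysis settings for stemming in situ on a single field.
in_situ_settings = {
    "settings": {
        "analysis": {
            "filter": {
                # Snowball stemmer; it skips tokens marked as keywords.
                "english_snowball": {"type": "snowball", "language": "English"},
                # Removes the second copy when stemming left the word unchanged.
                "unique_stem": {"type": "unique", "only_on_same_position": True},
            },
            "analyzer": {
                "in_situ": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # Order matters: duplicate first, then stem, then de-duplicate.
                    "filter": [
                        "lowercase",
                        "keyword_repeat",
                        "english_snowball",
                        "unique_stem",
                    ],
                }
            },
        }
    }
}

print(json.dumps(in_situ_settings, indent=2))
```

With this analyzer, "running" would be indexed as both "running" and "run" at the same position. To find only the exact word, the query would need a non-stemming analyzer at search time (for example `standard`), since analyzing the query with `in_situ` would also produce "run".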