I'm indexing website content into Elasticsearch with a crawler.
Now I'm trying to apply an analyzer to my index (website content) to remove all stopwords from the website content through the Java API. To do that, I load the settings from a .json settings file with a FileInputStream when creating the index.
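For reference, the settings file looks roughly like this (a minimal sketch; the analyzer name `content_analyzer` is an assumption, not taken from my actual file), defining a custom analyzer that chains the standard tokenizer with the `stop` token filter:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "content_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  }
}
```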
When I look at the index metadata in the head plugin, the analyzer shows up properly.
I can even access the analyzer via the sense plugin.
But what I actually wanted to achieve was to remove the stopwords from the data I'm indexing.
I expected the content to be cleaned of stopwords, yet all the stopwords are still there.
When you applied your new mappings, did you create a new index with those mappings and then re-index your data into the new index? You cannot update the mapping of an existing field on an existing index, so if you didn't re-index, you will need to before your analyzer is actually used by your mappings.
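To illustrate (a sketch in sense-style REST syntax; the index name `website_v2`, type `page`, field `content`, and analyzer `content_analyzer` are all assumptions), that means creating a fresh index with the analyzer in its settings and referencing it from the mapping before any documents go in:

```
PUT /website_v2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "content_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  },
  "mappings": {
    "page": {
      "properties": {
        "content": {
          "type": "string",
          "analyzer": "content_analyzer"
        }
      }
    }
  }
}
```

Then point the crawler at the new index so every document is analyzed with the stopword filter on the way in.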
> I expected the content to be cleaned of stopwords, yet all the stopwords are still there.
If you're expecting the stopwords to be removed from the _source field, that doesn't happen: the _source field is never modified. The analyzer only removes stopwords from the field's token stream before the terms are indexed for search.
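You can check this yourself with the _analyze API (a sketch; the index and analyzer names here are assumptions carried over from an example setup):

```
GET /website_v2/_analyze?analyzer=content_analyzer&text=the quick brown fox
```

With a standard English stopword list this should come back with only the tokens quick, brown, and fox ("the" is dropped), while fetching any document from the index still shows the full original text in _source.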