Does any of this analysis happen in App Search by default?
For example:
If we index a book's content into App Search, we would not want duplicated words or stop words to be indexed.
I don't know off the top of my head the exact analyzer setup we have, but it almost certainly handles stop words and casing. Picking the correct "Language" option for your Engine will help.
I don't think duplicated terms are something you necessarily need to worry about. How often a term appears in a particular field is a signal that can influence relevance: if a term appears 200 times in the body of one document and only once in another, the first document may be considered more relevant for that term. Additionally, I'm pretty sure that because Elasticsearch uses an inverted index (worth looking up), duplicated terms don't have any detrimental impact on your index.
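To make the inverted index point concrete, here is a minimal toy sketch in plain Python. It is not how Elasticsearch or App Search is actually implemented; the `build_inverted_index` helper, the whitespace tokenizer, and the sample documents are all made up for illustration. The idea it shows is that each distinct term gets one entry, and a repeated term only raises a per-document count.

```python
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: term_frequency} (hypothetical helper)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Naive analysis: lowercase and split on whitespace.
        for term, count in Counter(text.lower().split()).items():
            index[term][doc_id] = count
    return index


# Made-up sample documents.
docs = {
    "book-1": "whale whale whale sea",
    "book-2": "whale ship",
}

index = build_inverted_index(docs)
print(index["whale"])  # {'book-1': 3, 'book-2': 1}
```

The word "whale" appears three times in `book-1` but still occupies a single entry in the index. The count is kept as a frequency, which is the kind of signal a relevance score can use.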
I'm not an expert, though. You might have better luck asking about analyzers in the Elasticsearch discuss group.
The reason I asked the OP what they are trying to achieve is that I don't think it's something they need to worry about. When you index a document, its analyzed terms go into the index, and the raw content of the document is stored as well, but separately. Just because they see the raw document in their search response doesn't mean stop words are being searched.
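A rough sketch of that separation, again in plain Python and not based on the real App Search internals. `ToyEngine`, `analyze`, and the `STOP_WORDS` list are hypothetical names invented for this example. The raw text is kept for display, while only the analyzed terms are searchable.

```python
# Hypothetical stop list, for illustration only.
STOP_WORDS = {"the", "a", "of", "and"}


def analyze(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


class ToyEngine:
    def __init__(self):
        self.stored = {}  # doc_id -> raw text, returned in responses
        self.terms = {}   # term -> set of doc_ids, used for matching

    def index(self, doc_id, text):
        self.stored[doc_id] = text  # raw content kept untouched
        for term in analyze(text):
            self.terms.setdefault(term, set()).add(doc_id)

    def search(self, query):
        hits = set()
        for term in analyze(query):
            hits |= self.terms.get(term, set())
        # Responses come from the stored raw text, not from the term index.
        return [self.stored[d] for d in sorted(hits)]


engine = ToyEngine()
engine.index("book-1", "The Old Man and the Sea")

print(engine.search("sea"))  # ['The Old Man and the Sea']
print(engine.search("the"))  # []
```

The response for "sea" still shows "The" and "and" because it is the stored raw text, yet a query for "the" matches nothing, since that word never made it into the searchable terms.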