I have to index a collection of HTML pages, and I would like to know two things:
Does ES offer a direct way to escape HTML tags/special characters at
index time or after a search? Or do I have to escape them on my side?
Since my HTML pages are in different languages, is it possible to use a
different stemmer at index time depending on the language of each page?
> Is there a direct mean with es that allows to escape html
> tags/special characters on the indexation or after search. Or do I
> have to escape them in my side?
What do you mean by escape them? Do you mean strip them?
If so, then yes:
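
One built-in way to strip markup at index time is the `html_strip` character filter, which removes HTML elements and decodes entities before the text reaches the tokenizer. Below is a minimal sketch of index settings that use it, written as a Python dict so it can be serialized to JSON for the create-index request. The analyzer name `html_text` and the helper `build_index_settings` are made up for illustration, and the choice of the `standard` tokenizer with a `lowercase` filter is an assumption, not something stated in this thread.

```python
import json

# Hypothetical analyzer name, chosen for this example only.
ANALYZER_NAME = "html_text"


def build_index_settings():
    """Return index settings with a custom analyzer that strips HTML."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    ANALYZER_NAME: {
                        "type": "custom",
                        # Built-in char filter: drops tags, decodes entities.
                        "char_filter": ["html_strip"],
                        # Assumed tokenizer and token filter; adjust as needed.
                        "tokenizer": "standard",
                        "filter": ["lowercase"],
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body to send when creating the index.
    print(json.dumps(build_index_settings(), indent=2))
```

The field holding the page content would then reference this analyzer in its mapping. Note that a char filter only changes the tokens that get indexed; the stored `_source` still contains the original HTML, so markup returned in search results has to be handled on the client side.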
> since my html pages are in different languages, is it possible to
> use a different stemmer in the indexation according to the language
> of each page?
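
One common way to handle this (a sketch, not necessarily what was suggested in this thread) is to give each language its own field, each mapped to one of Elasticsearch's built-in language analyzers such as `english` or `french`, and to pick the field when building the document. The field names (`content_en`, `content_default`), the helper functions, and the assumption that the page language is already known (for example from the `lang` attribute or a separate detection step) are all illustrative. The field type `text` applies to recent versions; older releases used `string`.

```python
# A few built-in Elasticsearch language analyzers, keyed by ISO 639-1 code.
# The selection of languages here is an assumption for the example.
LANGUAGE_ANALYZERS = {
    "en": "english",
    "fr": "french",
    "de": "german",
    "es": "spanish",
}
FALLBACK_FIELD = "content_default"  # hypothetical field for other languages
FALLBACK_ANALYZER = "standard"


def field_for(lang):
    """Return the name of the field that should hold text in `lang`."""
    return f"content_{lang}" if lang in LANGUAGE_ANALYZERS else FALLBACK_FIELD


def field_definitions():
    """Build the per-language field definitions for the index mapping."""
    fields = {
        field_for(lang): {"type": "text", "analyzer": analyzer}
        for lang, analyzer in LANGUAGE_ANALYZERS.items()
    }
    fields[FALLBACK_FIELD] = {"type": "text", "analyzer": FALLBACK_ANALYZER}
    return fields


def build_document(url, lang, content):
    """Place the page content in the field matching its language."""
    return {"url": url, "lang": lang, field_for(lang): content}


if __name__ == "__main__":
    print(field_definitions())
    print(build_document("http://example.com/fr", "fr", "<p>Bonjour</p>"))
```

At search time the query then has to target the field for the user's language, or all of the `content_*` fields at once. An alternative with the same effect is one index per language, each with its own default analyzer.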