Does the es-hadoop Maven dependency support built-in analyzers? I am setting up a project to read and write data from Elasticsearch using Spark. I have created simple analyzers on the index, but I need more help with this.
Hi @Shalini_Pereira. I'm not completely sure I understand your question. I'm assuming you are using a non-default analyzer and that you have specified it in your index mapping (or template). In that case, whenever you index a document (whether through es-hadoop or the index API), your analyzer is applied to the document on the Elasticsearch side. There is nothing you need to do on the es-hadoop side. Does that answer your question? Or do you want to specify an analyzer when you are reading data out with es-hadoop?
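To make "specified in your index mapping" concrete, here is a minimal sketch of a create-index request body that assigns a built-in analyzer to a text field. The field name `title` and the helper `build_index_body` are hypothetical and only for illustration; the body would be sent to Elasticsearch with `PUT /<index>` when the index is created, and nothing in it is specific to es-hadoop.

```python
import json


def build_index_body(field: str, analyzer: str) -> dict:
    """Build a create-index body that maps `field` as text using `analyzer`.

    `field` is a hypothetical field name; `analyzer` is the name of a
    built-in analyzer such as "whitespace" or "keyword".
    """
    return {
        "mappings": {
            "properties": {
                field: {"type": "text", "analyzer": analyzer}
            }
        }
    }


# Example: map the (hypothetical) "title" field with the whitespace analyzer.
body = build_index_body("title", "whitespace")
print(json.dumps(body, indent=2))
```

Because the analyzer lives in the mapping, it applies to every document written to that field, regardless of which client sends the document.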
I am using a built-in analyzer, such as the keyword or whitespace analyzer, which is added to the index settings. I just wanted to clarify whether any option needs to be passed when using the es-hadoop dependency, or whether the analyzer I configured will be used by default by es-hadoop during reads.
If it is in the index settings, it will be used by Elasticsearch. You do not need to pass any arguments to es-hadoop. You can confirm this by indexing a document through the REST API (see "Index API" in the Elasticsearch Guide [8.1]). If your analyzer is used there, it will be used when you go through es-hadoop as well.
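A related way to check which analyzer a field uses, different from the Index API route suggested above, is the `_analyze` API, which returns the tokens Elasticsearch produces for a sample text. The sketch below only builds the request and does not send it. The URL `http://localhost:9200`, the index name `my-index`, the field `title`, and the helper `build_analyze_request` are all assumptions for illustration.

```python
import json
import urllib.request


def build_analyze_request(base_url: str, index: str, field: str, text: str) -> urllib.request.Request:
    """Build a POST /<index>/_analyze request for `text`.

    Passing "field" asks Elasticsearch to use the analyzer configured
    for that field in the index mapping.
    """
    payload = json.dumps({"field": field, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_analyze",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical cluster address, index, and field.
req = build_analyze_request("http://localhost:9200", "my-index", "title", "Quick brown-fox")
print(req.get_method(), req.full_url)

# To run it against a live cluster (not executed here):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["tokens"])
```

If the returned tokens match what the configured analyzer should produce, the same analysis applies to documents written through es-hadoop, since analysis happens on the Elasticsearch side.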