Hi there,
The solution was given in the Practice Exam, but I'm a bit confused about the custom analyzer versus the script approach. Please advise; I appreciate your help. Thank you so much. See the Task 5 question and solutions below.
Task 5 Question
Create a new index on cluster1 named task5 that satisfies the following requirements:
contains all of the documents from the blogs index
whenever the string UK (both capital letters) appears in the content field, it gets replaced with United Kingdom
Your solution would work, but it would be a one time operation.
When doing the _reindex only with a script, you would have to rerun your script, or do an update by query, so that newly arriving data also gets UK => United Kingdom applied.
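To make the one-time approach concrete, here is a minimal sketch of what such request bodies could look like, written as Python dictionaries and printed as JSON. The exact script from the original solution is not shown in this thread, so the Painless source below is an assumption; only the index names blogs and task5 and the field content come from the task.

```python
import json

# Assumed Painless script: replaces the literal string "UK" in the content field.
# The original poster's actual script is not shown, so this is only illustrative.
REINDEX_SCRIPT = (
    "ctx._source.content = ctx._source.content.replace('UK', 'United Kingdom')"
)

# Body for: POST _reindex
# Copies every document from blogs into task5 and rewrites content on the way.
reindex_body = {
    "source": {"index": "blogs"},
    "dest": {"index": "task5"},
    "script": {"lang": "painless", "source": REINDEX_SCRIPT},
}

# Body for: POST task5/_update_by_query
# Would have to be rerun whenever new, unmodified documents arrive in task5.
update_by_query_body = {
    "script": {"lang": "painless", "source": REINDEX_SCRIPT},
}

print("POST _reindex")
print(json.dumps(reindex_body, indent=2))
print("POST task5/_update_by_query")
print(json.dumps(update_by_query_body, indent=2))
```

Both requests change the stored documents once, at the moment they run; nothing in the index itself remembers the replacement rule.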
By setting it in the mapping, newly ingested data gets analyzed automatically once the new index is in use. So it 'fixes' the current data from the old index while also keeping new incoming data correct.
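As a rough sketch of the mapping-based approach, the index could be created with a custom analyzer that uses a mapping character filter, and then filled with a plain reindex. The names uk_char_filter and uk_analyzer are hypothetical, the choice of the standard tokenizer and lowercase filter is an assumption, and the typeless mappings layout assumes Elasticsearch 7.x or later.

```python
import json

# Body for: PUT task5
# "uk_char_filter" and "uk_analyzer" are hypothetical names chosen for this sketch.
task5_index_body = {
    "settings": {
        "analysis": {
            "char_filter": {
                "uk_char_filter": {
                    "type": "mapping",
                    # Rewrites "UK" before the text reaches the tokenizer.
                    "mappings": ["UK => United Kingdom"],
                }
            },
            "analyzer": {
                "uk_analyzer": {
                    "type": "custom",
                    "char_filter": ["uk_char_filter"],
                    "tokenizer": "standard",  # assumed tokenizer
                    "filter": ["lowercase"],  # assumed token filter
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "content": {"type": "text", "analyzer": "uk_analyzer"}
        }
    },
}

# Body for: POST _reindex
# No script needed: the analyzer is applied to every document indexed into task5.
reindex_body = {"source": {"index": "blogs"}, "dest": {"index": "task5"}}

print("PUT task5")
print(json.dumps(task5_index_body, indent=2))
print("POST _reindex")
print(json.dumps(reindex_body, indent=2))
```

Because the rule lives in the index settings, any document indexed into task5 later goes through the same character filter without a rerun.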
I’d like to add another aspect to Peter’s response: often there is more than one way of solving a task. That said, if your solution is different but solves the task as set, it should be fine.
Just carefully read the task in the exam, whether you are asked to apply a “one-time” fix or a “general” fix.
This is awesome, I now fully understand the advantage of the custom analyzer over the script. So, the custom analyzer is definitely the better solution because it fixes both the current data and future data.
You could easily add your script to an ingest pipeline and also fix new/future data.
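A minimal sketch of that idea, assuming a hypothetical pipeline name uk_to_united_kingdom and a Painless script written for this example: the script goes into a script processor, and the pipeline is set as the default for task5 so new documents pass through it.

```python
import json

# Hypothetical pipeline name, chosen only for this sketch.
PIPELINE_NAME = "uk_to_united_kingdom"

# Body for: PUT _ingest/pipeline/uk_to_united_kingdom
# In an ingest script the document fields are reached through ctx directly.
pipeline_body = {
    "description": "Replace UK with United Kingdom in the content field",
    "processors": [
        {
            "script": {
                "lang": "painless",
                "source": (
                    "if (ctx.content != null) { "
                    "ctx.content = ctx.content.replace('UK', 'United Kingdom'); "
                    "}"
                ),
            }
        }
    ],
}

# Body for: PUT task5/_settings
# Makes every future index request into task5 run through the pipeline.
settings_body = {"index.default_pipeline": PIPELINE_NAME}

# Body for: POST _reindex
# The existing blogs documents are sent through the same pipeline.
reindex_body = {
    "source": {"index": "blogs"},
    "dest": {"index": "task5", "pipeline": PIPELINE_NAME},
}

for title, body in [
    (f"PUT _ingest/pipeline/{PIPELINE_NAME}", pipeline_body),
    ("PUT task5/_settings", settings_body),
    ("POST _reindex", reindex_body),
]:
    print(title)
    print(json.dumps(body, indent=2))
```

Unlike the analyzer, this rewrites the stored content value itself for both old and new documents.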
That said, I don't think that this is the best solution, as it changes the data itself, and often that is not what users want. The main advantage of the analysis option is that it only changes the internal data structures used for search (the inverted index) and leaves the stored _source untouched.