I understand the MLT API to just be a combination of the Get API and MLT
query. The fields specified in 'mlt_fields' are used for both text
extraction to input into 'like_text' in the MLT query and also what fields
to compare against in the MLT query. My question is, is this correct and if
so, is all text placed in like_text (even if fields specified result in
hundreds of words)?
What you are describing is correct. The mlt_fields parameter is used to get
the fields and their associated values from the MLT document and create an
MLT query. For each field/value combination (for example, each array element
in a field), a more_like_this_field query is created and the complete field
value is put in the like_text of that query. All the more_like_this_field
queries are then combined in one bool query.
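
To make that concrete, here is a rough sketch in Python of how such a query body could be assembled from a fetched document. It is only an illustration of the steps described above, not the actual Elasticsearch implementation: the function name build_mlt_query and the sample document are made up, and placing the clauses under "should" is an assumption, since the reply only says they are combined in one bool query.

```python
from typing import Any, Dict, List


def build_mlt_query(source: Dict[str, Any], mlt_fields: List[str]) -> Dict[str, Any]:
    """Build a bool query with one more_like_this_field clause per field value.

    Hypothetical helper: `source` stands in for the _source returned by the
    Get API, and `mlt_fields` for the fields named in the MLT request.
    """
    clauses = []
    for field in mlt_fields:
        if field not in source:
            continue  # field absent from the document, nothing to extract
        value = source[field]
        # Treat a scalar as a one-element list so arrays and scalars share a path.
        values = value if isinstance(value, list) else [value]
        for item in values:
            # The complete field value goes into like_text, whatever its length.
            clauses.append(
                {"more_like_this_field": {field: {"like_text": str(item)}}}
            )
    # Assumption: clauses are OR-ed together as "should" clauses.
    return {"bool": {"should": clauses}}


# Made-up document with one scalar field and one array field.
doc = {"title": "quick brown fox", "tags": ["search", "lucene"]}
query = build_mlt_query(doc, ["title", "tags"])
```

With this sample document the result holds three clauses: one for title and one for each element of tags.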