I have a field called document.content that contains the overall text of the document that I am indexing and I want to return only those records where document.content is greater than a certain length. How would I go about doing that? I did come across some examples using scripts but really didn't understand the format of the query. Here is one example I saw and I was expecting to be able to do something more like document.content.length() > 50:
In the example above, I am not sure what the doc array reference is about or what values refers to. Also, I gather that there may be some issues with using scripts in general (aside from the security concerns) for something like this, so any suggestions on how to do this in the most performant way would be appreciated. Thanks.
It will be easier and faster if you store the length of your text in the same document, in a new field "length_doc" (an int value), and then filter in your query with a range on "length_doc" > 50.
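A minimal sketch of that suggestion, in Python, assuming the documents are plain dicts at index time and that the text lives under document.content as a single string. The helper names add_length_field and min_length_query are made up for illustration; only the range query shape itself comes from the Elasticsearch query DSL.

```python
def add_length_field(doc):
    """Return a copy of doc with a length_doc field holding the content length.

    Assumes doc looks like {"document": {"content": "..."}}; a missing
    content is treated as an empty string.
    """
    content = doc.get("document", {}).get("content", "") or ""
    enriched = dict(doc)
    enriched["length_doc"] = len(content)
    return enriched


def min_length_query(min_length):
    """Build a search body matching docs whose length_doc is > min_length."""
    return {"query": {"range": {"length_doc": {"gt": min_length}}}}


# Example: enrich a document before indexing, then build the search body.
doc = add_length_field({"document": {"content": "hello world"}})
body = min_length_query(50)
```

The dict returned by min_length_query would be sent as the body of a normal _search request. Because the length is a plain numeric field, the comparison happens at query time without any scripting.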
Thanks for the suggestion. We have 1.8 million docs in our cluster and it took quite a while to get them in because of all the nested objects and we don't want to have to reprocess them if we can avoid it. Is there a way to add the field to the index and then dynamically update all length fields with a script call or something?
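One possible approach, sketched here under assumptions: newer Elasticsearch releases have an Update By Query API (_update_by_query) that can run a Painless script against documents already in the index, so the source data does not need to be re-fed. Whether it is available depends on the cluster version, and the script below assumes document.content is a single string in _source rather than an array of nested objects. The function name backfill_length_body is hypothetical.

```python
def backfill_length_body(field="length_doc"):
    """Build a request body for POST <index>/_update_by_query.

    Assumes a version with Painless and the "source" script key; older
    releases used "inline" or did not have this API at all.
    """
    return {
        "script": {
            "lang": "painless",
            # Assumes document.content is a single string in _source.
            "source": "ctx._source[params.field] = ctx._source.document.content.length()",
            "params": {"field": field},
        },
        # Only touch documents that do not have the length field yet,
        # so an interrupted run can be resumed.
        "query": {"bool": {"must_not": {"exists": {"field": field}}}},
    }


body = backfill_length_body()
```

Each matching document is still rewritten inside the cluster, so on 1.8 million docs this is not free, but it avoids pushing everything through the original ingestion pipeline again.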