Hi, I am trying to build a query that will match when an value of a field is greater than another nested field, but am stuck as to where to begin. Can anyone help or suggest somewhere to begin reading. Example data:
If it were possible it would probably take the form of some custom script retrieving the value pairs and comparing them for every doc. This would be linear to the number of docs and sounds expensive so perhaps it's worth considering another approach. Perhaps you could compute this "on the way in" using an ingest-pipeline to compute the difference between the amounts. You could then query and aggregate on this numeric "diff" value held in the index much more efficiently.
That is an idea, simple enough to build a simple string or array with an order and then just additionally match that at query time.
Do you think something similar to this would be efficient?
Where I use the principal of the 'smooshed_arrays' to create com1_com2. the reality of my searches are likely to be 2 or 3 objects to work with, so can easily build these on the way in for example, A_B and A_B_C / A_C_B (when 3 items to query)
Where I use the principal of the 'smooshed_arrays' to create com1_com2. the reality of my searches are
I'm not sure what you intend these arrays to represent in this example.
If they are your name fields, sorted by their amount then, yes that or a concatenated com1_com2 type string might help you find records more easily. I'm not clear on these points:
how many com* type names can exist in each doc?
are you only interested in com1 and com2 or testing arbitrary choices of com* names?
do you only care about pairs or maybe sequences of more than 2 names when sorted by amount?
Maybe sharing the actual business problem you're trying to solve would help?
Sorry yes, the com* are names and the amounts are not necessary for this section of the query, it is purely com1 value is greater than com2 so strings are fine. To answer your questions
1- There can be 5-6 com*, but when above 3 this ordering of amounts becomes a useless query and the results are not really worth it.
2- yes only interested in the names. the amounts are not really of value, it's just com1 is greater than com2.
yes it is only the sequence. Just had a thought that I can add a numerical value for the index position in the sequence and then query that, unless there is a way of assigning this on ingest?
The problem is amount of ingredients in cocktails, so the user can say they want a cocktail with more gin than vodka
The applications of elasticsearch never cease to amaze me
You might want to take a look at SpanQuery family of queries - specifically the SpanNear query which can help you test the position and order of elements.
Crazy right? Come up with something amazing like ES and we'll use it for almost everything we can possibly conceive, including something as trivial as wanting more gin!!!!!
In terms of mappings required to support span queries - you'll obviously need to ensure ingredients are sorted by amount. As for how you pass these values
a single JSON string value would require a delimiter in there e.g. commas and a choice of Analyzer that chops into tokens based on the delimiter OR
a JSON array of the ingredients where you might want to dial back the position_increment to 1 to make the ingredient terms appear closer to each other and so wouldn't need big "slop" values in your SpanNear query
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.