The concrete answer depends on which version of Elasticsearch you are using. But in general, I would use a has_child query with function_score to calculate the sum over all matching child records, then subtract money from it using another function_score at the top level, and exclude documents with negative scores using the min_score parameter.
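A minimal sketch of that query for ES 5.x, built as a Python dict so it can be printed as JSON. The child type payment, its field amount and the parent field money are hypothetical placeholders, since the real names are not given here. The sketch also assumes that Painless exposes the inner query score as _score inside script_score, and it uses the 5.x script key inline (renamed source in 6.x).

```python
import json

def build_query(child_type="payment", child_field="amount",
                parent_field="money", child_conditions=None):
    """Build an ES 5.x search body: sum a numeric field over the matching
    children, subtract a numeric field of the parent, and drop parents
    whose result is below 0. All type/field names are hypothetical."""
    # Each matching child scores as the value of its numeric field
    # (boost_mode "replace" discards the inner query's own score).
    child_query = {
        "function_score": {
            "query": child_conditions or {"match_all": {}},
            "functions": [
                {"field_value_factor": {"field": child_field, "missing": 0}}
            ],
            "boost_mode": "replace",
        }
    }
    return {
        "query": {
            "function_score": {
                # score_mode "sum": the parent's score is the sum of its
                # matching children's scores.
                "query": {
                    "has_child": {
                        "type": child_type,
                        "score_mode": "sum",
                        "query": child_query,
                    }
                },
                # Subtract the parent's own field from the summed child score.
                "functions": [
                    {
                        "script_score": {
                            "script": {
                                "lang": "painless",
                                "inline": "_score - doc['%s'].value" % parent_field,
                            }
                        }
                    }
                ],
                "boost_mode": "replace",
                # Parents whose result is negative fall below min_score.
                "min_score": 0,
            }
        }
    }

print(json.dumps(build_query(), indent=2))
```

Real child conditions can be passed as child_conditions in place of the match_all placeholder.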
Thanks for replying. In fact, the calculation is more complex than the example I gave. I have to collect the child records that match some conditions and do the calculation on them in a "parent" query.
Here is the actual example:
A person has several events. Each event has the fields event_type, event_city and event_time. I want to find the people who have events of different types in the same city within one week.
The mapping may go like this:
{
  "people": {
    "properties": {
      "name": {"type": "keyword"}
    }
  },
  "event_eat": {
    "_parent": {"type": "people"},
    "properties": {
      "time": {"type": "date"},
      "city": {"type": "keyword"}
    }
  },
  "event_drink": {
    "_parent": {"type": "people"},
    "properties": {
      "time": {"type": "date"},
      "city": {"type": "keyword"}
    }
  }
}
The "time" values should be collected in full, as an array, by each has_child query, and in the "parent" query I would then compare the two arrays of times with some time-sequence function.
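For reference, the time-sequence check described above (an eat event and a drink event in the same city, at most one week apart) is simple to express once the two arrays are in hand, for example on the application side. A minimal Python sketch, assuming each array holds (time, city) tuples and that "within one week" means at most 7 days apart, inclusive; has_close_pair is a hypothetical helper name.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(days=7)  # "within one week", assumed inclusive

def has_close_pair(eat_events, drink_events, window=WINDOW):
    """Return True if some eat event and some drink event share a city
    and are at most `window` apart. Events are (datetime, city) tuples."""
    # Group drink times by city and sort them so we can binary-search.
    drinks_by_city = defaultdict(list)
    for when, city in drink_events:
        drinks_by_city[city].append(when)
    for times in drinks_by_city.values():
        times.sort()

    for when, city in eat_events:
        times = drinks_by_city.get(city)
        if not times:
            continue
        # First drink time that is not earlier than (when - window).
        i = bisect_left(times, when - window)
        # If that one is also not later than (when + window), we have a pair.
        if i < len(times) and times[i] <= when + window:
            return True
    return False

eat = [(datetime(2017, 6, 1, 12), "Paris")]
drink = [(datetime(2017, 6, 6, 20), "Paris"), (datetime(2017, 6, 2), "Berlin")]
print(has_close_pair(eat, drink))
```

Sorting per city keeps the check at O((m + n) log n) rather than comparing every pair of events.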
Sorry, I don't understand this requirement and I still don't know which version of ES you are using.
The main limitation here is that we don't have much space to store intermediate information and pass it between child and parent queries. It is basically limited to a single float value and the few operations we can apply to it (max, min, sum, avg). If you can fit what you need into a float and use these operations, something can be done. Otherwise, you would have to retrieve this data and process it in your application, or split it into multiple queries.
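To make the limitation concrete, here is a toy Python model (not Elasticsearch code) of what a parent receives from has_child: however many children match, their per-child scores are reduced by score_mode to one float, so something like an array of times cannot survive the trip. parent_score and REDUCERS are hypothetical names used only for this illustration.

```python
from statistics import mean

# Toy model, not Elasticsearch code: how has_child's score_mode collapses
# the scores of all matching children into the single float the parent sees.
REDUCERS = {"sum": sum, "max": max, "min": min, "avg": mean}

def parent_score(child_scores, score_mode):
    """Reduce a list of per-child scores to one float, as score_mode does."""
    if not child_scores:
        # has_child only matches parents with at least one matching child.
        raise ValueError("no matching children")
    return float(REDUCERS[score_mode](child_scores))

# Three matching children: only one number reaches the parent query.
scores = [3.0, 5.0, 10.0]
for mode in ("sum", "max", "min", "avg"):
    print(mode, parent_score(scores, mode))
```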
The version of ES is 5.4.2, and your reply is helpful. My intention is to find a parent record in two steps. First, find the child records that match a query and retrieve some of their fields. Then, compute on these fields to decide whether the parent record should be returned.
As you mentioned above, only a single float value can be passed between child and parent queries. So I could retrieve the data from ES, but that may be much more expensive, because millions of records would be pulled out, loaded into memory in my application, and finally pieced together into a huge ids query to search for the parent records.
Are there any other suggestions for avoiding the calculation outside ES?
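The last step of that approach (turning the parent ids computed in the application into searches) might look like the following Python sketch for ES 5.x. Splitting the ids into batches is an assumption added here rather than something described above, batch_size=1000 is an arbitrary placeholder, and ids_query_batches is a hypothetical helper. Each body also sets size, because a search returns only 10 hits by default.

```python
import json

def ids_query_batches(parent_ids, parent_type="people", batch_size=1000):
    """Yield ES 5.x search bodies, each an ids query over at most
    batch_size parent ids."""
    # De-duplicate: one parent may qualify through many child records.
    ids = sorted(set(parent_ids))
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        yield {
            # Without size, a search returns only the first 10 hits.
            "size": len(chunk),
            "query": {"ids": {"type": parent_type, "values": chunk}},
        }

for body in ids_query_batches(["p3", "p1", "p2", "p1"], batch_size=2):
    print(json.dumps(body))
```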