Getting all documents without duplicate values in Field

I have an index that looks similar to the below (it really has a number of additional fields that aren't relevant for this post)

case_id: int
group: string
unique_id: case_id + group
value: int

for any one case_id value I might have several documents, each with a different group. Each of these documents should have the same value in the "Value" field. Different case_ids might also go through different sets of groups

For Example

case_id: 1
group: A
value: 3

case_id: 1
group: B
value: 3

case_id: 1
group: C
value: 3

case_id: 2
group: A
value: 1

case_id: 2
group: B
value: 1

case_id: 3
group: D
value: 5

I would like to Average the Value of each case_id. But I would only like to calculate the value of each case_id once. So for example with the above data set I want
case_id 1 = 3
case_id 2 = 1
case_id 3 = 5

(3 + 1 + 5) / 3 = 3. So I should get 3 as a result.

The problem is I can't just average all case_id values of all the documents or I would get something like
(3 + 3 + 3 + 1 + 1 + 5) ~= 2.6667

How can I first subset a group to just one document per case_id (doesn't matter which document) and then average the value fields?

Thanks for any help!

hey,

you can try a terms aggregation (I think histogram might work as well) on the case_id field and then use a subaggregation to calculate the avg.

--Alex

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.