I have run into a challenge that made me search these forums and sign up :).
My work is related to cancer trials. We are currently on ES 5.5. We are trying to track the number of trials that enter each status over time. Each trial has an array of status history entries, as shown below:
{
  "status_history": [{
    "comments": [],
    "message_datetime": "2001-04-20T00:00:00",
    "status": "ACTIVE",
    "status_datetime": "2001-04-20T00:00:00"
  }, {
    "comments": [],
    "message_datetime": "2009-08-12T00:00:00",
    "status": "CLOSED_TO_ACCRUAL",
    "status_datetime": "2009-08-12T00:00:00"
  }, {
    "comments": ["Trial completed prematurely."],
    "message_datetime": "2010-04-01T00:00:00",
    "status": "ADMINISTRATIVELY_COMPLETE",
    "status_datetime": "2010-04-01T00:00:00"
  }]
}
When I create the date histogram with a terms sub-aggregation on status_history.status, I expect each interval to contain a single entry for each status change whose date falls in that interval. Instead, because the aggregation considers every value of the field on the trial, each interval that matches the trial returns all 3 statuses rather than only the one relevant to that interval.
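To illustrate what I think is happening, here is a simplified sketch in Python (not Elasticsearch's actual implementation). As far as I understand, an array of objects that is not mapped as nested is indexed as flat multi-valued fields, so the link between each status and its own date is lost. The helper names `flatten` and `yearly_status_counts` are made up for this illustration.

```python
from collections import Counter, defaultdict

# Trimmed copy of the trial document above (comments/message_datetime omitted).
trial = {
    "status_history": [
        {"status": "ACTIVE", "status_datetime": "2001-04-20T00:00:00"},
        {"status": "CLOSED_TO_ACCRUAL", "status_datetime": "2009-08-12T00:00:00"},
        {"status": "ADMINISTRATIVELY_COMPLETE", "status_datetime": "2010-04-01T00:00:00"},
    ]
}


def flatten(doc):
    """Mimic a plain object array: one multi-valued field per key."""
    flat = defaultdict(list)
    for entry in doc["status_history"]:
        for key, value in entry.items():
            flat["status_history." + key].append(value)
    return dict(flat)


def yearly_status_counts(docs):
    """Bucket whole documents by year, then count every status on each document."""
    buckets = defaultdict(Counter)
    for doc in docs:
        flat = flatten(doc)
        years = {date[:4] for date in flat["status_history.status_datetime"]}
        for year in years:
            # The document falls into the bucket as a whole, so all of its
            # statuses are counted, not only the one dated in this year.
            buckets[year].update(set(flat["status_history.status"]))
    return buckets


counts = yearly_status_counts([trial])
print(sorted(counts["2001"].items()))
```

In this model the 2001 bucket lists all three statuses with a count of 1 each, which matches what I am seeing.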
Query Example:
{
  "date_histogram": {
    "aggs": {
      "status_history.status": {
        "terms": {
          "field": "status_history.status._raw",
          "order": {
            "_term": "asc"
          },
          "size": 50000
        }
      }
    },
    "date_histogram": {
      "extended_bounds": {
        "max": "2002-02-20T23:59:59.999999Z",
        "min": "2001-01-01T00:00:00Z"
      },
      "field": "status_history.status_datetime",
      "interval": "year",
      "min_doc_count": 0
    }
  },
  "status_history.status": {
    "terms": {
      "field": "status_history.status._raw",
      "order": {
        "_term": "asc"
      },
      "size": 50000
    }
  }
}
Returns:
[{
  "doc_count": 1,
  "key": 978307200000,
  "key_as_string": "2001-01-01T00:00:00.000Z",
  "status_history.status": {
    "buckets": [{
      "doc_count": 1,
      "key": "ACTIVE"
    }, {
      "doc_count": 1,
      "key": "ADMINISTRATIVELY_COMPLETE"
    }, {
      "doc_count": 1,
      "key": "CLOSED_TO_ACCRUAL"
    }],
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0
  }
}, {
  "doc_count": 0,
  "key": 1009843200000,
  "key_as_string": "2002-01-01T00:00:00.000Z",
  "status_history.status": {
    "buckets": [],
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0
  }
}]
It should return something like:
[{
  "doc_count": 1,
  "key": 978307200000,
  "key_as_string": "2001-01-01T00:00:00.000Z",
  "status_history.status": {
    "buckets": [{
      "doc_count": 1,
      "key": "ACTIVE"
    }],
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0
  }
}, {
  "doc_count": 0,
  "key": 1009843200000,
  "key_as_string": "2002-01-01T00:00:00.000Z",
  "status_history.status": {
    "buckets": [],
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0
  }
}]
What else do I need in order to filter out, for each date interval, the status entries whose dates fall outside it?
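One direction I have been considering is a nested aggregation. I have not verified it, and it assumes status_history would be remapped with type nested and the data reindexed, which is not how our index is set up today. Below is a minimal Python sketch that only builds the request body. The aggregation names `status_over_time`, `per_year` and `status` and the helper `build_nested_status_histogram` are hypothetical. The field names and bounds are copied from my query above.

```python
import json


def build_nested_status_histogram(path="status_history"):
    """Build an aggs body that buckets each status entry by its own date.

    Assumes `path` is mapped as a nested field (an assumption, see above).
    """
    return {
        "status_over_time": {
            # Step into the nested objects so each status entry is
            # aggregated on its own rather than per trial.
            "nested": {"path": path},
            "aggs": {
                "per_year": {
                    "date_histogram": {
                        "field": path + ".status_datetime",
                        "interval": "year",
                        "min_doc_count": 0,
                        "extended_bounds": {
                            "min": "2001-01-01T00:00:00Z",
                            "max": "2002-02-20T23:59:59.999999Z",
                        },
                    },
                    "aggs": {
                        "status": {
                            "terms": {
                                "field": path + ".status._raw",
                                "order": {"_term": "asc"},
                                "size": 50000,
                            }
                        }
                    },
                }
            },
        }
    }


aggs = build_nested_status_histogram()
# "size": 0 skips the search hits so only the aggregations come back.
print(json.dumps({"size": 0, "aggs": aggs}, indent=2))
```

If this works the way I hope, doc_count in each bucket would count status entries rather than trials, which is what I want here.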
I hope someone can help.
Thanks,
Peter