Filtering nested aggregates


(Ary Borenszweig) #1

Hi,

I have an index where I need to store medical test results. A test result
can cover several conditions and their results: for example, Tuberculosis
=> positive, Flu => negative. So I modeled my index like this:

curl -XPUT "http://localhost:9200/test_results/" -d'
{
  "mappings": {
    "result": {
      "properties": {
        "data": {
          "type": "nested",
          "properties": {
            "condition": {"type": "string"},
            "result": {"type": "string"}
          }
        }
      }
    }
  }
}'

I insert one test result with Tuberculosis => positive, Flu => negative:

curl -XPOST "http://localhost:9200/test_results/_bulk" -d'
{"index":{"_index":"test_results","_type":"result"}}
{"data": [{"condition": "Tuberculosis", "result": "positive"}, {"condition": "FLU", "result": "negative"}]}
'

Then, one of the queries I need to do is this one: for Tuberculosis, give
me how many positives you have and how many negatives you have (basically:
filter by data.condition and group by data.result). So I tried this query:

curl -XPOST "http://localhost:9200/test_results/_search?pretty=true" -d'{
  "size": 0,
  "query": {
    "nested": {
      "path": "data",
      "query": {
        "match": {
          "data.condition": "Tuberculosis"
        }
      }
    }
  },
  "aggregations": {
    "data": {
      "nested": {
        "path": "data"
      },
      "aggregations": {
        "result": {
          "terms": {
            "field": "data.result"
          }
        }
      }
    }
  }
}
'

However, the above gives me this result:

"aggregations" : {
  "data" : {
    "doc_count" : 2,
    "result" : {
      "buckets" : [ {
        "key" : "negative",
        "doc_count" : 1
      }, {
        "key" : "positive",
        "doc_count" : 1
      } ]
    }
  }
}

That is, it gives me one negative result and one positive result. That's
because the nested query only selects the whole document, and the nested
aggregation then counts every nested object in it, so it's not discarding
the one that has "Flu".
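
To make the behaviour above concrete, here is a small pure-Python model of it. It does not talk to Elasticsearch; the helper names `nested_query` and `nested_terms_agg` are made up for illustration, and the lowercase comparison is only a stand-in for what the match query does. It is a sketch of the idea that the query picks root documents while the aggregation walks all of their nested objects.

```python
from collections import Counter

# One root document holding two nested objects, as in the bulk request above.
docs = [
    {"data": [
        {"condition": "Tuberculosis", "result": "positive"},
        {"condition": "FLU", "result": "negative"},
    ]},
]

def nested_query(docs, condition):
    """Keep every root document that has at least one matching nested object."""
    wanted = condition.lower()
    return [d for d in docs
            if any(n["condition"].lower() == wanted for n in d["data"])]

def nested_terms_agg(docs, field):
    """Count a field over ALL nested objects of the given root documents."""
    return Counter(n[field] for d in docs for n in d["data"])

hits = nested_query(docs, "Tuberculosis")
buckets = nested_terms_agg(hits, "result")
print(dict(buckets))
```

In this model both results are counted once, because the "FLU" object belongs to a root document that matched the query.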

I see in the documentation there's a "filter" aggregation. I tried using it
in three ways:

  1. With term on "data.condition":

curl -XPOST "http://localhost:9200/test_results/_search?pretty=true" -d'{
  "size": 0,
  "query": {
    "nested": {
      "path": "data",
      "query": {
        "match": {
          "data.condition": "Tuberculosis"
        }
      }
    }
  },
  "aggregations": {
    "data": {
      "nested": {
        "path": "data"
      },
      "aggregations": {
        "filtered_result": {
          "filter": {
            "term": { "data.condition" : "Tuberculosis" }
          },
          "aggregations" : {
            "result": {
              "terms": {
                "field": "data.result"
              }
            }
          }
        }
      }
    }
  }
}
'

  2. With term on "condition":

curl -XPOST "http://localhost:9200/test_results/_search?pretty=true" -d'{
  "size": 0,
  "query": {
    "nested": {
      "path": "data",
      "query": {
        "match": {
          "data.condition": "Tuberculosis"
        }
      }
    }
  },
  "aggregations": {
    "data": {
      "nested": {
        "path": "data"
      },
      "aggregations": {
        "filtered_result": {
          "filter": {
            "term": { "condition" : "Tuberculosis" }
          },
          "aggregations" : {
            "result": {
              "terms": {
                "field": "data.result"
              }
            }
          }
        }
      }
    }
  }
}
'

  3. With nested:

curl -XPOST "http://localhost:9200/test_results/_search?pretty=true" -d'{
  "size": 0,
  "query": {
    "nested": {
      "path": "data",
      "query": {
        "match": {
          "data.condition": "Tuberculosis"
        }
      }
    }
  },
  "aggregations": {
    "data": {
      "nested": {
        "path": "data"
      },
      "aggregations": {
        "filtered_result": {
          "filter": {
            "nested": {
              "path": "data",
              "filter": {
                "term": { "data.condition": "Tuberculosis" }
              }
            }
          },
          "aggregations" : {
            "result": {
              "terms": {
                "field": "data.result"
              }
            }
          }
        }
      }
    }
  }
}
'

but no luck: all of the above queries just give me:

"aggregations" : {
  "data" : {
    "doc_count" : 2,
    "filtered_result" : {
      "doc_count" : 0,
      "result" : {
        "buckets" : [ ]
      }
    }
  }
}

Is there a way to do what I want?

Thanks,
Ary

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f5094888-6654-4d40-bf9b-d81ec1e5add4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Ary Borenszweig) #2

A friend of mine made it work. It wasn't working because we were using a
filter -> term inside the nested aggregation with "Tuberculosis". A term
filter does not analyze its value, but the field is analyzed, so the
indexed value was "tuberculosis". Changing "Tuberculosis" to
"tuberculosis" made it work. Also, using a match query inside the filter
aggregation (repeating the one from the top-level query) instead of a term
filter makes it work, because match analyzes its input.
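
The difference between the two lookups can be sketched in plain Python. This is only a rough model: `analyze` stands in for the default analyzer (split into words, lowercase), and `term_filter` and `match_query` are hypothetical helpers, not Elasticsearch calls.

```python
import re

def analyze(text):
    """Rough stand-in for the default analyzer: split into words, lowercase."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]

# Tokens stored in the index for the two nested "condition" values.
indexed = {tok for value in ("Tuberculosis", "FLU") for tok in analyze(value)}

def term_filter(value):
    """A term filter compares its value with the indexed tokens as-is."""
    return value in indexed

def match_query(text):
    """A match query analyzes its text first, then looks up the tokens."""
    return any(tok in indexed for tok in analyze(text))

print(term_filter("Tuberculosis"))  # False: the index holds "tuberculosis"
print(term_filter("tuberculosis"))  # True
print(match_query("Tuberculosis"))  # True: the text is lowercased first
```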

Here's one example:

curl -XPOST "http://localhost:9200/test_results/_search?pretty=true" -d'{
  "size": 0,
  "query": {
    "nested": {
      "path": "data",
      "query": {
        "match": {
          "data.condition": "Tuberculosis"
        }
      }
    }
  },
  "aggregations": {
    "data": {
      "nested": {
        "path": "data"
      },
      "aggregations": {
        "filtered_result": {
          "filter": {
            "query": {
              "match": {
                "condition": "Tuberculosis"
              }
            }
          },
          "aggregations" : {
            "result": {
              "terms": {
                "field": "data.result"
              }
            }
          }
        }
      }
    }
  }
}
'
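
For completeness, the other fix (keeping the term filter but lowercasing its value) is not shown above. The Python snippet below only builds that request body and prints it as JSON. It assumes the variant with the full field name "data.condition"; the post does not say which of the term attempts was the one that was changed.

```python
import json

# Same search as above, but the filter aggregation uses a term filter
# with the lowercase value that the analyzer actually indexed.
body = {
    "size": 0,
    "query": {
        "nested": {
            "path": "data",
            "query": {"match": {"data.condition": "Tuberculosis"}},
        }
    },
    "aggregations": {
        "data": {
            "nested": {"path": "data"},
            "aggregations": {
                "filtered_result": {
                    # Assumption: full field path, lowercase term.
                    "filter": {"term": {"data.condition": "tuberculosis"}},
                    "aggregations": {
                        "result": {"terms": {"field": "data.result"}}
                    },
                }
            },
        }
    },
}

payload = json.dumps(body, indent=2)
print(payload)
```

The printed JSON can be sent as the -d payload of the same _search request used in the examples above.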

On Friday, May 9, 2014 2:48:50 PM UTC-3, Ary Borenszweig wrote:


