Query and return aggs group by terms

Hello,

A minimal data example:

...
References: [
{
DocType: { SysName: "order"}
DocId: "12345"
},
{
"DocType: {SysName: "invoice"}
"DocId: "22222"}
]
MessageDate: "05/05/20"
...

I need to query this data, scan the references and return all the distinct doc Ids of type order while supporting paging and sorting if possible.

I am trying to figure out how to query this.
So far I tried something like this :

{
   "aggs":{
      "docType":{
         "terms":{
            "field":"References.DocType.SysName.keyword"
         },
         "aggs": {
			"docId": {
	         	"terms": {
         		"field": "References.DocId.keyword",
         		"size": 10000
         		}
			}
         }
      }
   },
   "query":{
      "bool":{
	         "filter":[
            {
               "range":{
                  "MessageDate":{
                     "gte":"2015-05-24T00:00:00+03:00"
                  }
               }
            }
         ]
      }
   }
}

The response is :

"buckets": [
                {
                    "key": "order",
                    "doc_count": ...,
                    "docId": {
                        "doc_count_error_upper_bound": 0,
                        "sum_other_doc_count": ...,
                        "buckets": [
                                {
                                    "key": "12345",
                                    "doc_count": 1
                                }]
...

This is sort of gives me what I need but I think I it is the wrong direction.
Because I also need to support paging probably with the - from, size parameters.
So Ideally I would have wanted to somehow return this information in the _source, hits body.

References.DocId.keyword : 22222 in the results is not of type order. Is that OK?
You will probably need to use nested field mappings and aggregations to solve that confusion.

Fixed

I am trying to use something like -

"aggs": {
          			"top_docId": {
         			"top_hits": {
         				"size": 100,
         				"_source": { "includes": ["References.DocId", "References.DocType.SysName"] }
         			}
         		}

This is very close to what I need,
But I am also facing a problem with this approach.
I want to somehow make sure that the repsonse is going to include distinct order Doc Ids.

Is there a way to modify the response so instead of something like this :

"hits": [
                    {
                        "_index": "...",
                        "_type": "_doc",
                        "_id": "...",
                        "_score": 0.0,
                        "_source": {
                            "References": [
                                {
                                    "DocId": "...",
                                    "DocType": {
                                        "SysName": "order"
                                    }
                                },
                                {
                                    "DocId": "...",
                                    "DocType": {
                                        "SysName": "shipment"
                                    }
                                }
                            ]
                        }...

I will get a flattened response like this :

[
{
                                        "DocId": "...",
                                        "DocType": {
                                            "SysName": "order"
                                        }...
]

So basically without the header data, and if possibe without the other doc types data.
I think if I will be able to receive this kind of a response I will also get the distinct option for free?

Are you using nested docs?

I managed to solve this on my service logic.
Thanks for the replay!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.