Getting Source data in aggregation results

I have an index that consists of nested and normal fields.
The structure of my index is:

{
	"name" : "Walter white",
	"age" : "20",
	"email" : "walter.white@gmail.com",
	"subjects" :[
		{
			"subject_name" : "Computer Science",
			"marks" : "80"
		},
		{
			"subject_name" : "Maths",
			"marks" : "95"
		},
		{
			"subject_name" : "Physics",
			"marks" : "90"
		}
	]
}

Now I want to create a report that contains all the student data, grouped by age.
I have created a query like this:

{
  "_source": false,
  "aggs": {
    "ageGroup": {
      "terms": {
        "field": "age"
      },
      "aggs": {
        "top_sales_hits": {
          "top_hits": {
            "size": 100
          }
        }
      }
    }
  }
}

I'm getting the desired result, but the query takes too long to return.
Is there any other way to do the same thing?

Can anyone please help me with this?

Hi,

You could create a transform that indexes the results into another index. With a pivot you can group_by age buckets, as you described.
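As a rough sketch of what such a transform could look like, the Python snippet below builds a request body for the create transform API (PUT _transform/<transform_id>, started afterwards with POST _transform/<transform_id>/_start). The index names "students" and "students_by_age", the aggregation name "student_count" and the helper function itself are assumptions made up for illustration. A pivot needs at least one aggregation, so a simple value_count stands in here.

```python
import json


def build_transform_body(source_index, dest_index, group_field):
    """Build a pivot transform body grouped by one field (hypothetical helper)."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "pivot": {
            # One output document per distinct value of group_field.
            "group_by": {group_field: {"terms": {"field": group_field}}},
            # Placeholder aggregation: counts the values of the grouping field.
            "aggregations": {
                "student_count": {"value_count": {"field": group_field}}
            },
        },
    }


# "students" and "students_by_age" are assumed index names.
body = build_transform_body("students", "students_by_age", "age")
print(json.dumps(body, indent=2))
```

Because the grouping field is a parameter, the same helper can produce a body for any other field you want to group by.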

To access the source you can use a scripted metric aggregation. The following one takes every input document and stores it in an array:

"all_docs": {
  "scripted_metric": {
    "init_script": "state.docs = []",
    "map_script": "state.docs.add(new HashMap(params['_source']))",
    "combine_script": "return state.docs",
    "reduce_script": "def docs = []; for (s in states) {for (d in s) { docs.add(d);}}return docs"
  }
}
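To make the four phases easier to follow, here is a plain Python simulation of what the scripts above do. It is not Painless and does not talk to Elasticsearch; the shard layout, the second and third student and all function names are assumptions for illustration only.

```python
def init_script():
    # init_script: "state.docs = []" (runs once per shard)
    return {"docs": []}


def map_script(state, source):
    # map_script: copy the document's _source into the shard-local state
    state["docs"].append(dict(source))


def combine_script(state):
    # combine_script: "return state.docs" (one list per shard)
    return state["docs"]


def reduce_script(states):
    # reduce_script: flatten the per-shard lists into a single list
    docs = []
    for shard_docs in states:
        for doc in shard_docs:
            docs.append(doc)
    return docs


def run_scripted_metric(shards):
    """Run init/map/combine per shard, then reduce across all shards."""
    states = []
    for shard in shards:
        state = init_script()
        for source in shard:
            map_script(state, source)
        states.append(combine_script(state))
    return reduce_script(states)


# Two hypothetical shards holding three student documents in total.
shards = [
    [{"name": "Walter white", "age": "20"}],
    [{"name": "Student B", "age": "20"}, {"name": "Student C", "age": "21"}],
]
all_docs = run_scripted_metric(shards)
print(len(all_docs))
```

Note that every matching document is held in memory on the way to the result, so the cost grows with the number of documents collected.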

This might not be exactly what you want, but I hope it gives you something to start with.

Thanks @Hendrik_Muhs
This is a regular use case for me: every user can do this multiple times a day, each time with a different field for the group-by.
Is it feasible to create an index again and again for every request?

You can create as many indices as your cluster can hold. However, I wonder whether a transform is the right choice if you do not reuse the output and are only interested in the result once. You would also need to program against the API and manage the created transforms and indices in the background.

Your original concern was that the query takes too long to return the result. With async search you can submit the search asynchronously and retrieve the result later. This does not leave indices behind; however, the results must be retrieved before the stored result expires.
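A minimal sketch of that submit-and-poll flow is shown below. It relies on the async search endpoints (POST /<index>/_async_search and GET /_async_search/<id>) and the wait_for_completion_timeout and keep_alive parameters. The send callable is a hypothetical stand-in for whatever HTTP client you use; it is assumed to take (method, path, params, body) and return the parsed JSON response. The helper name and default values are illustrative as well.

```python
import time


def run_async_search(send, index, body, keep_alive="5m",
                     poll_seconds=1.0, max_polls=60):
    """Submit an async search and poll until it finishes or max_polls is hit.

    `send` is a hypothetical transport: send(method, path, params, body) -> dict.
    """
    # Submit the search; Elasticsearch waits briefly, then answers with an id
    # if the search is still running.
    result = send(
        "POST",
        f"/{index}/_async_search",
        {"wait_for_completion_timeout": "1s", "keep_alive": keep_alive},
        body,
    )
    polls = 0
    # Keep fetching the stored result while the search reports it is running.
    while result.get("is_running") and polls < max_polls:
        time.sleep(poll_seconds)
        result = send("GET", f"/_async_search/{result['id']}", None, None)
        polls += 1
    # May still be partial if max_polls was reached first.
    return result
```

The aggregation results end up under the "response" key of the returned object. If the polling limit is reached first, "is_running" is still true and the caller can decide whether to keep waiting.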

Thank you @Hendrik_Muhs
I was not aware of the async search feature of Elasticsearch. I will look into it.

Can you tell me whether there is any alternative to the top_hits aggregation for getting source data in the response from Elasticsearch?
