Getting Source data in aggregation results

pranav24 · May 26, 2020, 12:21pm

I have an index that consists of nested and normal fields.
The structure of my index is:

{
	"name" : "Walter white",
	"age" : "20",
	"email" : "walter.white@gmail.com",
	"subjects" :[
		{
			"subject_name" : "Computer Science",
			"marks" : "80"
		},
		{
			"subject_name" : "Maths",
			"marks" : "95"
		},
		{
			"subject_name" : "Physics",
			"marks" : "90"
		}
	]
}

Now, I want to create a report which contains all the data of the students, grouped by their age.
I have created a query like this:

{
  "_source": false,
  "aggs": {
    "ageGroup": {
      "terms": {
        "field": "age"
      },
      "aggs": {
        "top_sales_hits": {
          "top_hits": {
            "size": 100
          }
        }
      }
    }
  }
}

I'm getting the desired result. But it is taking too much time to return the result.
Is there any other way to do the same?

pranav24 · May 28, 2020, 11:34am

Can anyone please help me with this?

Hendrik_Muhs · May 28, 2020, 12:46pm

Hi,

you could create a transform which indexes the results in another index. With pivot you can group_by age buckets as you described.

To access the source you can use a scripted metric aggregation. The following one takes every input document and stores it in an array:

"all_docs": {
  "scripted_metric": {
    "init_script": "state.docs = []",
    "map_script": "state.docs.add(new HashMap(params['_source']))",
    "combine_script": "return state.docs",
    "reduce_script": "def docs = []; for (s in states) { for (d in s) { docs.add(d); } } return docs"
  }
}

This might not be what you want, but I hope it gives you something to start with.
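
For reference, the pivot and the scripted metric above could be combined into one transform definition roughly as follows. This is only a sketch: the index names `students` and `students_by_age` and the transform ID are made-up placeholders, and it assumes `age` is mapped as a keyword or numeric field. The body would be sent with `PUT _transform/students_by_age`, and the transform started with `POST _transform/students_by_age/_start`.

```json
{
  "source": { "index": "students" },
  "dest": { "index": "students_by_age" },
  "pivot": {
    "group_by": {
      "age": { "terms": { "field": "age" } }
    },
    "aggregations": {
      "all_docs": {
        "scripted_metric": {
          "init_script": "state.docs = []",
          "map_script": "state.docs.add(new HashMap(params['_source']))",
          "combine_script": "return state.docs",
          "reduce_script": "def docs = []; for (s in states) { for (d in s) { docs.add(d); } } return docs"
        }
      }
    }
  }
}
```

Each document in the destination index should then hold one `age` value plus an `all_docs` array with the source of every student in that group.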

pranav24 · June 1, 2020, 12:51pm

Thanks @Hendrik_Muhs
This is a normal use case for me: every user can do this multiple times a day, with a different field used for the group by.
Is it feasible to create an index again and again for every request?

Hendrik_Muhs · June 3, 2020, 6:25am

You can create as many indices as your cluster can hold. However, I wonder whether transform is the right choice if you do not reuse the output and are only interested in the result once. You would need to program against the API and manage the created transforms/indices in the background.

Your original concern was that the query takes too much time to return the result. With async search you can submit the search asynchronously and pull the result later. This does not leave indices behind; however, the results must be retrieved during their lifetime, that is, before they expire.
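
As a rough sketch of that flow, the original aggregation could be submitted to `POST /students/_async_search?wait_for_completion_timeout=2s&keep_alive=1h`. Here `students` is a placeholder index name, and the timeout and keep-alive values are arbitrary examples. The `"size": 0` line is an addition that drops the top-level hits, since the documents already come back through `top_hits`.

```json
{
  "size": 0,
  "aggs": {
    "ageGroup": {
      "terms": { "field": "age" },
      "aggs": {
        "top_sales_hits": {
          "top_hits": { "size": 100 }
        }
      }
    }
  }
}
```

If the search does not finish within the timeout, the response contains an `id` and `"is_running": true`. The result can then be fetched with `GET /_async_search/<id>` until the keep-alive period runs out, or removed early with `DELETE /_async_search/<id>`.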

pranav24 · June 3, 2020, 7:52am

Thank you @Hendrik_Muhs
I was not aware of the async search feature of Elasticsearch. I will look into it.

Can you tell me whether there is any alternative to the top_hits aggregation for getting source data in the response from Elasticsearch?

system · July 1, 2020, 7:52am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
