Aggregation queries taking long

Hitesh_Chavhan · October 21, 2016, 7:08am

Hi,
I have the aggregation query below, which aggregates all employees by their score.
This is a weekly score given to each employee. There are 3032561 such records in ES, each holding a list of dicts with one entry per employee.

I am querying ES from node.js, and the query returns no data because it times out.
Someone told me that ES is not able to perform aggregations on this amount of data, but I don't think that's the case. Please help me out.

Below is the query.
{
  "query": {
    "bool": {
      "must": []
    }
  },
  "aggs": {
    "emp": {
      "nested": {
        "path": "emp"
      },
      "aggs": {
        "scores": {
          "terms": {
            "field": "emp.emp_name.case_sensitive",
            "size": 0,
            "order": {
              "total_score": "desc"
            }
          },
          "aggs": {
            "total_score": {
              "sum": {
                "field": "emp.score"
              }
            }
          }
        }
      }
    }
  }
}
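
For readers unfamiliar with nested terms aggregations, here is a minimal Python sketch of what the query above asks for: one bucket per distinct employee name, the sum of that employee's scores, sorted by the sum in descending order. It is only an in-memory illustration, not how Elasticsearch executes the request. The second document and the helper name total_scores are assumptions made for the example.

```python
from collections import defaultdict

# Sample root documents, each with a nested "emp" list.
# The first mirrors the shape used in this thread; the second is hypothetical.
docs = [
    {"record_id": 1003, "emp": [
        {"emp_name": "Robert Madis", "score": 10},
        {"emp_name": "Piras jicking", "score": 12},
    ]},
    {"record_id": 1004, "emp": [
        {"emp_name": "Robert Madis", "score": 5},
    ]},
]

def total_scores(docs):
    """Sum scores per employee name, highest total first (hypothetical helper)."""
    totals = defaultdict(int)
    for doc in docs:
        # Walk the nested employee entries of each root document.
        for emp in doc.get("emp", []):
            totals[emp["emp_name"]] += emp["score"]
    # One (name, total) pair per distinct employee, ordered by total descending.
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)

result = total_scores(docs)
for name, total in result:
    print(name, total)
```

The result has one entry per distinct employee name, so its size grows with the number of distinct employees rather than with the number of root documents.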

Mark_Harwood · October 21, 2016, 8:16am

What are the root level docs? I can see you are using nested in the agg so presumably each root doc can have more than one employee. How many employees are there per root doc?

Have you estimated how big the JSON response would be?

Hitesh_Chavhan · October 21, 2016, 9:28am

The root level is the performance doc, which does contain emp as a list of dictionaries holding more than one employee.
I guess there would be more than 8 employees per root doc.
Below is the format.
{
  "date": "12/12/2009",
  "record_id": 1003,
  "emp": [
    {"emp_name": "Robert Madis", "score": 10},
    {"emp_name": "Piras jicking", "score": 12}
  ]
}

I have not estimated the size of the JSON response, but considering the data, it should not be that big.

Mark_Harwood · October 21, 2016, 9:43am

Let's work it out:

Theoretically, the 3m+ performance docs could all refer to the same 8 employees, so the final result could be only 8 employees. I guess that's highly unlikely, though; otherwise no one would get any work done due to constant performance reviews.
Let's assume all employees are unique and each requires 50 bytes to return the name and score.

This gives us 3032561 x 8 x 50 = 1,213,024,400 bytes, or roughly 1.1GB of JSON.

That's a lot of JSON data to create and serialize, so I'm not surprised it takes a while.
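
The estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch under the same assumptions stated in this post: every employee is unique, there are 8 per root doc, and each bucket costs about 50 bytes of JSON. The function and constant names are made up for illustration.

```python
# Figures taken from this thread; the 50 bytes per bucket is an assumption.
ROOT_DOCS = 3_032_561
EMPLOYEES_PER_DOC = 8
BYTES_PER_BUCKET = 50

def estimate_response_bytes(root_docs, employees_per_doc, bytes_per_bucket):
    """Worst case: every nested employee is unique, so each one gets a bucket."""
    return root_docs * employees_per_doc * bytes_per_bucket

total_bytes = estimate_response_bytes(ROOT_DOCS, EMPLOYEES_PER_DOC, BYTES_PER_BUCKET)
total_gib = total_bytes / 1024 ** 3

print(f"{total_bytes:,} bytes")
print(f"{total_gib:.2f} GiB")
```

Changing the assumed number of distinct employees is the quickest way to see how strongly the response size depends on bucket count.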

Hitesh_Chavhan · October 21, 2016, 3:05pm

So, as I understand it, with larger data sizes, creating and serializing the aggregation result takes time.

The aggregation itself runs fast, but creating and serializing the JSON consumes most of the total time.