Elasticsearch Aggregations taking a long time

photonic_world_2 · January 21, 2016, 11:27pm

I understand this is one of the most frequently asked topics in Elasticsearch. I have seen a lot of answers but haven't found anything convincing. Here is the problem:

I have a monthly index with 5 primary shards and 1 replica of each, on 5 data nodes.

Hardware:
8 CPUs, 32 GB RAM and 16 GB of heap. The fielddata circuit breaker is set at 30% and indices.breaker.total.limit is at 70%.
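
For scale, the two breaker percentages can be turned into absolute heap budgets. This is only back-of-the-envelope arithmetic. It assumes the 16 GB heap is per data node and that both limits are percentages of that heap; the variable and function names are made up for illustration.

```python
# Rough arithmetic for the breaker limits quoted above.
# Assumption: 16 GB of JVM heap per data node; all names are illustrative only.
HEAP_GB = 16
FIELDDATA_BREAKER_PCT = 30   # fielddata circuit breaker
TOTAL_BREAKER_PCT = 70       # indices.breaker.total.limit


def breaker_budget_gb(heap_gb, percent):
    """Return the heap budget in GB that a percentage-based limit allows."""
    return heap_gb * percent / 100


fielddata_budget = breaker_budget_gb(HEAP_GB, FIELDDATA_BREAKER_PCT)
total_budget = breaker_budget_gb(HEAP_GB, TOTAL_BREAKER_PCT)
print(f"fielddata budget per node: {fielddata_budget:.1f} GB")
print(f"total breaker budget per node: {total_budget:.1f} GB")
```

Under these assumptions, a request whose estimated fielddata exceeds roughly 4.8 GB on a node would be rejected by the fielddata breaker instead of completing.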

The number of documents in these indices is around 100 million. Each document is about 150k in size. All of the fields are keyword analyzed.

A simple terms aggregation on one of the fields takes around 60s, and this grows with the amount of data in the index. I further reduced the set on which the aggregation runs by using a filter aggregation; my query is at the end of this post.

What I do not understand is that running this query with just the filter aggregation filter_agg takes ~1s and returns 157 documents, while adding term_aggregate makes the query take more than 100s.

Am I missing something here? Is there something wrong with the query?
Does term_aggregate aggregate only the 157 documents that resulted from filter_agg?

GET /index-1-2016/type/_search?search_type=count
{
  "aggs": {
    "filter_agg": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "search_field1": "field1",
                "_cache": true
              }
            },
            {
              "term": {
                "search_field2": "field2",
                "_cache": true
              }
            }
          ]
        }
      },
      "aggs": {
        "term_aggregate": {
          "terms": {
            "field": "emails",
            "size": 5,
            "shard_size": 50
          }
        }
      }
    }
  }
}
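
As a way to reason about what this request computes, here is a small in-memory model of a filter aggregation with a terms sub-aggregation. It is a sketch of the semantics only, not of how Elasticsearch executes the request. The function name and sample documents are hypothetical, and exact equality stands in for the term filters.

```python
from collections import Counter


def filter_then_terms(shards, must_terms, field, size, shard_size):
    """Mimic a filter aggregation with a terms sub-aggregation.

    shards: list of shards, each a list of documents (dicts).
    must_terms: {field: value} pairs that every kept document must match.
    """
    merged = Counter()
    doc_count = 0
    for docs in shards:
        # Step 1: the filter bucket keeps only the matching documents.
        matching = [d for d in docs
                    if all(d.get(f) == v for f, v in must_terms.items())]
        doc_count += len(matching)
        # Step 2: the sub-aggregation counts values of matching documents only.
        local = Counter()
        for d in matching:
            # An array field contributes one count per distinct value per document.
            local.update(set(d.get(field, [])))
        # Each shard reports only its top `shard_size` terms.
        merged.update(dict(local.most_common(shard_size)))
    # The coordinating node merges shard results and keeps the top `size`.
    return doc_count, merged.most_common(size)


shards = [
    [
        {"search_field1": "field1", "search_field2": "field2",
         "emails": ["a@example.com", "b@example.com"]},
        {"search_field1": "field1", "search_field2": "other",
         "emails": ["a@example.com"]},
    ],
    [
        {"search_field1": "field1", "search_field2": "field2",
         "emails": ["a@example.com"]},
    ],
]
doc_count, buckets = filter_then_terms(
    shards, {"search_field1": "field1", "search_field2": "field2"},
    "emails", size=5, shard_size=50)
print(doc_count, buckets)
```

In this model the term counts come only from documents inside the filter bucket. The model says nothing about the cost of loading the field's data on each shard, which is a separate matter from how many documents match.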

jimczi · January 22, 2016, 9:48am

"A simple term aggregation on one of the fields takes around 60s"

I think you should start from here. The first query with a terms aggregation on a "keyword" analyzed field takes time, because each shard needs to populate the fielddata for this particular field. Still, 60s seems quite long. What do you mean by "keyword" analyzed? Did you use the keyword analyzer in the definition of the field?
What is the content of your field? Is it big?
What is the response time if you run the query several times?
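
To make the fielddata point concrete, the following toy model shows the general idea: the inverted index (term to documents) is un-inverted into a per-document view (document to terms) the first time a field is aggregated on, and the result is cached. It is a conceptual sketch, not Elasticsearch code; the class and attribute names are hypothetical.

```python
class ShardFielddata:
    """Toy model of one shard's fielddata cache (not Elasticsearch code)."""

    def __init__(self, inverted_indexes):
        # field name -> {term: [doc ids]}, i.e. the inverted index per field
        self.inverted_indexes = inverted_indexes
        self.cache = {}
        self.loads = 0  # how many times fielddata had to be built

    def get(self, field):
        """Return doc id -> [terms] for a field, building it on first use."""
        if field not in self.cache:
            self.loads += 1
            doc_to_terms = {}
            # Un-invert: walk every posting of the field, regardless of
            # which documents a later query will actually match.
            for term, doc_ids in self.inverted_indexes[field].items():
                for doc_id in doc_ids:
                    doc_to_terms.setdefault(doc_id, []).append(term)
            self.cache[field] = doc_to_terms
        return self.cache[field]


shard = ShardFielddata(
    {"emails": {"a@example.com": [0, 2], "b@example.com": [0]}})
first = shard.get("emails")   # builds the structure
second = shard.get("emails")  # served from the cache
print(first, shard.loads)
```

In this model only the first request pays the build cost, and that cost depends on the whole field rather than on the filtered set. That is why the response time of repeated runs is a useful diagnostic.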

photonic_world_2 · January 22, 2016, 6:20pm

My mapping for the field looks like this:
"analysis": {
  "analyzer": {
    "lowercase_keyword_analyzer": {
      "type": "custom",
      "tokenizer": "keyword",
      "filter": [
        "lowercase"
      ]
    }
  }
}

...
"mappings": {
  "type": {
    "properties": {
      ...
      "emails": {
        "type": "string",
        "analyzer": "lowercase_keyword_analyzer"
      }
      ...
    }
  }
}

It is just an array of emails. Does it matter that it is an array?

The response time appears to be the same on average. It doesn't decrease with subsequent invocations.

jimczi · January 25, 2016, 9:31am

OK, thank you for the clarifications. Why are you using a keyword tokenizer? Are you trying to find duplicates in the mails? The keyword tokenizer "tokenizes" an entire stream as a single token, which means that each mail in your aggregation counts as one entry. I suspect that the size of those tokens is problematic and is the reason why it's taking so much time. Can you describe your use case?
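
As a rough illustration of what a keyword tokenizer followed by a lowercase filter produces for an array field, here is a simplified model. The function names are hypothetical and Python's str.lower() stands in for the lowercase token filter; it is not the actual analysis chain.

```python
def lowercase_keyword_analyzer(value):
    """Keyword tokenizer + lowercase filter: the whole value becomes
    a single lowercased token."""
    return [value.lower()]


def index_terms(field_values):
    """Analyze each element of an array field separately and collect
    the resulting tokens."""
    tokens = []
    for value in field_values:
        tokens.extend(lowercase_keyword_analyzer(value))
    return tokens


emails = ["Alice@Example.com", "bob@example.com"]
print(index_terms(emails))  # ['alice@example.com', 'bob@example.com']
```

Each array element yields exactly one token. In this model the tokens only become large if a single value is itself a long piece of text.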

photonic_world_2 · January 25, 2016, 6:10pm

The field contains an array of email IDs. I want them to be searchable as well. Do you think adding another field, by making it a multi-field with a not_analyzed sub-field, would improve performance?
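
A multi-field along those lines might look like the sketch below, which builds the mapping fragment and the adjusted aggregation as Python dictionaries and prints them as JSON. It assumes the "string" mapping syntax used in this thread. The sub-field name "raw" is made up, and enabling doc_values on the sub-field is an extra assumption, not something proposed in the thread.

```python
import json

# Hypothetical multi-field version of the "emails" mapping from the thread.
# Assumptions: old-style "string" mapping syntax; sub-field name "raw" is made up.
emails_mapping = {
    "emails": {
        "type": "string",
        "analyzer": "lowercase_keyword_analyzer",  # unchanged, used for search
        "fields": {
            "raw": {
                "type": "string",
                "index": "not_analyzed",  # indexed as-is, so case is preserved
                "doc_values": True,  # assumption: on-disk values instead of heap fielddata
            }
        },
    }
}

# The terms aggregation would then target the sub-field instead of "emails".
terms_agg = {
    "term_aggregate": {
        "terms": {"field": "emails.raw", "size": 5, "shard_size": 50}
    }
}

print(json.dumps(emails_mapping, indent=2))
print(json.dumps(terms_agg, indent=2))
```

Two caveats apply to this sketch. A not_analyzed sub-field is not lowercased, so addresses that differ only in case would land in separate buckets. Documents that are already indexed would also need to be reindexed before the new sub-field holds any values.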
