Elasticsearch aggregation performance issue -- index allocation

sharon.c · June 15, 2016, 11:11pm

My company is using Elasticsearch for transaction log indexing and aggregation. Data streams into Elasticsearch from Logstash at a rate of 1000-1500 messages per second. We currently have two data nodes that handle both indexing and aggregation at the same time.

When an aggregation task is big, we can see in Marvel that the CPU load on the Elasticsearch nodes increases dramatically, and indexing even stops temporarily. The aggregation response time is long.

When we run aggregations without streaming data into Elasticsearch (so Elasticsearch is only aggregating, not indexing), the aggregations are much faster.

It looks like indexing and aggregation have to run on different nodes for optimal performance.

Would it be a good solution to allocate the old indices to data nodes that only handle aggregation, and the new indices (which are still being indexed into) to nodes that only handle indexing?

spinscale · June 16, 2016, 8:11am

Hey,

From the outside, it sounds less likely that Elasticsearch cannot run aggregations while indexing, and more likely that your nodes cannot handle the load of the two operations happening in parallel.

Before starting to optimize, you should try to find out where exactly the bottleneck is. When indexing stops, this could be a garbage collection pause (check your log files for that), which would make sense: if your aggregation creates a lot of buckets, it needs memory for them. An aggregation needs CPU as well.

You could configure your indices (those you write to and those you read from) to be placed on different nodes using shard allocation filtering.
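
One way to look for the bottleneck, as a rough sketch: the hot threads API shows what each node is spending CPU time on, and the JVM section of the node stats API reports heap usage along with garbage collection counts and times. Both are standard Elasticsearch APIs; which values count as a problem depends on the cluster.

```
GET _nodes/hot_threads

GET _nodes/stats/jvm
```

Running these while a large aggregation is in flight, and again while only indexing is running, gives a basis for comparison.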

--Alex

sharon.c · June 21, 2016, 6:30pm

Thanks, Alex.
I used shard allocation filtering with an IP address filter, like this:
PUT test/_settings
{
  "index.routing.allocation.include._ip": "192.168.2.*"
}
It works pretty well: the shards were reallocated to the specified nodes.

I then looked into the similar node.box_type setting described in https://www.elastic.co/blog/hot-warm-architecture

PUT test/_settings
{
  "index.routing.allocation.require.box_type": "warm"
}

Elasticsearch responds with {"acknowledged":true}, but it does not reallocate the index shards to the warm nodes.
It looks like Elasticsearch does not pick up the setting after I set "node.box_type: warm" in elasticsearch.yml.

Has anyone had the same issue when setting the "warm" or "hot" box_type?
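
A possible way to narrow this down, assuming Elasticsearch 2.0 or later where the _cat/nodeattrs API is available: list the custom attributes each node actually reports, then compare them with the filter stored on the index. Changes to elasticsearch.yml only take effect after the node is restarted, and from Elasticsearch 5.0 on custom attributes have to be written as node.attr.box_type rather than node.box_type.

```
GET _cat/nodeattrs?v

GET test/_settings
```

If no node lists box_type with the value warm, the require filter has no eligible node to move the shards to, so they stay where they are.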

Christian_Dahlqvist · June 21, 2016, 6:44pm

Do you have replicas configured, meaning that both nodes hold all the data? If this is the case, I do not believe shard allocation awareness is going to help much, as you only have 2 data nodes and indexing needs to be performed on both primary and replica shards.

sharon.c · June 21, 2016, 9:48pm

I do not have replicas. The index setting is like this:

curl -XPUT 'http://datanode2:9200/test' -d '{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 0
    }
  }
}'
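
To confirm where the three primary shards of this index currently sit, the _cat/shards API can be used. This is only a sketch of a check, using the index name test from the request above.

```
GET _cat/shards/test?v
```

The prirep column shows p for primaries and r for replicas, and the node column shows which node holds each shard.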
