Elasticsearch AWS EC2 cluster: indexing performance is lower than single-node performance


(balaa.gs) #1

I get good indexing performance (1300+ docs/sec) on a single node, but on a 3-node cluster the performance drops (800-900 docs/sec).

The ES cluster is deployed on AWS EC2 Linux servers with 15GB RAM each. The heap size is set to 5GB on each node.

The elasticsearch.yml configuration for the 3 nodes is below (values separated by / are for satellite1/satellite2/satellite3):

cluster.name:   NSES
node.name:  satellite1/satellite2/satellite3
node.master:    TRUE/TRUE/TRUE
node.data:  FALSE/TRUE/TRUE
bootstrap.mlockall:     TRUE/TRUE/TRUE
discovery.zen.minimum_master_nodes:     1/1/1
discovery.zen.ping.timeout:     30s
discovery.zen.ping.unicast.hosts:   ["127.0.0.1:[9200-9300],[Node2 public IP]:[9200-9300],[Node3 public IP]:[9200-9300]"] #vice versa for other nodes
cloud.aws.access_key:   [given same value for 3 nodes]
cloud.aws.secret_key:   [given same value for 3 nodes]
discovery.type:     "ec2"
discovery.ec2.groups:   [given same value for 3 nodes]
discovery.ec2.host_type:    public_ip
discovery.ec2.ping_timeout:     30s
discovery.ec2.availability_zones:   ap-northeast-1c
cloud.aws.region:   ap-northeast
discovery.zen.ping.multicast.enabled:   FALSE
network.publish_host:   [current machine public ip]
plugin.mandatory:   cloud-aws
marvel.agent.enabled:   TRUE

Is there any YML configuration I need to change or add? Or are there any other ES configuration changes I need to make?

I expect to reach 2500+ docs/sec with 3 nodes.


(Christian Dahlqvist) #2

You only have 2 data nodes, so in effect it is 2 and not 3 nodes that are performing the indexing. Assuming that you have one replica configured, all data will also be indexed on each of the data nodes, and if this is the case I would not expect the throughput to increase much compared to a single node. The fact that your indexing rate actually goes down could be due to the added replication of data between the nodes. Are you using bulk indexing and sending the requests directly to both data nodes in parallel? How many CPU cores do you have and what type of storage are you using?
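Christian's point about replicas can be checked with a little arithmetic. The sketch below uses a deliberately simplified model (an assumption, not something measured in this thread): every copy of a document (primary plus replicas) is a full index operation, copies are spread evenly over the data nodes, replicas that cannot be allocated cost nothing, and replication network overhead is ignored.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Indexing work each data node does, in docs/sec, for a given client-side rate.
# Simplified model (assumption): each copy (primary + replicas) is a full index
# operation, copies are spread evenly over the data nodes, and replicas that
# cannot be allocated (more copies than data nodes) are not indexed anywhere.
sub docs_per_data_node {
    my ($client_rate, $replicas, $data_nodes) = @_;
    die "need at least one data node\n" unless $data_nodes >= 1;
    my $copies = min(1 + $replicas, $data_nodes);
    return $client_rate * $copies / $data_nodes;
}

for my $case (
    [ 'single node, 1 replica (unassigned)', 1300, 1, 1 ],
    [ '2 data nodes, 1 replica',             1300, 1, 2 ],
    [ '2 data nodes, 0 replicas',            1300, 0, 2 ],
    [ '3 data nodes, 1 replica',             2500, 1, 3 ],
) {
    my ($label, @args) = @$case;
    printf "%-38s %7.1f docs/sec per node\n", $label, docs_per_data_node(@args);
}
```

Under this model, sending 1300 docs/sec to the two data nodes with one replica still costs each node about 1300 docs/sec of indexing, the same as the single-node run, so no gain is expected; with zero replicas each node would do about 650. Reaching 2500 docs/sec on three data nodes with one replica would need each node to sustain roughly 1667 docs/sec.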


(Mark Walkom) #3

Also look at increasing your heap to 50% of total system RAM.
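Mark's 50% rule as a quick calculation. The 31GB cap is an addition here, based on Elasticsearch's general heap-sizing guidance (keep the heap below roughly 32GB so the JVM can keep using compressed object pointers); it does not affect the 15GB machines in this thread.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Heap rule of thumb: half of the machine's RAM, leaving the other half to the
# OS filesystem cache that Lucene relies on, capped below the ~32GB point where
# the JVM stops using compressed object pointers.
sub suggested_heap_gb {
    my ($ram_gb) = @_;
    return min($ram_gb / 2, 31);
}

printf "RAM %3d GB -> heap %4.1f GB\n", $_, suggested_heap_gb($_) for 15, 30, 64;
```

For the 15GB instances used here that gives 7.5GB, compared with the 5GB currently configured.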


(balaa.gs) #4

I am indexing through the bulk API and sending the requests to one node. I also tried sending them to 2 nodes, and there was not much difference.

Please find my bulk API sample code below. Are there any changes I need to make for indexing into a cluster?

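use strict;
use warnings;
use JSON::PP;
use Search::Elasticsearch;

my $e    = Search::Elasticsearch->new();    # connects to localhost:9200 by default
my $json = JSON::PP->new->allow_blessed;
my $dir  = shift // '.';                    # folder to index (assumed command-line argument)

my @docs;
my $ifileid = 0;

# Loop over the files in the folder, sending a bulk request every 100 documents
opendir(my $dh, $dir) or die "Cannot open $dir: $!";
for my $file (grep { -f "$dir/$_" } readdir $dh) {
    push @docs, { index => { _index => 'my_index', _type => 'blog_post', _id => $ifileid++ } };
    push @docs, { filename => $file, content => 'bala' };
    flush_docs() if @docs >= 200;           # 100 action/document pairs
}
closedir $dh;
flush_docs();                               # send the last, partial batch

sub flush_docs {
    return unless @docs;
    # body must be an array reference; body => @docs would flatten the list
    my $res = $e->bulk(body => \@docs);
    if ($res->{errors}) {
        my @failed = grep { $_->{index}{error} } @{ $res->{items} };
        die "Bulk index had issues: " . $json->encode(\@failed);
    }
    @docs = ();                             # clear the buffer so documents are not re-sent
}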
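Christian also asked whether bulk requests go to both data nodes in parallel. One way to do that from Perl is to split the work round-robin and fork one worker process per data node. This is a minimal sketch under assumptions: the node addresses are hypothetical placeholders, and the worker is passed in as a code reference (in practice it would create its own Search::Elasticsearch client for its node and run the bulk loop above over its chunk).

```perl
use strict;
use warnings;
use IO::Handle;
use POSIX ();

# Split @$items into $n interleaved chunks: item 0 -> chunk 0, item 1 -> chunk 1, ...
sub partition_round_robin {
    my ($items, $n) = @_;
    die "need at least one chunk\n" unless $n >= 1;
    my @chunks = map { [] } 1 .. $n;
    push @{ $chunks[ $_ % $n ] }, $items->[$_] for 0 .. $#$items;
    return \@chunks;
}

# Fork one child per node; child $i runs $worker->($nodes->[$i], $chunks->[$i]).
# Returns the number of children that failed (died or exited non-zero).
sub run_in_parallel {
    my ($nodes, $items, $worker) = @_;
    my $chunks = partition_round_robin($items, scalar @$nodes);
    my @pids;
    for my $i (0 .. $#$nodes) {
        STDOUT->flush;    # avoid duplicating buffered parent output in the child
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: run the worker, flush its output, exit without parent cleanup
            my $ok = eval { $worker->($nodes->[$i], $chunks->[$i]); 1 };
            STDOUT->flush;
            POSIX::_exit($ok ? 0 : 1);
        }
        push @pids, $pid;
    }
    my $failed = 0;
    for my $pid (@pids) {
        waitpid($pid, 0);
        $failed++ if $? != 0;
    }
    return $failed;
}

# Example: two data nodes (hypothetical addresses), 10 numbered files
my @items = map { [ $_, "file$_.txt" ] } 0 .. 9;
my $failed = run_in_parallel(
    [ 'node2:9200', 'node3:9200' ],
    \@items,
    sub {
        my ($node, $chunk) = @_;
        printf "%s would index %d files\n", $node, scalar @$chunk;
    },
);
print "failed workers: $failed\n";
```

Creating each client inside its child process, after the fork, avoids sharing one HTTP connection between processes. Passing [$id, $file] pairs instead of bare file names keeps document IDs unique across workers.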

Number of cores: 2 physical, 4 logical processors. It is an xlarge instance in AWS EC2.

Index settings used when creating the index:

curl -XPUT "http://localhost:9200/my_index" -d'
{
"number_of_shards": 10,
"number_of_replicas": 1,
"index.refresh_interval": "-1",
"index.translog.flush_threshold_period": "600",
"index.translog.flush_threshold_ops": "50000",
"index.engine.robin.refresh_interval": "30",
"index.merge.policy.use_compound_file": "false",
"indices.memory.index_buffer_size": "40%",
"index.warmer.enabled": "false",
"indices.store.throttle.max_bytes_per_sec": "200MB",
"index.store.throttle.type": "merge",
"index.store.type": "mmapfs",
"index.norms": "true",
"index.cache.field.type": "soft",
"bootstrap.mlockall": "true",
"index.merge.policy.merge_factor": "500",
"index.fielddata.cache": "soft",
"index.fielddata.cache.size": "20"
}'

My direct questions are:
1. Does a cluster normally increase indexing speed, or search speed?
2. Is there any way to increase indexing speed by adding nodes or by changing the configuration?
3. I have already increased the heap size to 5GB.


(Mark Walkom) #5

If you have refresh turned off, disable replicas too. That'll improve performance.
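Mark's suggestion is a common bulk-load pattern: drop replicas (and keep refresh off) while loading, then restore them, so the replica shards are rebuilt by copying data from the primaries rather than by indexing every document twice. A minimal sketch follows; the client object is passed in (for example a Search::Elasticsearch instance), and restoring refresh_interval to 1s and replicas to 1 are assumptions based on the Elasticsearch default and on the index settings in this thread.

```perl
use strict;
use warnings;

# Settings for the duration of a bulk load: no replicas, no periodic refresh.
sub bulk_load_settings {
    return { index => { number_of_replicas => 0, refresh_interval => '-1' } };
}

# Settings restored afterwards. '1s' is the Elasticsearch default refresh
# interval (assumption for this index); the replica count is passed in.
sub post_load_settings {
    my ($replicas) = @_;
    return { index => { number_of_replicas => $replicas, refresh_interval => '1s' } };
}

# Relax settings, run $load, then restore settings and refresh, even if $load dies.
# $es is any object with indices->put_settings / indices->refresh methods,
# e.g. a Search::Elasticsearch client.
sub with_bulk_load_settings {
    my ($es, $index, $replicas, $load) = @_;
    $es->indices->put_settings(index => $index, body => bulk_load_settings());
    my $ok  = eval { $load->(); 1 };
    my $err = $@;
    $es->indices->put_settings(index => $index, body => post_load_settings($replicas));
    $es->indices->refresh(index => $index);
    die $err unless $ok;
    return;
}
```

With a real client this could be called as with_bulk_load_settings(Search::Elasticsearch->new(nodes => ['node2:9200']), 'my_index', 1, sub { ... }), where the code reference runs the bulk loop and the node address is a placeholder.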

