Performance degradation after upgrading to ES 2.2.0


(Seva Feldman) #1

Hi Folks,

I've created a new cluster based on ES 2.2.0 and Logstash 2.2.0, with a Logstash Kafka input and Elasticsearch output.
The cluster is built on AWS: 10 data nodes (r3.2xlarge with a 3TB data_dir) and 12 Logstash indexers, each with a local Elasticsearch client node (no data, no master) on a c3.large instance. All app servers produce logs to Kafka and the indexers consume them into ES. The ES cluster is indexing messages about 20% slower than the app servers are pushing them to Kafka. The cluster receives about 1.5 billion messages daily.
Data node load is about 30%, iowait is 15%, and IO utilization is about 30%-50%.
The same setup on ES 1.7 worked well.
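A quick back-of-envelope check using the figures above (the snippet is plain arithmetic, not part of the setup):

```python
# Back-of-envelope throughput check using the figures from the post.
daily_messages = 1_500_000_000
seconds_per_day = 86_400
indexers = 12

avg_rate = daily_messages / seconds_per_day  # cluster-wide average msgs/s
per_indexer = avg_rate / indexers            # average load on each indexer

print(f"cluster-wide: ~{avg_rate:,.0f} msgs/s, per indexer: ~{per_indexer:,.0f} msgs/s")
```

So on average each indexer only has to sustain on the order of 1.5k msgs/s, well under the measured Kafka input rate, which points at the ES side rather than the pipeline in front of it.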

Logstash conf:
```
input {
  kafka {
    topic_id => "apache-accesslog"
    zk_connect => "zookeeper1a:2181,zookeeper1b:2181,zookeeper1c:2181/kafka"
    queue_size => 3000
    group_id => "logstash_indexer_apache_accesslog"
  }
}
filter {
  if "metric" in [tags] or "_bad_clientip" in [tags] {
    drop {}
  }
}
output {
  if "apache-accesslog" in [tags] {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "apache-accesslog-%{+YYYY.MM.dd}"
      flush_size => 3000
      workers => 3
    }
  }
}
```

Elasticsearch config on the Logstash indexer node:

```
cluster.name: production.eu-west-1
node.name: logstash-i-95e52a1f.a.production.eu-west-1
node.max_local_storage_nodes: 1
path.conf: /etc/elasticsearch
path.data: /usr/share/elasticsearch
path.logs: /var/log/elasticsearch
network.host: eth0:ipv4,local
http.port: 9200
discovery.zen.ping.multicast.enabled: true
action.destructive_requires_name: true
cloud.node.auto_attributes: true
cloud.aws.access_key: somekey
cloud.aws.region: eu-west-1
cloud.aws.secret_key: somekey
cluster.routing.allocation.awareness.attributes: zone
cluster.routing.allocation.awareness.force.zone.values: spot,ondemand
discovery.ec2.any_group: true
discovery.ec2.availability_zones: eu-west-1a,eu-west-1b,eu-west-1c
discovery.ec2.host_type: private_ip
discovery.ec2.ping_timeout: 10s
discovery.ec2.tag.elasticsearch_cluster: production.eu-west-1
discovery.type: ec2
gateway.expected_nodes: 0
node.data: false
node.master: false
node.zone: spot
```

Logstash and ES each get 40% of the memory on the instance.

The Kafka input is working very fast, about 6k events/s (measured with the metrics filter). However, the ES output is very slow.
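One way to see whether the slow output is back-pressure from the data nodes is to check bulk thread-pool rejections via the nodes stats API. A minimal parsing sketch (the payload below is an invented sample; in practice you would GET `_nodes/stats/thread_pool` from a node's HTTP port):

```python
# Sketch: pull bulk thread-pool rejection counts out of an Elasticsearch
# _nodes/stats/thread_pool response. The payload below is a made-up sample;
# real data comes from http://<data-node>:9200/_nodes/stats/thread_pool.
sample_stats = {
    "nodes": {
        "abc123": {
            "name": "data-node-1",
            "thread_pool": {"bulk": {"queue": 48, "rejected": 120}},
        },
        "def456": {
            "name": "data-node-2",
            "thread_pool": {"bulk": {"queue": 3, "rejected": 0}},
        },
    }
}

def bulk_rejections(stats):
    """Map node name -> cumulative bulk rejections since node start."""
    return {
        node["name"]: node["thread_pool"]["bulk"]["rejected"]
        for node in stats["nodes"].values()
    }

# Non-zero, growing rejection counts mean the data nodes are saturated and
# the indexers should back off rather than add more output workers.
print(bulk_rejections(sample_stats))
```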

Index template config:

```json
{
  "order": 0,
  "template": "apache-accesslog-*",
  "settings": {
    "index": {
      "routing": {
        "allocation": {
          "total_shards_per_node": "4"
        }
      },
      "cache": {
        "field": {
          "type": "soft"
        },
        "filter": {
          "expire": "30m"
        }
      },
      "refresh_interval": "30s",
      "number_of_shards": "8",
      "translog": {
        "flush_threshold_ops": "20000"
      },
      "auto_expand_replicas": "false",
      "query": {
        "default_field": "uri"
      },
      "store": {
        "throttle": {
          "type": "node",
          "max_bytes_per_sec": "100mb"
        }
      },
      "number_of_replicas": "1"
    }
  },
  "mappings": {
    "default": {
      "_source": {
        "enabled": true
      },
      "dynamic_templates": [
        {
          "integers": {
            "mapping": {
              "index": "not_analyzed",
              "type": "integer",
              "doc_values": true
            },
            "match_mapping_type": "integer"
          }
        },
        {
          "strings": {
            "mapping": {
              "index": "not_analyzed",
              "type": "string",
              "doc_values": true
            },
            "match_mapping_type": "string"
          }
        }
      ],
      "_all": {
        "enabled": false
      }
    }
  }
}
```

Changes I made during the investigation: adding batch_size and workers to the Logstash process, workers and flush_size to the output, and queue_size to the input. More workers made it faster, but then I started getting 429 (too many requests) errors in the Elasticsearch output.
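For what it's worth, 429 means the data nodes' bulk queues are full and requests are being rejected, so the usual client-side answer is to retry after a pause rather than add even more workers. A hypothetical sketch of jittered exponential backoff, purely for illustration (check what your logstash-output-elasticsearch version does natively before rolling your own):

```python
import random

def backoff_delays(attempts, base=0.5, cap=30.0, seed=None):
    """Jittered exponential backoff schedule (seconds) for retrying bulk
    requests rejected with HTTP 429. Hypothetical helper, not part of
    Logstash or Elasticsearch."""
    rng = random.Random(seed)
    return [
        min(cap, base * (2 ** attempt)) * rng.uniform(0.5, 1.0)
        for attempt in range(attempts)
    ]

# Each retry waits roughly twice as long as the previous one, capped at 30s,
# with jitter so 12 indexers do not all retry in lock-step.
print([round(d, 2) for d in backoff_delays(6, seed=1)])
```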

Any input or suggestions would be appreciated!


(system) #2