Slow bulk indexing

Hello

I'm having indexing performance problems. I'm using Python for bulk operations, and a bulk of 1000 documents takes about 30 seconds.
The documents are quite small: about 15 fields, most of which are integers or short strings.
An indexing daemon runs on almost every ES node (with 5 to 15 threads), and each daemon connects to its local ES node. Besides indexing, the daemons delete old records using bulk delete (1000 records per bulk as well).
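
For context, the indexing path is basically the bulk helper from the official Python client. This is only a simplified sketch of that pattern, not the real daemon code; the index name, routing field, and sample documents below are placeholders:

```python
from elasticsearch import Elasticsearch, helpers

# Each daemon talks to the ES node on the same machine.
es = Elasticsearch(["127.0.0.1:9200"])

def index_actions(docs, index_name):
    """Turn plain dicts into bulk index actions; _routing is required by the mapping."""
    for doc in docs:
        yield {
            "_op_type": "index",
            "_index": index_name,
            "_type": "positions",
            "_routing": str(doc["keyword_id"]),  # placeholder routing key
            "_source": doc,
        }

# Hypothetical batch of ~1000 small documents (about 15 fields each in reality).
batch = [{"keyword_id": i, "position": 1, "domain": "example.com"} for i in range(1000)]

# One bulk request per 1000 documents, as described above.
helpers.bulk(es, index_actions(batch, "positions_2016_03"), chunk_size=1000)
```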

Each index has from 300 million to 1.5 billion records, divided into 10 shards (the largest index, with 1.5 billion records, has 20 shards) and 1 replica.

My cluster has 24 nodes, 27 indices, 432 shards, 9 billion documents, and 6.5 TB of data.
Elasticsearch version: 2.2.0
Java: OpenJDK (from 1.7.0_65 to 1.7.0_91)
Client library: elasticsearch 2.2.0 (latest)

Node details:
Ubuntu 12.04 or 14.04
CPU: Intel Xeon, 8 cores
RAM: 32 GB (ES_HEAP_SIZE=16g; 20g on several nodes)
SSD disks (2 disks per node, some of them in RAID1)

------------ index settings ---------------
{
  "index": {
    "creation_date": "1450432002298",
    "number_of_replicas": "1",
    "codec": "best_compression",
    "uuid": "riSNQJY-R5K8McxgkbXbCg",
    "ttl": {
      "disable_purge": "true"
    },
    "analysis": {
      "filter": {
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        }
      },
      "analyzer": {
        "english": {
          "type": "custom",
          "filter": [
            "lowercase",
            "english_morphology"
          ],
          "tokenizer": "standard"
        }
      }
    },
    "number_of_shards": "10",
    "refresh_interval": "30s",
    "version": {
      "created": "2010099"
    }
  }
}

---------- mapping ---------------
"positions": {
"_routing": {
"required": true
},
"_ttl": {
"enabled": true,
"default": 7776000000
},
"properties": {
"dynamic": {
"type": "short"
},
"position": {
"type": "short"
},
"region_queries_count_wide": {
"type": "integer"
},
"right_spell": {
"index": "no",
"doc_values": true,
"type": "string"
},
"keyword": {
"analyzer": "english",
"type": "string"
},
"keyword_id": {
"type": "integer"
},
"date": {
"format": "strict_date_optional_time||epoch_millis",
"type": "date"
},
"geo_names": {
"index": "not_analyzed",
"type": "string"
},
"cost": {
"type": "float"
},
"url": {
"index": "not_analyzed",
"type": "string"
},
"region_queries_count": {
"type": "integer"
},
"url_crc": {
"type": "long"
},
"subdomain": {
"index": "not_analyzed",
"type": "string"
},
"concurrency": {
"type": "short"
},
"domain": {
"index": "not_analyzed",
"type": "string"
},
"found_results": {
"type": "long"
},
"types": {
"index": "not_analyzed",
"type": "string"
}
},
"_all": {
"enabled": false
}
}

------------ elasticsearch.yml ------------
cluster.name: name
node.name: "es18"
node.master: false
node.data: true

path.data: /var/lib/elasticsearch,/home/elasticsearch
path.repo: ["/home/backupfs"]

http.port: 9200
http.host: "127.0.0.1"
network.bind_host: 0.0.0.0
network.publish_host: non_loopback:ipv4
transport.tcp.port: 9300
transport.tcp.compress: true
index.max_result_window: 60000
gateway.recover_after_nodes: 15
gateway.expected_nodes: 17
gateway.recover_after_time: 15m
bootstrap.mlockall: true
indices.recovery.max_bytes_per_sec: 150mb
indices.store.throttle.max_bytes_per_sec: 150mb
index.translog.flush_threshold_size: 500mb
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts:
  - es-gw1
  - es-gw2
  - es1
  - es2
  - es3

script.inline: on
script.indexed: on

threadpool.search.type: fixed
threadpool.search.size: 20
threadpool.search.queue_size: 100

threadpool.bulk.type: fixed
threadpool.bulk.size: 20
threadpool.bulk.queue_size: 300

threadpool.index.type: fixed
threadpool.index.size: 20
threadpool.index.queue_size: 100

indices.memory.index_buffer_size: 10%
indices.memory.min_shard_index_buffer_size: 12mb
indices.memory.min_index_buffer_size: 96mb

index.refresh_interval: 30s
index.translog.flush_threshold_ops: 50000

index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s

index.search.slowlog.threshold.fetch.warn: 2s
index.search.slowlog.threshold.fetch.info: 1s

What should I do to increase indexing speed?

------- iostat output from several nodes -------
[es8] out: avg-cpu: %user %nice %system %iowait %steal %idle
[es8] out: 13.62 0.00 1.28 1.26 0.00 83.84

[es22] out: avg-cpu: %user %nice %system %iowait %steal %idle
[es22] out: 6.28 0.00 0.82 18.77 0.00 74.13

[es16] out: avg-cpu: %user %nice %system %iowait %steal %idle
[es16] out: 22.35 0.00 1.79 4.05 0.00 71.81

[es12] out: avg-cpu: %user %nice %system %iowait %steal %idle
[es12] out: 25.47 0.00 1.78 4.40 0.00 68.35

[es15] out: avg-cpu: %user %nice %system %iowait %steal %idle
[es15] out: 24.32 0.00 1.93 4.91 0.00 68.84

[es11] out: avg-cpu: %user %nice %system %iowait %steal %idle
[es11] out: 23.94 0.00 1.65 3.04 0.00 71.37

[es2] out: avg-cpu: %user %nice %system %iowait %steal %idle
[es2] out: 9.62 0.00 0.89 1.13 0.00 88.35

[es6] out: avg-cpu: %user %nice %system %iowait %steal %idle
[es6] out: 20.14 0.00 1.28 1.55 0.00 77.03

[es1] out: avg-cpu: %user %nice %system %iowait %steal %idle
[es1] out: 10.55 0.00 0.97 1.36 0.00 87.12

[es18] out: avg-cpu: %user %nice %system %iowait %steal %idle
[es18] out: 16.11 0.00 0.98 5.82 0.00 77.10

[es19] out: avg-cpu: %user %nice %system %iowait %steal %idle
[es19] out: 18.11 0.00 1.02 9.89 0.00 70.97

[es5] out: avg-cpu: %user %nice %system %iowait %steal %idle
[es5] out: 7.82 0.00 0.72 1.47 0.00 89.99

[es3] out: avg-cpu: %user %nice %system %iowait %steal %idle
[es3] out: 12.44 0.00 0.94 0.89 0.00 85.73

[es7] out: avg-cpu: %user %nice %system %iowait %steal %idle
[es7] out: 21.69 0.00 3.53 1.41 0.00 73.36

---------- load average -----------
[es14] out: 12:23:17 up 248 days, 18:54, 2 users, load average: 3.22, 2.95, 2.97
[es7] out: 12:22:37 up 314 days, 1:55, 3 users, load average: 3.07, 3.27, 3.24
[es12] out: 12:23:17 up 257 days, 2:25, 2 users, load average: 6.16, 5.66, 5.57
[es21] out: 11:23:17 up 110 days, 2:16, 2 users, load average: 2.11, 1.46, 1.09
[es20] out: 11:23:17 up 123 days, 59 min, 2 users, load average: 0.64, 0.47, 0.51
[es3] out: 12:23:17 up 462 days, 2:38, 3 users, load average: 1.21, 1.10, 1.11
[es16] out: 11:23:17 up 248 days, 1:59, 2 users, load average: 3.03, 3.86, 3.99
[es2] out: 12:23:17 up 530 days, 18:47, 2 users, load average: 0.83, 0.82, 0.95
[es5] out: 12:23:15 up 391 days, 1:47, 3 users, load average: 0.26, 0.39, 0.46
[es1] out: 12:23:17 up 487 days, 21:55, 2 users, load average: 1.59, 1.42, 1.31
[es8] out: 12:23:17 up 314 days, 2:04, 3 users, load average: 1.98, 1.66, 1.56
[es10] out: 12:23:17 up 391 days, 1:50, 2 users, load average: 0.18, 0.35, 0.39
[es11] out: 12:23:17 up 257 days, 2:19, 2 users, load average: 5.04, 4.67, 4.45
[es6] out: 12:23:17 up 337 days, 1:17, 2 users, load average: 0.65, 1.18, 1.32
[es13] out: 12:23:17 up 248 days, 19:11, 2 users, load average: 1.63, 1.67, 1.69
[es17] out: 12:23:17 up 248 days, 1:20, 2 users, load average: 6.46, 6.37, 6.51

Same issue here. Comparing index latency between Elasticsearch 2.2 and 1.7.4, 2.2 is more than 10x slower.

This parameter would help a lot in 2.2, but you're at risk of losing data too:

index.translog.durability: async
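
In 2.x this is a dynamic index setting, so it can be applied to existing indices without a restart, for example with the Python client (the index pattern here is just a placeholder):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["127.0.0.1:9200"])

# Switch the translog to async fsync on existing indices (placeholder index pattern).
es.indices.put_settings(
    index="positions_*",
    body={"index": {"translog": {"durability": "async"}}},
)
```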

Thank you for your reply.

Now I'm trying to increase indices.memory.index_buffer_size; if it doesn't help, I'll try this option.
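
Since that buffer is a node-level setting in 2.x, I'm changing it in elasticsearch.yml and restarting the nodes one at a time; the value below is just what I'm experimenting with, not a recommendation:

```yaml
# elasticsearch.yml -- raise the indexing buffer from the current 10%
indices.memory.index_buffer_size: 30%
```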