Dec 21, 2017: [EN][Elasticsearch] Knobs to turn for better indexing performance

Sherry_Ger · December 21, 2017, 11:00am

You have a high volume logging use case and have followed these recommendations:

What other knobs can you turn to improve indexing performance?

Set index.refresh_interval to somewhere between 30s and 60s if near-real-time search is not a requirement. The default is 1s.
Increase indices.memory.index_buffer_size. It defaults to 10% of the JVM heap allocated to a node, and that buffer is shared across all active shards on the node.
Disable _field_names if the exists query is not in use.

You can find more details here.
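
As a minimal sketch of the first knob, the request below raises the refresh interval on an index that already exists. The index name logs-2017.12.21 is only a placeholder for illustration; substitute one of your own indices.

```
PUT logs-2017.12.21/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}
```

Setting the value to null in the same request restores the default. indices.memory.index_buffer_size cannot be changed this way, since it is a node-level setting that belongs in elasticsearch.yml.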

If your use case can tolerate an increased risk of data loss in the event of hardware failure, these options will push write throughput even further.

Increase index.translog.flush_threshold_size. Once the translog reaches the specified size, a flush takes place. It defaults to 512mb.
Set index.translog.durability to async. This setting is risky as all acknowledged writes since the last commit will be discarded if a hardware failure should occur. Depending on the use case, this may be worth considering.
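
A hedged example of applying both translog options to an existing index follows. The index name is again a placeholder, and 1gb is just an illustrative value, not a recommendation for every cluster.

```
PUT logs-2017.12.21/_settings
{
  "index": {
    "translog": {
      "flush_threshold_size": "1gb",
      "durability": "async"
    }
  }
}
```

To return to the safer default, send the same request with "durability" set to "request", which makes Elasticsearch fsync the translog before acknowledging each write.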

All except indices.memory.index_buffer_size are index-level settings and dynamically configurable. You can also add them to an index template so that they become the defaults for all indices matching its index_patterns. For example:

PUT _template/logs
{
  "order": 0,
  "index_patterns": ["logs-*"],
  "settings": {
    "refresh_interval": "30s",
    "number_of_shards": "3",
    "translog": {
      "flush_threshold_size": "1gb",
      "durability": "async"
    },
    "unassigned": {
      "node_left": {
        "delayed_timeout": "5m"
      }
    },
    "query": {
      "default_field": "message"
    },
    "number_of_replicas": "1"
  },
  "mappings": {
    "doc": {
      "_field_names": {
        "enabled": false
      }
    }
  }
}
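
A template only affects indices created after it is stored, so it can be worth checking what actually got applied. The two requests below are one way to do that, assuming the template name logs and the logs-* pattern from the example above.

```
GET _template/logs

GET logs-*/_settings
```

The first returns the stored template; the second lists the effective settings of every matching index, which shows whether older indices still carry the previous values.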

And what does the future hold for better indexing performance? For the upcoming Elasticsearch version 7, we are working towards an intelligent refresh: a shard that has not been searched for a period of time (30s by default) will skip its refresh, and if a search request then arrives for that shard, the refresh will be performed at the next scheduled interval.
