Elastic indexing performance

I am trying to benchmark my standalone ES (5.1.1). I am pushing 5000 docs from each of 3 servers (rsyslog to Kafka), and I have 3 Logstash instances reading from Kafka and writing to ES. But I am getting this:

retrying failed action with response code: 429

I am monitoring ES with Kibana, where I see CPU at 50 to 60%. Heap is also fine. Everything looks normal. So why am I getting this error?

My changes to the default ES config:

Switched off the swap (sudo swapoff -a)
refresh interval: 30sec
replicas: 0

indices.memory.index_buffer_size: 30%
index.store.type: mmapfs
bootstrap.memory_lock: true

Is there anything else I am missing?

EDIT: My sample doc:

{
    "origmscaddr": "",
    "numberofdeliveryattempts": "0",
    "smscaddress": "919716099155",
    "originterface": "SMPP",
    "esmemastershortcode": "42543",
    "destinterface": "GSM",
    "lastfailurereason": "SMSC_no_error",
    "source": "/usr/CDR/Prepaid/load5000tuning/SMSCDR_PREPAY_160518060000_10.80.41.70_RS1.log",
    "esmesystemid": "ntms2",
    "messagesubmissiontime": "Wed May 18 06:11:01 2016",
    "callingnumber": "DD-Aircel",
    "bpartylrn": "",
    "messagedeliverystatus": "Delivered",
    "@timestamp": "2017-04-26T12:56:03.514Z",
    "callednumber": "919716437270",
    "apartyimsi": "",
    "destmscaddr": "919839000058",
    "recordtype": "Submit",
    "@version": "1",
    "messagedeliverytime": "Wed May 18 06:11:24 2016",
    "actualdatetime": "2016-May-18 06:11:24.235 +05:30"
}

It is hard to say without looking at it, but here is the usual advice:

  1. HTTP 429 is the signal that you are pushing harder than some node can handle. If you back off and retry, everything should be fine. If you want to know why the cluster can't handle that load, keep going.
  2. Check the logs for messages about throttling.
  3. Check the I/O statistics.
  4. Check for uneven load. If you have the default number of shards (5), then a three-node cluster is going to end up uneven.
  5. Make sure you are importing using _bulk (Logstash will do this for you, so this should be a no-op).
  6. Make sure your mapping makes sense and doesn't have any really expensive things in it.
  7. Make sure the node stats report the right number of CPUs.
  8. Have a look at the hot_threads API and make educated guesses about what the node is doing.
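
To make item 1 concrete, here is a minimal sketch of backing off and retrying on a 429. Logstash already does something like this internally (that is what the "retrying failed action" message reports), so this is only an illustration for a hand-written client. The `send` callable, the function name, and the delay values are all assumptions made for the example, not part of any Elasticsearch or Logstash API.

```python
import random
import time


def send_with_backoff(send, max_retries=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep):
    """Call send() until it stops returning HTTP 429 or retries run out.

    `send` is a hypothetical zero-argument callable that performs one
    _bulk request and returns the HTTP status code as an int.
    Returns (last_status, number_of_retries_used).
    """
    for attempt in range(max_retries + 1):
        status = send()
        if status != 429:
            # Anything other than "too many requests" is returned as-is;
            # the caller decides what to do with it.
            return status, attempt
        if attempt == max_retries:
            break
        # Exponential backoff, capped at max_delay, with jitter so that
        # several writers do not all retry at the same instant.
        delay = min(max_delay, base_delay * (2 ** attempt))
        sleep(delay * random.uniform(0.5, 1.0))
    return 429, max_retries
```

The `sleep` parameter exists only so the function can be exercised without actually waiting. If the retries are regularly exhausted, backing off is not enough and the remaining items in the list are where to look.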

My mapping:

PUT smsc-cdr-2016.05.23
{
  "settings": {
    "refresh_interval": "-1",
    "number_of_replicas": 0
  }, 
  "mappings": {
    "logs" : {
      "_all": {
        "enabled": false
      },
      "_source": {
        "enabled": true
      },
      "dynamic_templates": [
        {
          "integers": {
            "match_mapping_type": "long",
            "mapping": {
              "type": "keyword"
            }
          }
        },
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword",
              "ignore_above": 2056
            }
          }
        }
      ],
      "properties": {
        "actualdatetime" : {
          "type": "date",
          "format": "yyyy-MMM-dd HH:mm:ss.SSS Z"
        },
        "messagesubmissiontime":{
          "type": "date",
          "format": "EEE MMM dd HH:mm:ss yyyy"
        },
        "messagedeliverytime":{
          "type": "date",
          "format": "EEE MMM dd HH:mm:ss yyyy"
        }
      }
    }
  }
}
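
One quick way to check that the three date fields at least line up with the sample doc is to parse the sample values offline. The sketch below uses Python's `strptime` directives as a rough stand-in for the patterns in the mapping; the two pattern languages are not the same, so this only sanity-checks the sample strings and says nothing definitive about how Elasticsearch will parse them. It assumes Python 3.7+ (for `%z` accepting `+05:30`) and an English locale for day and month names. The names `SAMPLE`, `FORMATS`, and `parse_sample_dates` are made up for the example.

```python
from datetime import datetime

# Values copied from the sample doc above.
SAMPLE = {
    "actualdatetime": "2016-May-18 06:11:24.235 +05:30",
    "messagesubmissiontime": "Wed May 18 06:11:01 2016",
    "messagedeliverytime": "Wed May 18 06:11:24 2016",
}

# Approximate strptime equivalents of the formats in the mapping.
FORMATS = {
    "actualdatetime": "%Y-%b-%d %H:%M:%S.%f %z",
    "messagesubmissiontime": "%a %b %d %H:%M:%S %Y",
    "messagedeliverytime": "%a %b %d %H:%M:%S %Y",
}


def parse_sample_dates(doc, formats):
    """Parse each date field; raises ValueError if a value does not match."""
    return {field: datetime.strptime(doc[field], fmt)
            for field, fmt in formats.items()}


parsed = parse_sample_dates(SAMPLE, FORMATS)
for field, value in parsed.items():
    print(field, value.isoformat())
```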

How can I tell whether my mapping is expensive or not? I have three date fields in my mapping. Is that really expensive?
