The bulk API becomes much slower once I have indexed 300,000 documents into elasticsearch?

I have setup an elasticsearch cluster on the AWS, the cluster has 2 shards each with 1 replicate, so I have 4 VMs in total for the cluster. My goal is to use the mongo-connector to dump 1 million documents from my mongodb on other VM to the elasticsearc cluster. The mongo-connector uses the elasticsearch python client, it uses the bulk API to indexes 500 documents for each call. During the test, I found an interesting issue :

The first 600 bulk requests take 0.25~ second each time, however, starting from the 601th API call, each bulk request takes 1.25~ seconds, it suddenly takes 1 more second. So the first 300,000(600*500) documents have been indexed to elasticsearch very fast, the rest of docs take much longer than the first 300,000 docs.

Below are the logs of the mongo-connector :

2015-09-24 09:06:00,369 [INFO] mongo_connector.connector:1030 - Beginning Mongo Connector
2015-09-24 09:06:00,369 [WARNING] mongo_connector.connector:116 - MongoConnector: Can't find oplog.timestamp, attempting to create an empty progress log
2015-09-24 09:06:00,374 [INFO] mongo_connector.connector:223 - MongoConnector: Empty oplog progress file.
2015-09-24 09:06:00,377 [INFO] mongo_connector.oplog_manager:86 - OplogThread: Initializing oplog thread
2015-09-24 09:06:00,392 [INFO] mongo_connector.connector:295 - MongoConnector: Starting connection thread MongoClient('localhost', 27017)
2015-09-24 09:06:02,963 [INFO] urllib3.connectionpool:205 - Starting new HTTP connection (1): 172.31.1.254
2015-09-24 09:06:03,387 [INFO] elasticsearch:64 - POST http://172.31.1.254:9200/_bulk?timeout=60&refresh=false [status:200 request:0.259s]
2015-09-24 09:06:06,078 [INFO] elasticsearch:64 - POST http://172.31.1.254:9200/_bulk?timeout=60&refresh=false [status:200 request:0.233s]
2015-09-24 09:06:07,719 [INFO] elasticsearch:64 - POST http://172.31.1.254:9200/_bulk?timeout=60&refresh=false [status:200 request:0.244s]
2015-09-24 09:06:11,288 [INFO] elasticsearch:64 - POST http://172.31.1.254:9200/_bulk?timeout=60&refresh=false [status:200 request:0.228s]
..........594 lines omitted
2015-09-24 09:29:12,213 [INFO] elasticsearch:64 - POST http://172.31.1.254:9200/_bulk?timeout=60&refresh=false [status:200 request:0.232s]
2015-09-24 09:29:13,688 [INFO] elasticsearch:64 - POST http://172.31.1.254:9200/_bulk?timeout=60&refresh=false [status:200 request:0.260s] the 600th bulk call
2015-09-24 09:29:25,096 [INFO] elasticsearch:64 - POST http://172.31.1.254:9200/_bulk?timeout=60&refresh=false [status:200 request:1.211s] the 601th bulk call
2015-09-24 09:29:36,250 [INFO] elasticsearch:64 - POST http://172.31.1.254:9200/_bulk?timeout=60&refresh=false [status:200 request:1.244s]
2015-09-24 09:29:46,413 [INFO] elasticsearch:64 - POST http://172.31.1.254:9200/_bulk?timeout=60&refresh=false [status:200 request:1.211s]
2015-09-24 09:29:56,851 [INFO] elasticsearch:64 - POST http://172.31.1.254:9200/_bulk?timeout=60&refresh=false [status:200 request:1.259s]
......... more logs ....

As we can see from the log, for the first 600 requests, the log time is around 1 second after the previous log, the actual server process time is around 0.25 second. Starting from the 601th, the log time suddenly become 10~ seconds for each request, and the server process time becomes 1.25 seconds.

What would be the problem? Can anyone help?

First look at ES log files on data nodes. It's possible that you faced segment merges which are slows inserts.

Thanks Rusty, it seems segment merge is not the cause for I have disabled the merge using the following settings as per the guide

GET _cluster/settings?pretty
{
  "persistent" : { },
  "transient" : {
    "indices" : {
      "store" : {
        "throttle" : {
          "type" : "none"
        }
      }
    }
  }
}

Can anyone else please help?

It's don't disable merging but disables the throttling of merge process.
Another point is to change bulk size, just try different sizes like 10k, 5k and figure out which is the best for your data/hardware.

The best thing to do for bulk inserts I've found is to set the refresh
interval to something very high: 30 seconds or -1. That will make bigger
segments and skip some amount of merging. And skip lots of other overhead
too.

Nik

Thanks Rusty and nik9000.

Yes, I have set the refresh interval to -1, and set the number of replicas to 0, also set the throttle type to none, after that, I watch the index/_stats, I am sure the segment merge, refresh and replicas are all disabled. the time elapsed is now better than before, but the time elapsed from the 601th request is still much longer than the first 600 requests. The first 600 bulk requests is now 0.1~ second, the rest bulk requests are 0.6~ second.

I am not sure what the elasticsearch server do after the first 600 bulk requests.

cluster settings:

{
  "persistent" : { },
  "transient" : {
    "indices" : {
      "store" : {
        "throttle" : {
          "type" : "none",
          "max_bytes_per_sec" : "100mb"
        }
      }
    }
  }
}

Index settings:

{
  "test" : {
    "settings" : {
      "index" : {
        "creation_date" : "1443254140204",
        "uuid" : "eZUuRG15Rti68Pvt7pZ5-w",
        "store" : {
          "throttle" : {
            "type" : "none",
            "max_bytes_per_sec" : "100mb"
          }
        },
        "number_of_replicas" : "0",
        "number_of_shards" : "2",
        "refresh_interval" : "-1",
        "version" : {
          "created" : "1070199"
        }
      }
    }
  }
}

Here is the logs of the mongo-connector.

2015-09-26 08:33:28,648 [INFO] elasticsearch:64 - POST http://172.31.1.254:9200/_bulk?timeout=60&refresh=false [status:200 request:0.105s]
2015-09-26 08:33:31,011 [INFO] elasticsearch:64 - POST http://172.31.1.254:9200/_bulk?timeout=60&refresh=false [status:200 request:0.134s]
2015-09-26 08:33:32,265 [INFO] elasticsearch:64 - POST http://172.31.1.254:9200/_bulk?timeout=60&refresh=false [status:200 request:0.129s]

    the above is the 600th bulk request, it takes around 1s : 08:33:32,265 - 08:33:31,011 
    the next is the 601th bulk request, it takes around 10s:  08:33:43,476 - 08:33:32,265

2015-09-26 08:33:43,476 [INFO] elasticsearch:64 - POST http://172.31.1.254:9200/_bulk?timeout=60&refresh=false [status:200 request:0.571s]
2015-09-26 08:33:54,070 [INFO] elasticsearch:64 - POST http://172.31.1.254:9200/_bulk?timeout=60&refresh=false [status:200 request:0.568s]
2015-09-26 08:34:03,602 [INFO] elasticsearch:64 - POST http://172.31.1.254:9200/_bulk?timeout=60&refresh=false [status:200 request:0.569s]
2015-09-26 08:34:13,100 [INFO] elasticsearch:64 - POST http://172.31.1.254:9200/_bulk?timeout=60&refresh=false [status:200 request:0.570s]

Thank you all, I have figured out the reason. There is no issue with the elastic server and the mongo-connector. The cause is the documents I am using are testing data generated by QA engineer. The size of the first 300k documents are much smaller than the rest of the documents. It takes 10s for a bulk request because the data is really large. So, everything is working well.

Cool! I'm glad you found out what it was!