Slow bulk indexing performance

I'm using ES 6.4.3 on Google Compute Engine. I'm indexing documents where each doc has 6 fields: 4 ints and 2 floats. The index has 5 shards and a replication factor of 1. All fields are indexed (I didn't add anything to the mapping besides the type of each field), and I disabled _field_names and _all. I disabled swap on the machine and gave ES an 8 GB heap with a 40% index buffer size. ES is running inside Docker, and the machine has 15 GB RAM and a 500 GB SSD. refresh_interval is 15m. CPU fluctuates between 10-30%, disk writes fluctuate between 3-12 MB/s (reads are far below that), and the network is at about 2 MB/s. I'm using code similar to this:

from elasticsearch import helpers, Elasticsearch
import csv

es = Elasticsearch(hosts=['remote'], port=9200)

with open('/tmp/x.csv') as f:
    reader = csv.DictReader(f)
    # parallel_bulk returns a lazy generator, so the loop below is what actually
    # drives the bulk requests; the results themselves are discarded.
    for resp in helpers.parallel_bulk(es, reader, index='my-index', doc_type='_doc',
                                      chunk_size=10000, thread_count=4, queue_size=20):
        pass

So everything is pretty "calm" on the machine, and still I am only able to index ~13k docs/s. Why is that?

I didn't mention memory usage since it's a bit hard to monitor (ES allocates the whole heap right off the bat), but I don't think memory is the issue, because I also tried setting refresh_interval to -1, in which case ES writes to disk roughly every 10 million documents, and the rate stays the same (~13k/s).
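
For reference, the index setup described above roughly corresponds to something like the following. This is only a sketch: the field names are placeholders (the real mapping isn't shown), and in 6.x the _all field is already disabled by default for new indices, so only _field_names is turned off explicitly here.

from elasticsearch import Elasticsearch

es = Elasticsearch(hosts=['remote'], port=9200)

# Sketch only: field names are placeholders, values follow the description above.
es.indices.create(index='my-index', body={
    'settings': {
        'number_of_shards': 5,
        'number_of_replicas': 1,      # "replication factor of 1"
        'refresh_interval': '15m',
    },
    'mappings': {
        '_doc': {
            '_field_names': {'enabled': False},   # _all is off by default in 6.x
            'properties': {
                'int_1': {'type': 'integer'},
                'int_2': {'type': 'integer'},
                'int_3': {'type': 'integer'},
                'int_4': {'type': 'integer'},
                'float_1': {'type': 'float'},
                'float_2': {'type': 'float'},
            },
        },
    },
})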

How much CPU does the loading script use? Have you tried splitting the input into multiple files and running more than one loading script in parallel?

I didn't check the script itself, but it probably uses almost nothing (assume for now that it's not the problem; I'll update if it is).
I didn't try running multiple instances of the script, since I'm using parallel_bulk, which already runs a number of threads.
Plus, I tried running bulk instead of parallel_bulk and got the same rate.
Also, it's not entirely clear from my first message, but I'm running ES on only one machine, so my "cluster" consists of a single node.

As there is nothing that jumps out as limiting performance on the Elasticsearch side, a sensible first step is to eliminate the loader as the bottleneck. Please look at the script's CPU usage and try running multiple loader processes in parallel to see if that makes a difference.

Turns out it was the loading script, which I didn't expect since the ES guys wrote it.
Running a number of instances also helps, because even after improving the script (using streaming_bulk and so on) it is still a bit of a bottleneck.
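
For anyone landing on this later, the streaming_bulk change mentioned above would look roughly like this (a sketch only, reusing the file path and index name from the example above, not the actual script):

from elasticsearch import helpers, Elasticsearch
import csv

es = Elasticsearch(hosts=['remote'], port=9200)

def actions():
    # Yield one action per CSV row so nothing is buffered up front.
    with open('/tmp/x.csv') as f:
        for row in csv.DictReader(f):
            yield row

# streaming_bulk sends chunks from a single thread and yields one
# (ok, result) tuple per document, so failures can be inspected.
for ok, result in helpers.streaming_bulk(es, actions(), index='my-index',
                                         doc_type='_doc', chunk_size=10000):
    if not ok:
        print(result)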

Python does not do multithreading well, which is why I suspected this may be the case. Our benchmarking tool Rally is implemented in Python, but generates a number of processes to get around this limitation.
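
A rough sketch of that multi-process approach applied to this loader, assuming the CSV has already been split into one file per process (the file names here are made up):

from multiprocessing import Pool
from elasticsearch import helpers, Elasticsearch
import csv

def load_file(path):
    # Each worker process gets its own client and its own slice of the data,
    # so the GIL never serialises the CSV parsing and request encoding.
    es = Elasticsearch(hosts=['remote'], port=9200)
    with open(path) as f:
        for ok, result in helpers.streaming_bulk(es, csv.DictReader(f),
                                                 index='my-index', doc_type='_doc',
                                                 chunk_size=10000):
            if not ok:
                print(result)

if __name__ == '__main__':
    # Hypothetical pre-split chunks of the original /tmp/x.csv.
    files = ['/tmp/x-%02d.csv' % i for i in range(4)]
    with Pool(processes=len(files)) as pool:
        pool.map(load_file, files)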
