I'm using ES 6.4.3 on Google Compute. I'm indexing documents where each doc has 6 fields: 4 ints and 2 floats. The index has 5 shards and a replication factor of 1. All fields are indexed (I didn't add anything to the mapping other than the type of each field), and I disabled _field_names and _all. I disabled swap on the machine and gave ES an 8GB heap with a 40% index buffer size. ES is running inside a Docker container, and the machine has 15GB RAM and a 500GB SSD. refresh_interval is 15m. CPU fluctuates between 10-30%, disk writes fluctuate between 3-12MB/s (reads are far lower), and network traffic is about 2MB/s. I'm using code similar to this:
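    import csv
    from elasticsearch import Elasticsearch, helpers

    # 'remote' stands in for the actual host name
    es = Elasticsearch(hosts=[{'host': 'remote', 'port': 9200}])
    with open('/tmp/x.csv') as f:
        # Each CSV row becomes one document (a dict keyed by the header row)
        reader = csv.DictReader(f)
        # parallel_bulk is a lazy generator, so it must be consumed for requests to be sent
        for resp in helpers.parallel_bulk(es, reader, index='my-index', doc_type='_doc', chunk_size=10000, thread_count=4, queue_size=20):
            pass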
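For reference, the index-level part of the setup described above can be sketched as the body passed at index creation. This is only an illustration: the field names are placeholders (the real ones are not given here), and reading "replication factor of 1" as `number_of_replicas: 1` is an assumption. The heap size and the 40% index buffer are node-level settings and are not part of this body.

```python
def build_index_body():
    """Return a create-index body matching the setup described in the question.

    Field names (int_a ... float_b) are hypothetical placeholders.
    """
    properties = {}
    # 4 integer fields and 2 float fields, all indexed with default options
    for name in ("int_a", "int_b", "int_c", "int_d"):
        properties[name] = {"type": "integer"}
    for name in ("float_a", "float_b"):
        properties[name] = {"type": "float"}

    return {
        "settings": {
            "number_of_shards": 5,
            "number_of_replicas": 1,   # assumption: "replication factor of 1"
            "refresh_interval": "15m",
        },
        "mappings": {
            "_doc": {
                # _all is already off by default for indices created in 6.x,
                # so only _field_names is switched off explicitly here
                "_field_names": {"enabled": False},
                "properties": properties,
            }
        },
    }


body = build_index_body()
```

With a client like the `es` object in the snippet above, this body would be passed as `es.indices.create(index='my-index', body=body)`.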
So everything is pretty "calm" on the machine, and STILL I am only able to index ~13k docs/s. Why is that?
I didn't report memory usage because it's hard to monitor: ES allocates the whole heap right off the bat. I don't think memory is the problem, though, since I also tried setting refresh_interval to -1, in which case ES writes to disk about every 10 million documents, still at the same rate (~13k docs/s).
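To put the numbers above in perspective, here is a back-of-envelope calculation using only the figures from this post. It assumes 1MB = 10^6 bytes and that the ~2MB/s of network traffic is almost entirely bulk request payload; both are approximations, so treat the results as rough orders of magnitude.

```python
# Figures quoted in the post
DOCS_PER_SEC = 13_000                 # observed indexing rate
NETWORK_BYTES_PER_SEC = 2 * 10**6     # ~2MB/s, assuming 1MB = 10^6 bytes
CHUNK_SIZE = 10_000                   # chunk_size passed to parallel_bulk
DOCS_PER_FLUSH = 10_000_000           # docs between disk writes with refresh_interval=-1

# Average bytes on the wire per document
bytes_per_doc = NETWORK_BYTES_PER_SEC / DOCS_PER_SEC

# How many bulk requests complete per second, and how large each one is
bulk_requests_per_sec = DOCS_PER_SEC / CHUNK_SIZE
bytes_per_bulk = bytes_per_doc * CHUNK_SIZE

# Time between disk writes when refresh is disabled
seconds_between_flushes = DOCS_PER_FLUSH / DOCS_PER_SEC

print(f"~{bytes_per_doc:.0f} bytes per doc on the wire")
print(f"~{bulk_requests_per_sec:.1f} bulk requests/s of ~{bytes_per_bulk / 1e6:.2f}MB each")
print(f"~{seconds_between_flushes / 60:.1f} minutes between disk writes")
```

Under those assumptions this works out to roughly 154 bytes per document, about 1.3 bulk requests per second of about 1.5MB each, and a disk write roughly every 13 minutes.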