I am trying to use the Python Elasticsearch Client to insert data into Elasticsearch, but the performance is terrible. I am running Ubuntu 14 on a machine with 164 GB of RAM and 40 Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz cores. I would like to reach a rate of (or close to) 100,000 records per second, however at the moment, after running this source code:
from datetime import datetime
from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200}],
                   timeout=30, max_retries=10, retry_on_timeout=True)

print("time 0: " + str(datetime.now()))
for key in range(1000):
    es.index(index='messages', doc_type='message', body={
        'message': "example message",
    })
print("time 1: " + str(datetime.now()))
I get this result:
time 0: 2018-06-20 10:38:08.311971
time 1: 2018-06-20 10:38:36.154774
That is roughly 28 seconds for 1,000 documents, or about 36 records per second. I also tried a version using the bulk helper:
from datetime import datetime
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch([{'host': 'localhost', 'port': 9200}],
                   timeout=30, max_retries=10, retry_on_timeout=True)

actions = []
for key in range(1000):
    actions.append(
        {
            "_index": "messages",
            "_type": "message",
            "_source": {
                "message": "example message"}
        }
    )

print("time 0: " + str(datetime.now()))
helpers.bulk(es, actions)
print("time 1: " + str(datetime.now()))
but the result is still far below my target:
time 0: 2018-06-20 10:51:25.667748
time 1: 2018-06-20 10:51:29.757604
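That is about 4 seconds for 1,000 documents, roughly 250 records per second, so still nowhere near 100,000 per second. I have not tried it yet, but I am considering switching to helpers.parallel_bulk with a generator, along these lines (untested sketch; the thread_count and chunk_size values are guesses on my part):

from datetime import datetime
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch([{'host': 'localhost', 'port': 9200}],
                   timeout=30, max_retries=10, retry_on_timeout=True)

def generate_actions(count):
    # Yield actions lazily instead of building one big list in memory
    for _ in range(count):
        yield {
            "_index": "messages",
            "_type": "message",
            "_source": {"message": "example message"},
        }

print("time 0: " + str(datetime.now()))
# parallel_bulk returns a generator, so it has to be consumed for any work to happen
for ok, result in helpers.parallel_bulk(es, generate_actions(100000),
                                        thread_count=8, chunk_size=5000):
    if not ok:
        print(result)
print("time 1: " + str(datetime.now()))

I am not sure whether this is the right direction, or whether the bottleneck is somewhere else entirely.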
Any ideas how I can improve this?