I have tested my ES instance with 1-3 nodes, 3-5 shards, and 1-2 replicas, and performance is always the same. I am sending messages with 15 threads, using either single-document indexing or bulk requests, from a Python script built on the elasticsearch-py API (http://elasticsearch-py.readthedocs.io/en/master/api.html). Times are the same for every setting. What am I doing wrong?
What times?
Single index - ~600 docs/s
Bulk index - ~4000 docs/s
The documents are about 5 KB each.
If you are getting the same performance even if you change the number of nodes, shards and replicas it is possible that something outside of Elasticsearch, e.g. your ingestion pipeline, is limiting performance. It could also be something related to the cluster setup. What is the specification of your nodes? What type of storage are you using?
I use HDDs for storage. What do you mean by the specification of my nodes? They run on separate machines, each with 12 CPUs and 8 GB of RAM. I use the default refresh interval because there are lots of search queries. I generate the data with a Python script, using the Elasticsearch API. Sending starts only after all the messages have been generated, so generation doesn't affect performance. The machine that sends the data has 16 CPUs, so it sends on 16 threads.
Have you been able to identify what is limiting performance? How much CPU is used? What does disk I/O look like? How are you monitoring the performance of the cluster?
I am monitoring it with a Python script and ElasticHQ. Timing starts when the first sending thread starts its work and ends when all of them have joined. I used 'top' to check whether every CPU is used, and yes, they were all working. 'sar' reports that %iowait is always below 2%.
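For reference, the timing approach described here (start the clock when the first sending thread starts, stop it once every thread has joined) might look roughly like the sketch below. It is only an illustration: `send_batch` is a hypothetical stand-in for the real indexing call, and `measure_throughput` is a name made up for this example, not part of elasticsearch-py.

```python
import threading
import time


def send_batch(docs):
    """Hypothetical stand-in for the real indexing call (single index or bulk)."""
    return len(docs)


def measure_throughput(batches, sender=send_batch):
    """Send each batch on its own thread; return (docs_sent, docs_per_second)."""
    sent = [0] * len(batches)

    def worker(i):
        # Each thread records how many documents its sender reported.
        sent[i] = sender(batches[i])

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(len(batches))
    ]
    start = time.perf_counter()  # clock starts just before the first thread starts
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # clock stops only after every thread has joined
    elapsed = time.perf_counter() - start

    total = sum(sent)
    rate = total / elapsed if elapsed > 0 else float("inf")
    return total, rate
```

Note that a measurement like this covers everything the threads do, so any time spent on the client side (serialization, Python overhead) is counted together with the time Elasticsearch takes.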
In order to check whether your loading tool is actually limiting performance, try to load similar data from a file using Logstash (or another tool) and see if this also hits the same performance limit.
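One way to prepare such a file is to dump the already generated documents as newline-delimited JSON, one document per line, which a separate loader can then read. The sketch below assumes the documents are plain Python dicts; `write_ndjson` and the file path are hypothetical names used only for illustration, and the loader configuration itself is not shown.

```python
import json


def write_ndjson(docs, path):
    """Write one JSON document per line; return the number of lines written."""
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        for doc in docs:
            # json.dumps never emits a raw newline, so each doc stays on one line.
            fh.write(json.dumps(doc) + "\n")
            count += 1
    return count
```

For example, `write_ndjson(docs, "docs.ndjson")` would produce a file that can be fed to another ingestion tool, taking the Python generator out of the measurement.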
I have found that the problem is my generator. That machine has 16 CPUs, but performance increases only between 1 and 4 threads. After running the generating-and-sending script 4 times in parallel with ... & ... & ..., ElasticHQ shows that performance increased to ~16-17k docs/s, so that's a lot better. Is there any tool for loading random data into Elasticsearch?
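As a rough sketch of generating random test data in Python, the generator below yields action dicts in the shape that the elasticsearch-py bulk helpers accept (`_index` plus `_source`). The function names, the index name, and the document fields are assumptions made for this example; the roughly 5 KB size follows the figure mentioned earlier in the thread.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits + " "


def random_doc(size_bytes=5000, rng=random):
    """Build a document whose 'message' field holds size_bytes random characters."""
    message = "".join(rng.choices(ALPHABET, k=size_bytes))
    return {"message": message, "value": rng.randint(0, 1000000)}


def bulk_actions(count, index_name="test-index", size_bytes=5000, seed=None):
    """Yield `count` bulk actions; a fixed seed makes the output repeatable."""
    rng = random.Random(seed)
    for _ in range(count):
        yield {"_index": index_name, "_source": random_doc(size_bytes, rng)}
```

With a client object `es` created from `elasticsearch.Elasticsearch`, the actions could be passed to `elasticsearch.helpers.bulk(es, bulk_actions(10000))`; older Elasticsearch versions may also expect a `_type` key in each action. Since generating random text is CPU-bound, threads in CPython are unlikely to scale it across all cores, which may be why running several separate processes helped.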
Have you tried Logstash?