I am trying to index JSON events with 16 string fields each, and the maximum EPS I am able to reach is 5k. These JSON events are fed to Elasticsearch by traversing a list of strings loaded from a file. The Elasticsearch cluster consists of 3 nodes, each with 32 GB RAM and an 8-core CPU.
We are performing bulk indexing with the following settings:
Thanks for the reply.
When indexing 1 million events with a single thread, EPS (events per second) is around 7k, but when I tried using 4 threads, EPS dropped to around 1.7k.
Memory consumption is 28% (nearly 9 GB), the CPU is 96% idle, and maximum CPU usage is 10%.
Out of the 1 million events, the indexing speed is around 40k EPS for the first 700k events, but the overall processing rate is around 7k EPS.
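Taking those figures at face value, a quick calculation shows how slow the tail of the run has to be. This is a back-of-the-envelope sketch that assumes both rates are averages over their spans; the class and method names are illustrative only.

```java
// Back-of-the-envelope check of the reported throughput figures.
public class TailRate {

    // Implied EPS for the events NOT indexed at the fast rate, assuming
    // the fast rate and the overall rate are averages over their spans.
    static double tailEps(double totalEvents, double fastEvents,
                          double fastEps, double overallEps) {
        double totalSeconds = totalEvents / overallEps; // wall time for the whole run
        double fastSeconds = fastEvents / fastEps;      // time spent in the fast phase
        return (totalEvents - fastEvents) / (totalSeconds - fastSeconds);
    }

    public static void main(String[] args) {
        double eps = tailEps(1_000_000, 700_000, 40_000, 7_000);
        System.out.printf("last 300k events: ~%.0f EPS%n", eps); // prints "last 300k events: ~2393 EPS"
    }
}
```

If those numbers hold, the first 700k events take about 17.5 s out of roughly 143 s total, and the remaining 300k crawl along at about 2.4k EPS, so nearly all of the wall time is spent in the tail.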
Below is the I/O output taken from one machine. Before:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.54    0.00    0.13    1.70    0.00   97.63
So your system looks essentially idle; there must be an issue with the benchmark somehow. Maybe the bottleneck is reading the data from your file?
Also, for the record, setting a time-based expiry on the fielddata and filter caches is almost always wrong: fielddata entries are very expensive to regenerate, and filter cache entries can't become outdated because they are cached per segment, and a segment never changes once it has been written.
The file is loaded only once and kept in memory, and the stats are collected after the file has been loaded, so I don't think reading data from the file is the bottleneck.
Even after disabling the expiry settings on the fielddata and filter caches, there isn't any change in the indexing rate.
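One way to be sure file loading stays out of the numbers is to read the file up front and start the clock only around the send loop. A minimal sketch, where the class and the send/awaitDone hooks are hypothetical stand-ins for the real client code. With an asynchronous sender such as a bulk processor, the clock should only stop once outstanding requests have finished; otherwise the measured rate reflects how fast events are queued rather than indexed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.function.Consumer;

// Sketch: load every event once, then time only the indexing phase.
public class IndexBenchmark {

    // Pushes every event through `send`, then runs `awaitDone` before stopping
    // the clock. Both hooks are hypothetical: `send` hands one event to the
    // indexing client, `awaitDone` waits until all async requests complete.
    static double measureEps(List<String> events, Consumer<String> send, Runnable awaitDone) {
        long start = System.nanoTime();
        for (String event : events) {
            send.accept(event);
        }
        awaitDone.run(); // without this, an async client only measures queuing speed
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        return events.size() / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) throws IOException {
        // File I/O happens here, outside the timed region.
        List<String> events = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        double eps = measureEps(events, event -> { /* send the event */ }, () -> { /* flush and wait */ });
        System.out.printf("%d events, %.0f EPS%n", events.size(), eps);
    }
}
```

Run it with the path to the events file as the only argument, and plug the real client calls into the two hooks.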
Then I'm very confused why you can't max out either I/O or CPU on your server. There must be something wrong somewhere... Are you using bulk or separate indexing requests to load data? Do you have good network connectivity between your server and the machine that runs the benchmark?
Right, I didn't expect it to change the indexing rate; this was more of a side note.
I created one TransportClient object for the 3-node cluster, and I send the JSON events from a java.util.List loaded from the file to the cluster using the bulk index API.
The following parameters are used for the bulk indexing API:
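For comparison, the REST _bulk endpoint batches documents the same way, as newline-delimited JSON: an action line followed by the source line for each document, with the body ending in a newline. A minimal sketch building such a body from the in-memory list, assuming each event is already a single-line JSON string and that the target index (plus the type, on older versions) is given in the request URL, for example POST /<index>/_bulk. BulkBody is a hypothetical helper, not part of any client library.

```java
import java.util.List;

// Sketch: build a REST _bulk request body from the in-memory event list.
public class BulkBody {

    // Each event becomes two lines: an action line and its source line.
    // The index is assumed to come from the request URL, so the action
    // line carries no metadata.
    static String build(List<String> jsonEvents) {
        StringBuilder sb = new StringBuilder();
        for (String event : jsonEvents) {
            sb.append("{\"index\":{}}\n");
            sb.append(event).append('\n'); // the source must fit on one line
        }
        return sb.toString(); // the body must end with a newline
    }
}
```

One bulk body of 10,000 events replaces 10,000 separate index requests, which is why bulk loading usually matters far more than any cache setting.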
bulkprocessor.BulkActions=10000
bulkprocessor.BulkSize=10
bulkprocessor.FlushInterval=20
bulkprocessor.ConcurrentRequests=10
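Assuming these properties map onto the Java BulkProcessor builder (setBulkActions, setBulkSize, setFlushInterval, setConcurrentRequests), a bulk is sent as soon as it reaches the action count or the size limit, or the flush interval elapses. The post gives no units for BulkSize and no event size, so this sketch assumes BulkSize means 10 MB and tries a few guessed event sizes to show which limit fires first and how many events can be in flight at once.

```java
// Sketch: which bulk threshold fires first for a given average event size.
public class BulkThresholds {

    // A bulk is sent once it holds `bulkActions` events or its estimated size
    // reaches `bulkSizeBytes`, whichever comes first. Each event is approximated
    // as `avgDocBytes`; per-request overhead and the flush interval are ignored.
    static long docsPerBulk(long bulkActions, long bulkSizeBytes, long avgDocBytes) {
        // Ceiling division: the event that pushes the size to the limit triggers the flush.
        long bySize = (bulkSizeBytes + avgDocBytes - 1) / avgDocBytes;
        return Math.min(bulkActions, bySize);
    }

    public static void main(String[] args) {
        long bulkActions = 10_000;              // bulkprocessor.BulkActions
        long bulkSizeBytes = 10L * 1024 * 1024; // bulkprocessor.BulkSize, ASSUMED to mean 10 MB
        int concurrentRequests = 10;            // bulkprocessor.ConcurrentRequests
        // ASSUMED average event sizes; the real size is not stated.
        for (long avgDocBytes : new long[] {500, 1_024, 2_048}) {
            long perBulk = docsPerBulk(bulkActions, bulkSizeBytes, avgDocBytes);
            System.out.printf("avg %d B/event: %d events per bulk, up to %d events in %d concurrent bulks%n",
                    avgDocBytes, perBulk, perBulk * concurrentRequests, concurrentRequests);
        }
    }
}
```

When the action limit fires first, each bulk carries 10,000 events, so ten concurrent bulks can put up to 100,000 events in flight at the same time.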
I have tried multiple approaches but have been unable to increase CPU or memory utilization, or the indexing rate.