Elasticsearch IOPS

Hello Team

I am running Elasticsearch version 5.6.3 on RHEL 7.4. The cluster has 10 nodes.

Hardware: 8 CPUs and 32 GB RAM.
Hard disk: HDD (Ceph storage).

Elasticsearch data paths: multiple (7 different mount points).

We are indexing log data into this ES cluster.

When we run an IO test using the "fio" tool, we get the following sequential write throughput across the multiple mount points:
WRITE: bw=20.3MiB/s (21.3MB/s), 20.3MiB/s-20.3MiB/s (21.3MB/s-21.3MB/s), io=24.0GiB (25.8GB), run=1210725-1210725msec

When indexing is performed, we see only 30-40 documents being written into the indices per 10 seconds. It's very slow.
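
For reference, one way to double-check this rate (just a sketch; it assumes curl access to a node, and "my-log-index" is a placeholder, not our real index name):

# Sample the index's indexing stats twice, ~10 seconds apart, and compare the
# reported index_total counters; the difference is the docs indexed in that window.
curl -s 'http://localhost:9200/my-log-index/_stats/indexing?pretty' | grep '"index_total"'
sleep 10
curl -s 'http://localhost:9200/my-log-index/_stats/indexing?pretty' | grep '"index_total"'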

I would like to understand why the difference is so huge when it comes to pushing real data.

Reference: fio job config file:

[global]
bs=4K
iodepth=64
direct=1
ioengine=mmap
group_reporting
numjobs=4
name=es_io_chk
rw=write
nrfiles=4
randrepeat=1
gtod_reduce=1
size=1G

[job1]
directory=/usrdata
filename=test_io

[job2]
directory=/usrdata2
filename=test_io

[job3]
directory=/usrdata3
filename=test_io

[job4]
directory=/usrdata4
filename=test_io

[job5]
directory=/usrdata5
filename=test_io

[job6]
directory=/usrdata6
filename=test_io
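
For completeness, the job file above is run with plain fio (the file name es_io_chk.fio is just what we call it locally):

# Run all jobs defined in the job file and print the aggregated bandwidth/IOPS report.
fio es_io_chk.fio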

Could you please help me understand this behaviour?

Thanks

That's been EOL for quite some time; please upgrade as a matter of urgency.

That's why; get rid of it and your issues will likely go away.

Hi Mark

Yes, I understand ES 5.6.3 is a very old version. We are testing with the new ES 7.x version.

I know HDDs are poor performers; at the same time, I need to prove that their IO is too slow for our system/application.

When the IO testing was done, its results were surprising to me, as ~20 MiB/s of throughput is seen, while indexing is dead slow.

Could you help me with fio: what parameters should I pass to mimic Elasticsearch IO behaviour, so that the true throughput can be captured?

Why is Elasticsearch so slow at writing documents (30-40 docs per 10 seconds)?

We don't support Elasticsearch on distributed filesystems, so it's unlikely you will find anything to fix your issue other than moving away from Ceph, sorry.

Okay. We have the option of specifying multiple data paths for the ES cluster on each node.

What would be the purpose of this?

Do you suggest that we should have a single mount point (/usrdata, of 10 TB) on each Elasticsearch node to serve indexing?

The issue is not multiple data paths; the issue is that Ceph is distributed.
You are then running a high-performance distributed system on top of a distributed filesystem, which is not really designed for high performance.

Okay, understood.

Could you please confirm: does Elasticsearch perform write operations in a SEQUENTIAL manner, and does it use mmap to store shard data?

Are there fio options that can be used to benchmark IO performance in a way equivalent to ES operations?

So if Elasticsearch on a distributed filesystem is not supported, then would it not be supported on NetApp storage either?

Elasticsearch does not generally write very large segments sequentially, so your fio load is likely not very representative. I would recommend trying a random read/write load instead, as I would expect that to better match an Elasticsearch load pattern.
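
As a starting point, something along these lines might be closer (just a sketch; the 70/30 read/write mix, block size and queue depth are guesses, not measured Elasticsearch characteristics):

[global]
# Illustrative random mixed-IO job; parameters are assumptions, not an
# Elasticsearch-verified profile.
name=es_randrw_chk
ioengine=libaio
direct=1
rw=randrw
rwmixread=70
bs=4k
iodepth=16
numjobs=4
size=1G
runtime=300
time_based
group_reporting

[job1]
directory=/usrdata
filename=test_io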

I recall having seen issues with GlusterFS and NFS in the past, but am not sure whether those issues persist.

Okay, understood.

Could you please elaborate on what "large segments" means: a single document with a huge mapping and a complex structure, or pushing multiple documents to different indices in parallel?

In our case, our mapping is not that huge and mostly consists of text. We do push multiple docs to multiple indices in parallel, from several of our different applications.

Also, from the Elasticsearch documentation I learned that ES writes data in a sequential manner; hence I am doing sequential testing, using mmap as the ioengine.

We have also observed that Ceph storage backed by SSDs is 50 times faster than Ceph backed by HDDs, even though both are distributed storage on Ceph. Hence I am not able to conclude why the HDDs would have such slow IO when fio shows IOPS of 3000-3500 and a bandwidth of 20 MiB/s.

I find that disk benchmarks with large sequential reads and writes are not representative of a normal Elasticsearch load. If you believe differently, I would recommend setting up a cluster with fast local storage, running a test with a representative load, and profiling the IO access patterns.
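
For the profiling part, even capturing extended device statistics while a representative indexing load runs can show the access pattern (a sketch; adjust the interval and output file as needed):

# Record extended per-device stats (request sizes, r/s, w/s, await, %util)
# every 10 seconds while indexing runs, then compare against the fio runs.
iostat -dxm 10 > iostat_during_indexing.log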
