I would like to scale my Elasticsearch 2.4 cluster, so I am trying to estimate how many resources I will need in the future.
Here is what I see now.
I am processing about 6MB/s of data on 4 machines, but this 6MB/s keeps those 4 nodes very busy.
What makes me curious is that iostat on Linux reports really strange results.
The cluster as a whole shows about 900MB/s of writes.
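One way to cross-check the iostat figure is to read the kernel counters that iostat itself is based on. The following is a minimal sketch, assuming a Linux node where `/proc/diskstats` is available; the function names and the device name `sda` are hypothetical and only for illustration. It takes two snapshots of the file and converts the difference in written sectors into MB/s.

```python
SECTOR_BYTES = 512  # /proc/diskstats always counts 512-byte sectors


def sectors_written(diskstats_text, device):
    """Return the cumulative 'sectors written' counter for one device."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Layout: major minor name reads reads_merged sectors_read ms_reading
        #         writes writes_merged sectors_written ...
        if len(fields) >= 10 and fields[2] == device:
            return int(fields[9])
    raise KeyError(device)


def write_mb_per_s(before_text, after_text, device, interval_s):
    """Average write rate in MB/s between two snapshots taken interval_s apart."""
    delta = sectors_written(after_text, device) - sectors_written(before_text, device)
    return delta * SECTOR_BYTES / interval_s / 1e6
```

To use it, read `/proc/diskstats` twice with a known pause in between and pass both texts in. Two things are worth checking when the total looks too high: the first report printed by iostat is an average since boot rather than a current rate, and adding up both a whole device and its partitions counts the same writes twice.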
So the questions are:
Is it possible that 6MB/s of input effectively causes 900MB/s of write load?
What actually happens that makes the IO increase so much?
Should I expect 1.5GB/s of writes when I increase the data load to 10MB/s?
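The arithmetic behind these questions can be written down as a small sketch. It assumes that disk writes scale linearly with the payload rate, which is only a rough first estimate; the function names are made up for illustration. With the numbers from this post the ratio is 150x, and a linear projection to 10MB/s gives 1500MB/s, that is 1.5GB/s.

```python
def write_amplification(disk_write_mb_s, payload_mb_s):
    """Ratio of observed disk writes to the application payload rate."""
    return disk_write_mb_s / payload_mb_s


def projected_disk_writes(payload_mb_s, amplification):
    """Linear projection: assumes the amplification factor stays constant."""
    return payload_mb_s * amplification


# Figures taken from the post: 900MB/s of cluster writes for 6MB/s of payload.
factor = write_amplification(900.0, 6.0)
projected = projected_disk_writes(10.0, factor)

print(f"amplification: {factor:.0f}x")
print(f"projected writes at 10MB/s: {projected:.0f}MB/s ({projected / 1000:.1f}GB/s)")
```

In Elasticsearch some amplification is expected, because each document is also written to the transaction log, copied to replicas, and rewritten when Lucene merges segments. Whether that accounts for a factor as large as 150x is exactly what needs to be investigated here.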
What does your workload look like? Are you primarily indexing new documents or updating existing ones? What is the size of your documents? Are you using nested documents? What throughput are you seeing? What does the 6MB/s represent?
Documents are about 1k on average, but the largest document is 3MB.
I am mostly indexing new documents. Updates come in occasionally but are quite rare, about 2% of all messages.
Throughput is about 8k messages with a replication of 2, so the real throughput is 4k.
The 6MB/s is the size of the payload divided by the processing time. It is a measure I added to my application to carefully observe the payload size; it is not any kind of network throughput.
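For reference, a payload measure like the one described could look roughly like this. It is only a sketch of the idea (payload bytes divided by processing time), not the actual application code; the class name `PayloadMeter` and its methods are hypothetical, and 1MB is taken as 1,000,000 bytes.

```python
class PayloadMeter:
    """Accumulates payload bytes and processing time, reports MB/s."""

    def __init__(self):
        self.total_bytes = 0
        self.total_seconds = 0.0

    def record(self, payload_bytes, processing_seconds):
        """Add one processed batch: its payload size and how long it took."""
        self.total_bytes += payload_bytes
        self.total_seconds += processing_seconds

    def mb_per_s(self):
        """Payload size divided by processing time, in MB/s."""
        if self.total_seconds == 0:
            return 0.0
        return self.total_bytes / self.total_seconds / 1e6
```

A figure measured this way counts only the raw document bytes. It does not include replication, HTTP or transport overhead, or anything Elasticsearch writes internally, so it will always be lower than what the disks report.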