Hello All,
I looked at the Geonames benchmark results and hardware details. The indexing results are very impressive: on a single-node machine, they are able to index around 40,000 documents per second. They have not mentioned the document size, though. Does anybody know what document size they are indexing to reach 40,000 docs per second?
I have a similar machine, but without SSDs, and I am getting an indexing throughput of only 7,000 documents per second. My average document size is 1.6 KB.
Thanks for the quick reply.
I could not reach that URL earlier.
On a single node, they must have turned off replicas. Is this true?
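For reference, the replica count is the dynamic index setting `index.number_of_replicas`. It can be changed on an existing index through the update-index-settings endpoint (`PUT /<index>/_settings`, where `<index>` is a placeholder for your index name). The sketch below only builds the JSON request body with the standard library. The helper name `replica_settings_body` is hypothetical, and sending the request is left to whatever HTTP client you already use.

```python
import json


def replica_settings_body(replicas: int = 0) -> str:
    """Build the JSON body for PUT /<index>/_settings that sets the replica count.

    Hypothetical helper for illustration; it does not talk to a cluster.
    """
    if replicas < 0:
        raise ValueError("replica count cannot be negative")
    return json.dumps({"index": {"number_of_replicas": replicas}})


# Body to send with PUT /<index>/_settings to turn replicas off:
print(replica_settings_body(0))
```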
So they are indexing 40k docs × 280 bytes, i.e. about 11.2 MB of data per second.
I am indexing 6,666 docs × 1,500 bytes, i.e. roughly 10 MB of data per second. Does that mean I am close to their benchmark? I have turned off replicas.
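Comparing documents per second across datasets with different document sizes is misleading, so the thread converts both figures to data volume per second. A minimal sketch of that arithmetic, assuming decimal megabytes (10^6 bytes) and the document sizes quoted above (280 bytes for the benchmark, 1,500 bytes for the local setup). The function name is illustrative only.

```python
def throughput_mb_per_s(docs_per_s: float, avg_doc_bytes: float) -> float:
    """Approximate ingest volume in decimal megabytes (10**6 bytes) per second."""
    return docs_per_s * avg_doc_bytes / 1_000_000


# Figures taken from the thread; the 280-byte benchmark doc size is as reported there.
benchmark = throughput_mb_per_s(40_000, 280)
local = throughput_mb_per_s(6_666, 1_500)

print(f"benchmark: {benchmark:.1f} MB/s, local: {local:.1f} MB/s, "
      f"ratio: {local / benchmark:.2f}")
```

The local setup works out to just under 10 MB/s, about 89% of the benchmark's volume, even though its document rate is roughly six times lower.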
Yes, you are close in terms of throughput, but that is not enough to know whether you are indexing as fast as possible or not. A good indicator is whether either CPU or I/O is maxed out on your nodes.