Document size used for benchmarking

Hello All,
I looked at the Geonames benchmark results and hardware details. The indexing results are very impressive: on a single-node machine, they index around 40,000 documents per second. They do not mention the document size, though. Does anybody know what document size they are indexing to reach 40,000 docs per second?
I have a similar machine, but without SSDs, and I am getting an indexing throughput of only 7,000 documents per second. My average document size is 1.6 KB.

Thanks in advance.

Have you looked at the link to the track description? It links to the documents used for that benchmark. They are about 280 bytes each once formatted as JSON, so quite small.

Thanks for the quick reply.
I could not reach that URL earlier.

On a single node, they must have turned off replicas. Is that true?

So they are indexing 40k docs * 280 bytes, which is 11.2 MB of data per second.
I am indexing 6,666 docs * 1,500 bytes, which is about 10 MB of data per second. Does that mean I am close to their benchmark? I have turned off replicas.
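
For anyone who wants to redo this comparison with their own numbers, here is a minimal sketch of the arithmetic. It uses only the figures quoted above (40,000 docs/s at about 280 bytes, and 6,666 docs/s at about 1,500 bytes). The function name is made up for illustration, and decimal megabytes (10^6 bytes) are assumed.

```python
def throughput_mb_per_s(docs_per_second, avg_doc_bytes):
    """Return raw indexing throughput in decimal megabytes per second."""
    return docs_per_second * avg_doc_bytes / 1_000_000


# Figures quoted in this thread; adjust them to your own measurements.
benchmark = throughput_mb_per_s(40_000, 280)
mine = throughput_mb_per_s(6_666, 1_500)
ratio = mine / benchmark

print(f"benchmark: {benchmark:.1f} MB/s")
print(f"mine:      {mine:.1f} MB/s")
print(f"ratio:     {ratio:.0%}")
```

This compares raw bytes per second only. It says nothing about per-document cost, so it is a rough sanity check rather than a like-for-like comparison.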


Yes, you are close in terms of throughput, but that is not enough to know whether you are indexing as fast as possible or not. A good indicator is whether either CPU or I/O is maxed out on your nodes.


CPU usage was at its maximum. At times the 'top' command showed 400%.
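
A note on reading that 400% figure: in top's default mode on Linux, the per-process %CPU is relative to a single core, so 400% means roughly four cores fully busy. Whether that counts as maxed out depends on how many cores the machine has, which is not stated in this thread. A small sketch of the conversion, with hypothetical helper names:

```python
import os


def cores_in_use(top_cpu_percent):
    """Convert top's per-process %CPU (100% = one core) into cores used."""
    return top_cpu_percent / 100.0


def cpu_utilisation(top_cpu_percent, logical_cores):
    """Fraction of the whole machine's CPU capacity the process is using."""
    return cores_in_use(top_cpu_percent) / logical_cores


# os.cpu_count() reports the logical cores of the machine running this script.
local_cores = os.cpu_count() or 1
print(f"400% in top = {cores_in_use(400):.0f} cores busy")
print(f"share of this machine ({local_cores} cores): "
      f"{cpu_utilisation(400, local_cores):.0%}")
```

On a 4-core machine, 400% would mean the CPU is saturated; on an 8-core machine it would be about half used.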