I see a big difference in indexing throughput between small documents and large documents. Is this expected, and if so, why, given the test conditions below? (A sketch of the setup follows the list.)
- Small documents are ~1 KB; large documents are 10 KB to 30 KB
- Observed throughput is ~3 MB/s for small documents vs ~20 MB/s for large ones (I cannot push past ~4,000 documents/sec)
- Bulk size is 300 documents; there is no performance improvement beyond this
- Refresh is disabled (refresh_interval set to -1)
- No fields are analyzed
- Index buffer and translog are sized appropriately
- Disk storage, no replication, 1 shard
- Auto-generated IDs make no significant difference
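
For reference, here is a minimal sketch of the kind of setup described above, using the Python elasticsearch client (8.x-style API). The index name `perf_test`, the `body` field, the localhost URL, and the document counts are made up for illustration, not my actual harness:

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

# 1 shard, no replicas, refresh disabled: same conditions as the list above
es.indices.create(
    index="perf_test",
    settings={
        "number_of_shards": 1,
        "number_of_replicas": 0,
        "refresh_interval": "-1",
    },
    mappings={
        "properties": {
            # keyword fields are indexed without analysis
            "body": {"type": "keyword"},
        }
    },
)

def actions(num_docs, doc_kb):
    """Yield bulk actions; omitting _id lets Elasticsearch auto-generate it."""
    payload = "x" * (doc_kb * 1024)
    for _ in range(num_docs):
        yield {"_index": "perf_test", "body": payload}

# 300-document bulks, as in the test
helpers.bulk(es, actions(100_000, doc_kb=1), chunk_size=300)
```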
BTW, refreshes still happen when the index buffer is half full (the index buffer presumably uses a ping-pong/double-buffering implementation).
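
One can confirm that by polling index stats while the load runs; a sketch, reusing the assumed `perf_test` index from above. If refresh.total keeps climbing even with refresh_interval set to -1, refreshes are being forced by the buffer filling up rather than by the scheduler:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Poll refresh/flush/segment counters during indexing
stats = es.indices.stats(index="perf_test", metric=["refresh", "flush", "segments"])
primaries = stats["indices"]["perf_test"]["primaries"]
print("refreshes:", primaries["refresh"]["total"])
print("flushes:  ", primaries["flush"]["total"])
print("segments: ", primaries["segments"]["count"])
```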
I would like to understand the per-document processing overhead (including per-field overhead) and where the bottlenecks are.
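
To frame the question, here is a hedged sketch of how one might separate fixed per-document cost from per-byte cost, again using the made-up `perf_test` index and `body` field from the first sketch:

```python
import time
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

def measure(doc_kb, num_docs=30_000, chunk=300):
    """Index num_docs documents of doc_kb KB each and report throughput."""
    payload = "x" * (doc_kb * 1024)
    docs = ({"_index": "perf_test", "body": payload} for _ in range(num_docs))
    start = time.perf_counter()
    helpers.bulk(es, docs, chunk_size=chunk)
    elapsed = time.perf_counter() - start
    print(f"{doc_kb:>3} KB: {num_docs / elapsed:8.0f} docs/s, "
          f"{num_docs * doc_kb / 1024 / elapsed:6.1f} MB/s")

for kb in (1, 10, 30):
    measure(kb)
```

If docs/sec stays roughly flat across sizes while MB/s scales with document size, the cost is dominated by fixed per-document (and per-field) work rather than by raw bytes, which matches what I am seeing.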