Huge performance impact when indexing large payloads

We are doing a few performance runs to measure product performance. We were trying ~100 single ingests/sec and ~300 searches/sec. The data we are ingesting is ~1KB per record (the size of each record we insert). Everything was good and response times were <200ms.

When we add just 1 ingest/sec with a payload of ~10MB, all other transaction response times jump to >1 sec.

Basically we are ingesting the content of a file.

Any idea how we can reduce this impact? Is it really the right design to ingest 10MB of file content into Elasticsearch?

How many documents do you have in the 10MB payload?

It is just 1 document per payload, of size 10MB. It has 10 fields to describe the file being uploaded and one field, "content", where I paste the 10MB of data.

Here is exactly how I ingest it:

Request 1:

POST /ESURL/indexName/item/File1
{
    "field1": "data1",
    "field2": "data2",
    "field3": "data3",
    "field4": "data4",
    "field5": "data5",
    "field6": "data6",
    "field7": "data7",
    "field8": "data8",
    "field9": "data9"
}

Request 2: where I add the actual file content as a child record. You can see below where I paste that 10MB of string content.

POST /ESURL/indexName/item/CFile1?routing=File1
{
    "joinField": {
        "parent": "File1",
        "name": "content"
    },
    "FileContent": "<<<10MB Data>>"
}
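For context, these two requests only work if the index was created with a parent/child join mapping. Below is a rough sketch of what that mapping could look like; only the child relation name "content" appears in the requests above, so the parent relation name "file" is a guess, and the parent document (Request 1) would normally also set "joinField" to that parent name.

PUT /ESURL/indexName
{
    "mappings": {
        "item": {
            "properties": {
                "joinField": {
                    "type": "join",
                    "relations": {
                        "file": "content"
                    }
                }
            }
        }
    }
}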

So it's not really surprising: anytime you run a search query, it can give you back 10 hits (the default page size) of 10MB each, which means each response has to read 100MB and send it over the network. With 300 search requests per second, that means 30,000 MB/s... That could explain it, IMHO.
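If the large field has to stay indexed, one way to cut that cost is to exclude it from the _source returned by searches. A minimal sketch, assuming the searches hit the index holding the big documents and the FileContent field name from above:

POST /ESURL/indexName/_search
{
    "_source": {
        "excludes": ["FileContent"]
    },
    "query": {
        "match": {
            "FileContent": "words to look for"
        }
    }
}

The query still matches against the indexed content, but the 10MB string is not shipped back in every hit.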


Sorry, I should have given the exact scenario details. Here is the exact scenario:

100 single ingests/sec on index1 (doc size ~1KB)
300 searches/sec on index2 (each doc is <1KB)

1 single ingest/sec on index3 -- this is the 10MB-sized doc

So my huge-document ingest goes to a different index, yet it is regressing search/ingest performance on all the other indices.

My main question was: is this the right way to ingest a 10MB-sized doc? Is this a valid design for using Elasticsearch?

That comes with a cost.
As I said, reading and sending 10MB documents over the wire has much more impact than small documents.

Is your document realistic? I mean, what kind of use case are you trying to cover with Elasticsearch? What type of documents would you like to index?

If you want to index a full book of 800 pages, then yes, the document will be super big. But maybe it's worth considering indexing each page of the book instead of the full book?
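A minimal sketch of that idea, reusing the parent/child layout from earlier in the thread (the pageNumber field and the per-page document IDs are made up for illustration):

POST /ESURL/indexName/item/CFile1-page-1?routing=File1
{
    "joinField": {
        "parent": "File1",
        "name": "content"
    },
    "pageNumber": 1,
    "FileContent": "<<text of page 1 only>>"
}

Pages 2, 3, ... would follow the same pattern, so each document stays small and no single request or search hit has to carry 10MB.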

Great, thanks for the details.
Basically, people upload documents and we ingest their contents as well. Customers may upload files of any size, but we restrict indexing to a max of 10MB (the first few pages). Now it looks like we need to reduce that to 1MB as well; 10MB is too costly and affects all the regular simple operations too.

There's a difference between sending binary documents and storing the binary, versus sending only the extracted content. FSCrawler, for example, by default only sends the extracted content and the metadata.
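As an illustration of that split (these field names are made up, not FSCrawler's actual schema): keep the original binary outside the cluster and index only the extracted text plus a few metadata fields.

POST /ESURL/indexName/item/CFile1?routing=File1
{
    "joinField": {
        "parent": "File1",
        "name": "content"
    },
    "fileName": "report.pdf",
    "fileSizeBytes": 10485760,
    "storageLocation": "s3://some-bucket/files/report.pdf",
    "FileContent": "<<extracted text only, capped to a size the cluster can handle>>"
}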
