Huge performance impact when indexing large payloads

We are doing a few performance runs to measure product performance. We were trying ~100 single ingests/sec and ~300 searches/sec. The data we are ingesting is ~1KB per record (the size of each record we insert). Everything was good and response times were <200ms.

When we add just 1 ingest/sec with a payload of ~10MB, all other transaction response times jump to >1 sec.

Basically we are ingesting the content of a file.

Any idea how we can reduce this impact? Is it really the right design to ingest 10MB of file content into Elasticsearch?

How many documents do you have in the 10MB payload?

It is just 1 document per payload, of size 10MB. It has 10 fields to describe the file being uploaded and one field, "content", where I paste the 10MB of data.

Here is exactly how I ingest it:

Request 1:

POST /ESURL/indexName/item/File1
{
    "field1": "data1",
    "field2": "data2",
    "field3": "data3",
    "field4": "data4",
    "field5": "data5",
    "field6": "data6",
    "field7": "data7",
    "field8": "data8",
    "field9": "data9"
}

Request 2: where I add the actual file content as a child record. You can see below where I paste that 10MB of string content.

POST /ESURL/indexName/item/CFile1?routing=File1
{
    "joinField": {
        "parent": "File1",
        "name": "content"
    },
    "FileContent": "<<<10MB Data>>"
}
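For context, these two requests only work if the index was created with a parent/child join mapping. Below is a rough sketch of what that mapping could look like; only the child relation name "content" appears in the requests above, so the parent relation name "file" is a guess, and the parent document (Request 1) would normally also set "joinField" to that parent name.

PUT /ESURL/indexName
{
    "mappings": {
        "item": {
            "properties": {
                "joinField": {
                    "type": "join",
                    "relations": {
                        "file": "content"
                    }
                }
            }
        }
    }
}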

So it's not really surprising: anytime you run a search query, it can give you back 10 hits (the default page size) of 10MB each, which means each response has to read 100MB and send it over the network. With 300 search requests per second, that means 30,000 MB/s... That could explain it, IMHO.
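If the large field has to stay indexed, one way to cut that cost is to exclude it from the _source returned by searches. A minimal sketch, assuming the searches hit the index holding the big documents and the FileContent field name from above:

POST /ESURL/indexName/_search
{
    "_source": {
        "excludes": ["FileContent"]
    },
    "query": {
        "match": {
            "FileContent": "words to look for"
        }
    }
}

The query still matches against the indexed content, but the 10MB string is not shipped back in every hit.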


Sorry, I should have given the exact scenario details. Here is the exact scenario:

100 single ingests/sec on index1 (doc size ~1KB)
300 searches/sec on index2 (each doc is <1KB)

1 single ingest/sec on index3 -- this is the 10MB-sized doc

So my huge-document ingest goes to a different index, yet it is regressing search/ingest performance on all the other indices.

My main question was: is this the right way to ingest a 10MB-sized doc? Is this a valid design for using Elasticsearch?

That comes with a cost.
As I said, reading and sending 10MB documents over the wire has much more impact than small documents.

Is your document realistic? I mean, what kind of use case are you trying to cover with Elasticsearch? What type of documents would you like to index?

If you want to index a full book of 800 pages, then yes, the document will be super big. But maybe it's worth considering indexing each page of the book instead of the full book?
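A minimal sketch of that idea, reusing the parent/child layout from earlier in the thread (the pageNumber field and the per-page document IDs are made up for illustration):

POST /ESURL/indexName/item/CFile1-page-1?routing=File1
{
    "joinField": {
        "parent": "File1",
        "name": "content"
    },
    "pageNumber": 1,
    "FileContent": "<<text of page 1 only>>"
}

Pages 2, 3, ... would follow the same pattern, so each document stays small and no single request or search hit has to carry 10MB.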

Great, thanks for the details.
Basically, people upload documents and we ingest their contents as well. Customers may upload files of any size, but we restrict indexing to a max of 10MB (the first few pages). Now it looks like we need to reduce that to 1MB as well; 10MB is too costly and affects all the regular simple operations too.

There's a difference between sending binary documents and storing the binary, versus sending only the extracted content. FSCrawler, for example, by default only sends the extracted content and the metadata.
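As an illustration of that split (these field names are made up, not FSCrawler's actual schema): keep the original binary outside the cluster and index only the extracted text plus a few metadata fields.

POST /ESURL/indexName/item/CFile1?routing=File1
{
    "joinField": {
        "parent": "File1",
        "name": "content"
    },
    "fileName": "report.pdf",
    "fileSizeBytes": 10485760,
    "storageLocation": "s3://some-bucket/files/report.pdf",
    "FileContent": "<<extracted text only, capped to a size the cluster can handle>>"
}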
