We are doing a few performance runs to measure our product's performance. We were driving ~100 single-document ingests/sec and ~300 searches/sec. Each record we ingest is ~1KB in size. Everything was fine and response times were <200ms.
When we add just one ingest/sec with a 10MB payload, the response times of all other transactions jump to >1sec.
Basically we are ingesting the content of a file.
Any idea how we can reduce this impact? Is it really the right design to ingest 10MB of file content into Elasticsearch?
It is just one document per 10MB payload. It has 10 fields describing the file being uploaded and one field, "content", where I paste the 10MB of data.
So it's not really surprising: any time you run a search query, it gives you back 10 hits of 10MB each, which means each response has to read 100MB and send it over the network. With 300 search requests per second, that means 30,000 MB/s... That could explain it IMHO.
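To make the arithmetic above explicit, here is a rough sketch. It uses only the figures quoted in this thread (10 hits per response, 10MB vs ~1KB documents, 300 searches/sec) and assumes the worst case where every hit is a full-size document; the function name is made up for illustration.

```python
# Back-of-the-envelope estimate of search response traffic.
# Inputs are the figures quoted in this thread, not measurements.

def response_bandwidth_mb_per_s(hits_per_response, doc_size_mb, searches_per_s):
    """Worst-case MB/s read and sent if every hit is a full-size document."""
    return hits_per_response * doc_size_mb * searches_per_s

# 10 hits of 10MB each, 300 searches/sec
large = response_bandwidth_mb_per_s(10, 10, 300)

# Same load with ~1KB (0.001MB) documents
small = response_bandwidth_mb_per_s(10, 0.001, 300)

print(f"10MB docs: {large} MB/s")
print(f"1KB docs:  {small:.1f} MB/s")
```

In a mixed index only some hits will be the big documents, so the real number should be lower, but the gap between the two cases is still several orders of magnitude.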
That comes with a cost.
As I said, reading and sending 10MB documents over the wire has much more impact than small documents.
Is your document realistic? I mean, what kind of use case are you trying to cover with Elasticsearch? What type of document would you like to index?
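One possible way to keep the big field off the wire is source filtering in the search request. This is only a sketch: the field name `content` comes from the question, while the helper name and the `match` query are assumptions for illustration. It builds a request body that searches the big field but excludes it from the returned `_source`.

```python
import json

def search_body_without_content(query_text, big_field="content"):
    """Build a search body that matches on the big field but omits it from hits."""
    return {
        "query": {"match": {big_field: query_text}},
        # Source filtering: leave the large field out of each returned hit.
        "_source": {"excludes": [big_field]},
    }

body = search_body_without_content("quarterly report")
print(json.dumps(body, indent=2))
```

This should mainly reduce what is sent over the network. The node may still have to load the stored document to filter it, so it is worth measuring rather than assuming it removes the whole cost.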
If you want to index a full book of 800 pages, then yes, the document will be super big. But maybe it's worth considering indexing each page of the book instead of the full book?
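As a rough illustration of the "one document per page" idea, assuming the extracted text is already available as a single string: the sketch below cuts it into fixed-size slices and emits one small document per slice. The field names (`parent_id`, `page`) and the 3000-character page size are made up for the example; a real splitter would more likely use actual page boundaries.

```python
def split_into_pages(doc_id, content, page_chars=3000):
    """Yield one small document per fixed-size slice of the extracted text."""
    for page_no, start in enumerate(range(0, len(content), page_chars), start=1):
        yield {
            "parent_id": doc_id,   # hypothetical field linking back to the file
            "page": page_no,       # 1-based page number
            "content": content[start:start + page_chars],
        }

# Example: 10,000 characters of text become 4 page-sized documents.
text = "x" * 10_000
pages = list(split_into_pages("book-1", text, page_chars=3000))
print(len(pages), [len(p["content"]) for p in pages])
```

Each search hit is then a page-sized document instead of the whole file, and the hit still points back to the original upload through `parent_id`.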
Great, thanks for the details...
Basically people upload documents and we ingest their contents as well. Customers may upload files of any size, but we restricted ingestion to a maximum of 10MB (the first few pages)... Now it looks like we need to reduce that to 1MB as well. 10MB looks far too costly and is affecting all the regular simple operations too.
There's a difference between sending binary documents and storing the binary, versus storing only the extracted content. FSCrawler, for example, by default only sends the extracted content and the metadata.